What rules have been used for the current list of codepoints?

Cary Karp ck at nic.museum
Thu Dec 14 12:22:11 CET 2006


Quoting Michael:

> There are a few script-specific non-letter characters that need
> inclusion. The Geresh is needed to indicate certain types of
> abbreviation in the Hebrew language, and is also used to mark certain
> consonants in Ladino. It's not really "optional" therefore. The Ethiopic
> wordspace is used by them in the same way we use a hyphen to separate
> words with a visible mark (as in vint-cerf.com). They do not use the
> hyphen and it is an alien thing to impose on them. On the other hand, it
> can ONLY occur between two Ethiopic letters, so it should not be
> problematic.

And the geresh is likewise only needed between two Hebrew letters. Any
attempt to use it as a surrogate for the typographically similar APOSTROPHE
in a left-to-right string will fail the stringprep bidirectional check.
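
To make the point above concrete, here is a rough sketch of the bidirectional
requirements from RFC 3454, section 6, written against the tables in Python's
standard `stringprep` module. The function name `passes_bidi_check` and the
sample labels are my own, for illustration only. The sketch covers only
requirements 2 and 3; it skips the mapping, normalization, and
prohibited-character steps that a full nameprep implementation would apply
first.

```python
import stringprep

GERESH = "\u05f3"  # HEBREW PUNCTUATION GERESH, bidi class R


def passes_bidi_check(label: str) -> bool:
    """Sketch of RFC 3454 section 6, requirements 2 and 3 (hypothetical helper)."""
    # Table D.1 lists characters with bidi property R or AL.
    has_ral = any(stringprep.in_table_d1(c) for c in label)
    if not has_ral:
        # No right-to-left characters, so the requirements do not apply.
        return True
    # Requirement 2: a string with any R/AL character must not contain
    # any character from table D.2 (bidi property L).
    if any(stringprep.in_table_d2(c) for c in label):
        return False
    # Requirement 3: the first and last characters must both be R/AL.
    return stringprep.in_table_d1(label[0]) and stringprep.in_table_d1(label[-1])


# Geresh between Hebrew letters: gimel, geresh, vav, final nun.
hebrew_label = "\u05d2" + GERESH + "\u05d5\u05df"
# Geresh standing in for an apostrophe inside a Latin string.
latin_label = "don" + GERESH + "t"

print(passes_bidi_check(hebrew_label))  # True
print(passes_bidi_check(latin_label))   # False
```

The Latin label is rejected because the geresh pulls the whole string under
the right-to-left rules, and the surrounding Latin letters then violate them.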

Is there anything to be gained by treating characters that are this
clearly bound in context as belonging to a class of their own?

/Cary


More information about the Idna-update mailing list