Moving Right Along on the Inclusions Table...

Kenneth Whistler kenw at sybase.com
Thu Dec 21 20:12:32 CET 2006


Gerv asked:

> Michael Everson wrote:
> > Ethiopic word space, please. It is used as we use hyphens, and the use 
> > of hyphen for that purpose is unknown to them.
> 
> How many other languages have a "hyphen-like" character which we will 
> then need to include?

You get a partial indication of the scope of the problem by
checking the gc=Pd (Punctuation, dash) characters in Unicode,
which would also include the "hyphen-like" characters that
aren't script-specific.
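
For anyone who wants to reproduce that check, here is a minimal sketch
using Python's standard unicodedata module. The helper name
dash_punctuation is made up for illustration, and the result depends on
the Unicode version bundled with the interpreter, so newer versions may
list more characters than existed when this was written.

```python
import sys
import unicodedata


def dash_punctuation():
    """Return (code point, name) pairs for every character with gc=Pd.

    Hypothetical helper: it simply scans the whole code space and keeps
    the characters whose General_Category is "Pd".
    """
    result = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Pd":
            result.append((cp, unicodedata.name(ch, "<unnamed>")))
    return result


if __name__ == "__main__":
    # Print in a UnicodeData.txt-like "code point;name" form.
    for cp, name in dash_punctuation():
        print(f"{cp:04X};{name}")
```

The output mixes the generic dashes (HYPHEN-MINUS, EN DASH, and so on)
with the script-specific ones, so it still has to be sorted through by
hand.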

But the additional script-specific ones, besides the Hebrew MAQAF,
include:

058A;ARMENIAN HYPHEN;Pd;0;ON;;;;;N;;;;;
1806;MONGOLIAN TODO SOFT HYPHEN;Pd;0;ON;;;;;N;;;;;
30A0;KATAKANA-HIRAGANA DOUBLE HYPHEN;Pd;0;ON;;;;;N;;;;;

and you'd have to argue for or against inclusion of

30FB;KATAKANA MIDDLE DOT;Po;0;ON;;;;;N;;;;;

which also functions like a writing-system-specific word joiner.
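
A quick way to see why gc=Pd gives only a partial indication is to
look up the categories of the characters named in this thread. The
sketch below again assumes Python's unicodedata module; the CANDIDATES
list and the category_of helper are illustrative names only.

```python
import unicodedata

# Character names taken from this thread, plus HYPHEN-MINUS itself.
CANDIDATES = [
    "HYPHEN-MINUS",
    "HEBREW PUNCTUATION MAQAF",
    "ARMENIAN HYPHEN",
    "MONGOLIAN TODO SOFT HYPHEN",
    "KATAKANA-HIRAGANA DOUBLE HYPHEN",
    "KATAKANA MIDDLE DOT",
    "ETHIOPIC WORDSPACE",
]


def category_of(name):
    """Return the General_Category of the character with this name."""
    return unicodedata.category(unicodedata.lookup(name))


if __name__ == "__main__":
    for name in CANDIDATES:
        cp = ord(unicodedata.lookup(name))
        print(f"{cp:04X};{name};{category_of(name)}")
```

KATAKANA MIDDLE DOT and ETHIOPIC WORDSPACE both come back as Po rather
than Pd, so a filter on gc=Pd alone would miss them.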

I think this just opens up an expanding can of worms. We
shouldn't go there -- it is far better to simply limit the
exceptions here to the one legacy character "-" that we
cannot remove, but not start down the road of claiming
that all other script analogs to hyphens have to be allowed
for. This is simply not a matter that makes or breaks
Internet identifiers, and allowing them would introduce more
potential problems than it is worth.

--Ken


