Moving Right Along on the Inclusions Table...
Kenneth Whistler
kenw at sybase.com
Thu Dec 21 20:12:32 CET 2006
Gerv asked:
> Michael Everson wrote:
> > Ethiopic word space, please. It is used as we use hyphens, and the use
> > of hyphen for that purpose is unknown to them.
>
> How many other languages have a "hyphen-like" character which we will
> then need to include?
You get a partial indication of the scope of the problem by
checking the gc=Pd (Punctuation, dash) characters in Unicode,
which would also include the "hyphen-like" characters that
aren't script-specific.
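As a rough illustration of that gc=Pd check, here is a minimal sketch (not part of the original message) that walks every code point with Python's standard `unicodedata` module. The helper name `dash_punctuation` is hypothetical. Results depend on the Unicode version bundled with the interpreter (`unicodedata.unidata_version`), so the exact list will vary between Python releases.

```python
import sys
import unicodedata


def dash_punctuation():
    """Return (code point, name) pairs for every character whose
    General_Category is Pd (Punctuation, dash) in this Python's Unicode data."""
    result = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Pd":
            # Every assigned Pd character has a name; the default is just a guard.
            result.append((cp, unicodedata.name(ch, "<unnamed>")))
    return result


if __name__ == "__main__":
    print("Unicode version:", unicodedata.unidata_version)
    for cp, name in dash_punctuation():
        print(f"{cp:04X};{name}")
```

The output mixes script-neutral dashes (HYPHEN-MINUS, EN DASH, and so on) with script-specific ones such as HEBREW PUNCTUATION MAQAF and ARMENIAN HYPHEN. That mix is why the query gives only a partial picture of the scope.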
But the additional script-specific ones, besides the Hebrew MAQAF,
include:
058A;ARMENIAN HYPHEN;Pd;0;ON;;;;;N;;;;;
1806;MONGOLIAN TODO SOFT HYPHEN;Pd;0;ON;;;;;N;;;;;
30A0;KATAKANA-HIRAGANA DOUBLE HYPHEN;Pd;0;ON;;;;;N;;;;;
and you'd have to argue for or against inclusion of
30FB;KATAKANA MIDDLE DOT;Po;0;ON;;;;;N;;;;;
which also functions like a writing-system-specific word joiner.
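The entries quoted above are raw lines from UnicodeData.txt. In that format, field 0 is the code point in hex, field 1 the character name, and field 2 the General_Category. The sketch below is a hypothetical parser written for illustration, not part of any Unicode tooling. It shows why a gc=Pd filter alone would miss KATAKANA MIDDLE DOT: that character is Po, not Pd.

```python
from typing import NamedTuple


class UnicodeDataEntry(NamedTuple):
    code_point: int
    name: str
    general_category: str


def parse_unicode_data_line(line: str) -> UnicodeDataEntry:
    """Parse one UnicodeData.txt line (15 semicolon-separated fields)
    and keep only the code point, name, and General_Category."""
    fields = line.strip().split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}: {line!r}")
    return UnicodeDataEntry(int(fields[0], 16), fields[1], fields[2])


# The four lines quoted in the message, verbatim.
SAMPLE = [
    "058A;ARMENIAN HYPHEN;Pd;0;ON;;;;;N;;;;;",
    "1806;MONGOLIAN TODO SOFT HYPHEN;Pd;0;ON;;;;;N;;;;;",
    "30A0;KATAKANA-HIRAGANA DOUBLE HYPHEN;Pd;0;ON;;;;;N;;;;;",
    "30FB;KATAKANA MIDDLE DOT;Po;0;ON;;;;;N;;;;;",
]

if __name__ == "__main__":
    for entry in map(parse_unicode_data_line, SAMPLE):
        marker = "selected by gc=Pd" if entry.general_category == "Pd" else "missed by gc=Pd"
        print(f"U+{entry.code_point:04X} {entry.name} ({entry.general_category}): {marker}")
```

A policy that keys only on General_Category therefore has to list characters like U+30FB separately, which is part of the "argue for or against" burden described above.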
I think this just opens up an expanding can of worms. We
shouldn't go there -- it is far better to simply limit the
exceptions here to the one legacy character "-" that we
cannot remove, but not start down the road of claiming
that all other script analogs to hyphens have to be allowed
for. This is simply not a matter that makes or breaks
internet identifiers, and introduces more potential problems
than it is worth.
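To make the proposed restriction concrete, here is a minimal sketch, assuming a label checker that treats U+002D HYPHEN-MINUS as the only permitted dash. The names `ALLOWED_DASHES` and `disallowed_dashes` are hypothetical. This is an illustration of the position argued above, not the IDNA inclusion-table algorithm. It also inherits the gc=Pd blind spot, so Po characters such as KATAKANA MIDDLE DOT are not flagged here.

```python
import unicodedata

# Hypothetical policy: the legacy ASCII hyphen is the only dash allowed.
ALLOWED_DASHES = {"\u002d"}  # U+002D HYPHEN-MINUS


def disallowed_dashes(label: str) -> list[str]:
    """Return the names of any gc=Pd characters in `label` other than U+002D."""
    return [
        unicodedata.name(ch, f"U+{ord(ch):04X}")
        for ch in label
        if unicodedata.category(ch) == "Pd" and ch not in ALLOWED_DASHES
    ]


if __name__ == "__main__":
    for label in ["foo-bar", "foo\u058abar", "\u05d0\u05be\u05d1"]:
        print(repr(label), disallowed_dashes(label))
```

The rule stays a single fixed exception rather than a growing per-script list, which is the trade-off the message argues for.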
--Ken
More information about the Idna-update
mailing list