Unicode 7.0.0, (combining) Hamza Above, and normalization
ken.whistler at sap.com
Fri Aug 8 00:56:51 CEST 2014
With respect, let me stop you right there.
My argument mentions linguistic grounds, but those grounds were relevant
to the UTC's decision process a couple of years ago regarding the encoding
of U+08A1 beh-with-hamza as a separate character.
The linguistic grounds are now basically irrelevant to the *current*
discussion. My assertion is that U+08A1 beh-with-hamza is *NOT*
the same as the sequence beh + combining Hamza. And that assertion
can be derived from the decisions and the data published by the UTC
about the encoding. I don’t need to know that U+08A1 is used for Fula,
or what sound is involved, to be absolutely certain about its encoding
identity.
The same applies to the ghain versus ain + combining dot sequence
I cited. I don’t have to know anything about Arabic to be quite confident
in that claim about *encoding* identity or non-identity, regardless of
whether I “see the same thing” when looking at a printed or screen
rendering of them.
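The encoding-identity claims above can be checked mechanically, without any knowledge of Arabic. What follows is a minimal sketch that assumes Python's standard unicodedata module carrying Unicode data 7.0.0 or later (Python 3.5+). It relies on the fact that two strings are canonically equivalent exactly when their NFD forms are identical. The helper names are illustrative only. ALEF WITH HAMZA ABOVE (U+0623) is included as a contrast case because, unlike U+08A1, it does have a canonical decomposition to alef + hamza above.

```python
import unicodedata

# Sequences discussed in the thread, written as explicit code points.
BEH_WITH_HAMZA = "\u08A1"         # ARABIC LETTER BEH WITH HAMZA ABOVE (new in 7.0.0)
BEH_PLUS_HAMZA = "\u0628\u0654"   # BEH + COMBINING HAMZA ABOVE
GHAIN = "\u063A"                  # ARABIC LETTER GHAIN
AIN_PLUS_DOT = "\u0639\u0307"     # AIN + COMBINING DOT ABOVE
ALEF_WITH_HAMZA = "\u0623"        # ALEF WITH HAMZA ABOVE (contrast case)
ALEF_PLUS_HAMZA = "\u0627\u0654"  # ALEF + COMBINING HAMZA ABOVE


def canonically_equivalent(a: str, b: str) -> bool:
    """Illustrative helper: canonical equivalence means identical NFD forms."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def nfc_codepoints(s: str) -> list[str]:
    """Illustrative helper: list the code points of the NFC form of s."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", s)]


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    pairs = [
        ("beh-with-hamza vs beh + hamza", BEH_WITH_HAMZA, BEH_PLUS_HAMZA),
        ("ghain vs ain + dot", GHAIN, AIN_PLUS_DOT),
        ("alef-with-hamza vs alef + hamza", ALEF_WITH_HAMZA, ALEF_PLUS_HAMZA),
    ]
    for label, a, b in pairs:
        print(f"{label}: equivalent={canonically_equivalent(a, b)}; "
              f"NFC {nfc_codepoints(a)} vs {nfc_codepoints(b)}")
```

On a current Python, the first two pairs report equivalent=False and the alef pair reports True. This is the distinction at issue: U+08A1 has no decomposition mapping, so normalization will never fold it together with beh + U+0654, however similar the two may look when rendered.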
All of this discussion seems to boil down to IETF second-guessing
of Unicode character encoding decisions and complaints about Unicode
normalization not satisfying expectations based on rather simplistic
notions of which things that look the same should *be* the same.
In this case, even if disallowing U+08A1 would yield some marginal
improvement to IDNA (which I do not stipulate, by the way), it is clear
that making exceptions in the table derivation for IDNA because of
a one-off quibble about the UTC's encoding decisions and normalization
just *increases* the overall complexity and level of confusion about
how the protocol is applied.
> your argument seems to be based on linguistic grounds and this I can accept but …
More information about the Idna-update