IAB Statement on Identifiers and Unicode 7.0.0

Michel Suignard michel at suignard.com
Wed Jan 28 19:21:02 CET 2015


>> And Patrick, the IAB letter recommending that U+0626, ARABIC LETTER YEH WITH HAMZA ABOVE not be used in identifiers is tantamount to recommending that U+00F8 ( ø ) LATIN SMALL LETTER O WITH STROKE not occur in identifiers. Fine for you Swedes, but surely you must have some Danish and Norwegian friends ;-)
>
>As you know Mark, the reason why IAB recommended U+0626 to not be used is because that was added after the combination of the codepoints to the Unicode Standard.
>
>In the case of the o with stroke (and other cases) the decision could be the other way around.

I have some sympathy for the issue at hand with the unfortunately misnamed U+08A1, and for the related issues shown in the IAB document, but banning a character such as U+0626, which has been in Unicode since version 1.1 (I don't understand what you mean by "was added after the combination of the codepoints to the Unicode Standard" in that context), is totally unrealistic unless you want to severely cripple Arabic representation in any sort of identifier.

I still think the analogy with o with stroke is totally relevant. If you had restricted the nonuse recommendation to the 'combining Hamza above' I would have understood, but the list shown in the document is too extreme.

And to answer Vint's issue:
<<
Essentially, after normalization, one expects that the two strings are unambiguously equivalent. mapping from normalized unicode to punycode and back should produce the same (character for character) string. The problem that the Hamza discussion illustrates, as I understand it, is that there is no normalization that produces this result if one string uses the combined character and another uses the composed character sequence - no normalization produces an unambiguous result. >>
The fact that a character has the same 'name' as a composed character sequence does not create an equivalence. They may look the same, but they are not necessarily equivalent. If the IAB and IETF want to pursue a path to eliminate these pseudo-equivalences, they have to build a more restrictive transform on top of NFC.
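
To make the distinction concrete, here is a rough sketch using Python's standard unicodedata module. It is only an illustration, not something taken from the IAB document; the helper name nfc_equal is made up for this example, and I am assuming U+0338 COMBINING LONG SOLIDUS OVERLAY as the look-alike combining mark for the o with stroke case.

```python
import unicodedata


def nfc_equal(a: str, b: str) -> bool:
    """Hypothetical helper: True if a and b are identical after NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# U+0626 has a canonical decomposition to U+064A U+0654,
# so NFC treats the single character and the sequence as equivalent.
yeh_hamza_equiv = nfc_equal("\u0626", "\u064A\u0654")

# U+08A1 has no canonical decomposition, so NFC leaves
# U+0628 U+0654 alone and the two strings stay distinct.
beh_hamza_equiv = nfc_equal("\u08A1", "\u0628\u0654")

# Same situation for U+00F8: "o" followed by a combining overlay
# (assumed here to be U+0338) does not compose to it.
o_stroke_equiv = nfc_equal("\u00F8", "o\u0338")

print(yeh_hamza_equiv, beh_hamza_equiv, o_stroke_equiv)
```

A more restrictive transform of the kind mentioned above would have to add its own mappings or exclusions for the cases where the comparison fails; NFC alone will not do it.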

Michel

