Unicode 7.0.0, (combining) Hamza Above, and normalization

Asmus Freytag asmusf at ix.netcom.com
Sat Aug 9 02:49:04 CEST 2014

On 8/8/2014 11:35 AM, Shawn Steele wrote:
>> Computers are dumb, homographs are confusing for all the reasons we know, so our least bad solution is to forbid them even in places where they'd be linguistically harmless.
> How?
> (you confused me with "linguistically harmless".  I read that as not damaging my language, yet forbidding characters is sort of linguistically damaging by definition).

First, to augment that point:

The draft not only proposes to outlaw the correctly encoded spelling, but
suggests that an incorrectly encoded string be used as a workaround,
irrespective of whether users can type that string on their keyboards.

This is rather different from a fallback spelling, such as using "ss"
for "ß", and many similar cases. Established fallback spellings typically
use more basic letters and often have a long history in the user
community. They are thus less "linguistically damaging" than the case
under discussion.
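To make the contrast concrete, the two spellings can be compared with Python's standard `unicodedata` module. This sketch assumes that the new code point at issue is U+08A1 ARABIC LETTER BEH WITH HAMZA ABOVE (added in Unicode 7.0.0) and that the look-alike is U+0628 ARABIC LETTER BEH followed by the combining U+0654 ARABIC HAMZA ABOVE. It also assumes a Python build whose Unicode database is version 7.0.0 or later. Because U+08A1 has no decomposition, no normalization form maps one spelling onto the other.

```python
import unicodedata

# Assumed pair: the Unicode 7.0.0 letter and its look-alike combining sequence.
PRECOMPOSED = "\u08A1"     # ARABIC LETTER BEH WITH HAMZA ABOVE
SEQUENCE = "\u0628\u0654"  # ARABIC LETTER BEH + ARABIC HAMZA ABOVE (combining)


def describe(s: str) -> str:
    """List the code points of s together with their Unicode names."""
    return " + ".join(f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s)


def same_under(form: str, a: str, b: str) -> bool:
    """True if a and b become identical after normalization to `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


if __name__ == "__main__":
    print(describe(PRECOMPOSED))
    print(describe(SEQUENCE))
    # An empty string means U+08A1 has no canonical or compatibility decomposition.
    print("decomposition of U+08A1:", repr(unicodedata.decomposition(PRECOMPOSED)))
    # Each of the four lines prints False: the strings look alike but never compare equal.
    for form in ("NFC", "NFD", "NFKC", "NFKD"):
        print(form, same_under(form, PRECOMPOSED, SEQUENCE))
```

For contrast, an older letter such as U+0623 ARABIC LETTER ALEF WITH HAMZA ABOVE does decompose to U+0627 U+0654, so `same_under("NFC", "\u0623", "\u0627\u0654")` is True.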

Second, I note that implicit in the wording "least bad" is the
acknowledgement that there is, in fact, a range of options. I am far from
convinced that "least bad" is the correct evaluation. Certainly it's the
"most restrictive" solution, but that alone doesn't make it the most
appropriate one.

Third, the degree of "context" available in a domain label is not fixed
at zero. That may be true for the root, but not for all other zones. For
zones where the labels are expected to be in the Fula language, a
blanket prohibition of the new code point is not necessarily "less bad"
than allowing a zone-specific policy that prohibits the other
(look-alike) sequence.
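A zone-specific policy of that kind could, for instance, take the form of a registry-side check like the one below. This is a hypothetical sketch: the function names and the exact rule (reject any BEH that carries U+0654 among its combining marks, while leaving U+08A1 permitted) are assumptions for illustration, not taken from the draft or from any registry's published policy. The remaining IDNA2008 protocol checks are omitted; only the NFC requirement on U-labels is modeled.

```python
import unicodedata

# Hypothetical policy for a zone whose labels are expected to be in Fula:
# keep the new code point U+08A1, refuse its look-alike combining sequence.
BEH = "\u0628"          # ARABIC LETTER BEH
HAMZA_ABOVE = "\u0654"  # ARABIC HAMZA ABOVE (combining mark)


def has_beh_with_combining_hamza(label: str) -> bool:
    """True if some BEH carries U+0654 among the combining marks after it."""
    for i, ch in enumerate(label):
        if ch != BEH:
            continue
        j = i + 1
        # Walk the run of combining marks (nonzero combining class) on this BEH.
        while j < len(label) and unicodedata.combining(label[j]) != 0:
            if label[j] == HAMZA_ABOVE:
                return True
            j += 1
    return False


def fula_zone_accepts(label: str) -> bool:
    """Apply the assumed zone rule to a U-label (other IDNA checks omitted)."""
    # IDNA2008 requires U-labels to be in NFC; reject anything that is not.
    if unicodedata.normalize("NFC", label) != label:
        return False
    return not has_beh_with_combining_hamza(label)


if __name__ == "__main__":
    print(fula_zone_accepts("\u08A1"))        # True: the new code point is allowed
    print(fula_zone_accepts("\u0628\u0654"))  # False: the look-alike is refused
```

Scanning the whole run of combining marks, rather than only the adjacent character, also catches spellings where another mark such as a kasra sits between BEH and the hamza: canonical ordering keeps U+0650 (combining class 32) ahead of U+0654 (combining class 230).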

More on that in a separate post.

