Unicode 7.0.0, (combining) Hamza Above, and normalization

Shawn Steele Shawn.Steele at microsoft.com
Fri Aug 8 02:20:44 CEST 2014

> Again, they can be "not the same" for Unicode purposes and "the same" for IDNA ones.  

I'm confused about this, and about why it even matters. ß (as in fußball) and ss (as in fussball) are clearly the same linguistically, and they are also confusable, yet IDNA2008 decided they should be different. So regardless of the characteristics of any one character, IDNA is clearly satisfied that its other mitigations prevent abuse of the character set, in which case second-guessing Unicode is just a waste of time. Certainly IDNA will continue to work just fine regardless of the outcome of this discussion.

One might then argue that the linguistic distinction isn't important: the forms look the same, DNS is for identifiers and was never intended to be linguistic, and so this new character may as well be prohibited, since identifiers can be made without it. However, that ignores the fact that IDNA2008 differentiated ß and ss for purely linguistic/aesthetic reasons; they certainly worked fine as identifiers before then, when IDNA2003 simply mapped ß to ss.
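For anyone who wants to check the ß/ss point concretely, here is a minimal Python sketch. It assumes Python's built-in "idna" codec as a stand-in for IDNA2003 (it implements RFC 3490 with nameprep), and approximates the IDNA2008 A-label as "xn--" plus the raw Punycode of the label, skipping IDNA2008's validity checks. It also assumes the "new character" in this thread is U+08A1 ARABIC LETTER BEH WITH HAMZA ABOVE, to show that NFC does not equate it with BEH followed by combining HAMZA ABOVE.

```python
import unicodedata

# IDNA2003 (RFC 3490), as implemented by Python's built-in "idna" codec:
# nameprep's case folding maps U+00DF (ß) to "ss", so the label becomes ASCII.
idna2003 = "fußball".encode("idna")

# IDNA2008 keeps ß as a distinct character. Approximation (assumption):
# for an already lowercase NFC label, the A-label is "xn--" + Punycode.
# No IDNA2008 validity checks are performed here.
idna2008 = b"xn--" + "fußball".encode("punycode")

# Unicode normalization on its own does not equate ß with ss.
nfkc_same = unicodedata.normalize("NFKC", "fußball") == "fussball"

# Assumption: the new character is U+08A1. NFC leaves it alone and does not
# compose U+0628 BEH + U+0654 HAMZA ABOVE into it, so the two stay distinct.
precomposed = unicodedata.normalize("NFC", "\u08A1")
sequence = unicodedata.normalize("NFC", "\u0628\u0654")

print(idna2003)                 # b'fussball'
print(idna2008)                 # b'xn--fuball-cta'
print(nfkc_same)                # False
print(precomposed == sequence)  # False
```

In other words, "the same" and "not the same" here depend on which mapping layer is applied, not on normalization alone.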

There is a fairly simple way to prevent concerns about Unicode's encoding practices in the future: if people here care enough about character encodings and Unicode's decisions, then participate in Unicode, and please don't try to change its decisions after the fact.