"This case isn't the important one" (was Re: Visually confusable characters (8))

Shawn Steele Shawn.Steele at microsoft.com
Mon Aug 11 21:42:51 CEST 2014

> > A)     From a purely binary standpoint no mapping was added, which is pretty much what normalization guaranteed, that no binary mappings would change.

> "No mappings change" is true because the code point wasn't assigned before.  The stability "promise" was also that if a new precomposed version of something formerly made out of a composition sequence were added, the normalization rules would use the decomposed form.  So, the entire argument turns on …

Experts have stated these aren't the same character.  And the promise was, I think, that the mappings of the actual codepoints wouldn't change, except perhaps for new characters (where both are added at the same time).
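For anyone who wants to poke at the stability behavior being discussed, here is a minimal sketch using Python's standard `unicodedata` module. It is only an illustration, not a statement about the character under dispute: it uses U+2ADC FORKING, which as far as I know is a precomposed character added after the composition version was frozen, so NFC keeps the decomposed sequence. The helper names (`nfc`, `nfd`, `codepoints`) are my own, and the results depend on the Unicode data shipped with your Python.

```python
import unicodedata


def nfc(s: str) -> str:
    """Normalize to NFC (canonical decomposition, then composition)."""
    return unicodedata.normalize("NFC", s)


def nfd(s: str) -> str:
    """Normalize to NFD (canonical decomposition only)."""
    return unicodedata.normalize("NFD", s)


def codepoints(s: str) -> str:
    """Render a string as a list of U+XXXX code points for display."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# Long-standing pair: 'e' + COMBINING ACUTE ACCENT composes under NFC.
E_ACUTE_SEQ = "e\u0301"
E_ACUTE_PRE = "\u00e9"

# Later-added precomposed character: U+2ADC decomposes to U+2ADD U+0338
# and is excluded from composition, so NFC prefers the sequence.
FORKING_PRE = "\u2adc"
FORKING_SEQ = "\u2add\u0338"

if __name__ == "__main__":
    for label, text in [
        ("e + acute (sequence)", E_ACUTE_SEQ),
        ("e-acute (precomposed)", E_ACUTE_PRE),
        ("forking (precomposed)", FORKING_PRE),
        ("forking (sequence)", FORKING_SEQ),
    ]:
        print(f"{label:24} NFC={codepoints(nfc(text)):16} NFD={codepoints(nfd(text))}")
```

The point of the second pair is that strings normalized before the precomposed character existed stay normalized afterwards. A character that experts consider distinct gets no decomposition mapping at all, so nothing about existing sequences changes either way.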

> > B)      Linguistic experts have indicated that, despite the confusing name, this is not the same character.

> Note that I'm not saying Unicode made the wrong decision.  I'm not qualified to have an opinion about that.  I'm saying instead that, for the purposes of IDNA, this decision appears to cause us trouble.

I really don't get what the trouble is.  Intra-script vs. inter-script homographs?  I fail to understand why the difference between inter and intra would matter.  Not to mention:  A) someone more knowledgeable than I could probably find other intra-script cases, and B) 'looks-similar' is a homograph too, I think.  I'm not sure how one can differentiate between 'looks-similar-to' and 'looks-the-same'; that depends on font behavior and numerous other factors.  Certainly there are fonts where 'l' and '1' look identical.  (Sure, maybe it isn't common nowadays, but on my old typewriter they were identical for obvious reasons, and that's a 'font', though I'm not sure I could find a computerized version of it.)
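To make the 'l' vs. '1' point concrete, here is a small hedged sketch, again with Python's standard `unicodedata` module. It checks whether any of the four normalization forms maps two strings to the same result. The function name `same_after_normalization` and the choice of pairs are mine, purely for illustration; it says nothing about how a registry or browser should treat these cases.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def same_after_normalization(a: str, b: str) -> bool:
    """Return True if any normalization form maps a and b to the same string."""
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in FORMS
    )


def describe(s: str) -> str:
    """List the Unicode character names in s, for display."""
    return " + ".join(unicodedata.name(c, "<unnamed>") for c in s)


PAIRS = [
    ("l", "1"),             # Latin small L vs. digit one (the typewriter case)
    ("a", "\u0430"),        # Latin small A vs. Cyrillic small A (inter-script)
    ("\u00e9", "e\u0301"),  # precomposed vs. decomposed e-acute (canonically equivalent)
]

if __name__ == "__main__":
    for a, b in PAIRS:
        verdict = "unified" if same_after_normalization(a, b) else "kept distinct"
        print(f"{describe(a)}  vs.  {describe(b)}:  {verdict}")
```

Only the last pair is unified, because it is a canonical equivalence. The first two pairs can look identical in some fonts, yet normalization treats them exactly the same way: as different characters. Visual similarity simply isn't something normalization knows about.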

I don't think this entire subject is a boolean, either "these characters are always going to look the same and confuse people" or "these characters never look the same and never confuse people".  In the middle there's a big gray area that depends on a number of things, probably including the fonts, the font size, whether it's a native script for the reader, and probably the reader's sobriety or whether they had a good night's sleep as well.  And I don't really think that inter/intra makes a huge difference; it's just part of the gray area.

