Follow-up from Tuesday's discussion of digits in the

Shawn Steele Shawn.Steele at microsoft.com
Wed Dec 3 19:44:30 CET 2008


I don't like the proposal because it causes extra effort and confusion, and I don't see a real benefit.

> Since the IDNA2008 effort long ago decided to ban symbols, including
> the 11 "heart" symbols in Unicode (all of which are class "So"),
> fairness would dictate that we give no special consideration to use
> of numbers as symbols outside their linguistic context.

That makes sense, but if the intent is to prohibit mixed digits, then all of them should be prohibited from being mixed, not just these three.  Why allow mixing of the Indic digits, for example?  And if we go that far, why allow mixing of scripts at all?
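A minimal sketch of the broader rule argued for above (reject any label that mixes digits from more than one decimal-digit set, rather than singling out a few), using only Python's standard `unicodedata` module. The function names are hypothetical and are not part of IDNA2008 or any library. It relies on the Unicode policy that characters of general category Nd are encoded in contiguous runs of ten, ordered 0 through 9.

```python
import unicodedata


def digit_set_zeros(label: str) -> set[int]:
    """Return the code point of the ZERO digit for every decimal-digit set in label.

    Because Nd digits come in contiguous runs of ten ordered 0..9,
    ord(ch) minus the digit's decimal value identifies the set it belongs to.
    """
    zeros = set()
    for ch in label:
        if unicodedata.category(ch) == "Nd":
            zeros.add(ord(ch) - unicodedata.decimal(ch))
    return zeros


def mixes_digit_sets(label: str) -> bool:
    """Hypothetical rule: True if the label uses digits from more than one set."""
    return len(digit_set_zeros(label)) > 1


if __name__ == "__main__":
    print(mixes_digit_sets("abc123"))        # ASCII digits only
    print(mixes_digit_sets("\u0661\u06f2"))  # Arabic-Indic one + Extended Arabic-Indic two
    print(mixes_digit_sets("1\u0967"))       # ASCII one + Devanagari one
```

Grouping by the set's zero code point rather than by script matters here, because some scripts (Arabic, for instance) have more than one digit set. Note that this checks digits only; it says nothing about mixed scripts or other confusables, which is the point of the objection.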

The intent, as with the restriction of symbols, is to prohibit homographs, but with all the Unicode characters out there, homographs can't be avoided entirely.  Even within a single script there are confusables, such as rnicrosoft.com, where "rn" looks like "m".  In CJK it's even worse, since fonts have limited space in which to render very complex ideographs.  Even if a character isn't a strict homograph, it can still easily be confused with the glyph a reader expects.

So I don't see this proposal as adding security to IDNA2008 as a whole.  It *may* reduce some confusables in some cases, but mixed scripts are already warned about in modern browsers.


- Shawn



