Unregistered code points and new prefixes (was: Re: sharp s (Eszett))

Mark Davis mark.davis at icu-project.org
Sat Mar 8 00:39:10 CET 2008

> My conclusions are:
> (1) Looking up unregistered code points is untenable because it
> makes moving to future versions of Unicode impossible.  That
> conclusion is already reflected in IDNA200X, but IDNA2003
> requires such lookups.

I disagree. While I'm willing to live with John, Harald, and Patrik's
decision to disallow the resolution with unassigned characters -- just so we
can get this thing out the door -- we should not be basing any *other*
decisions on thinking that it is "untenable".

Consider a character X that was unassigned in Unicode 5.1, but assigned in
Unicode 6.0, and see what happens. Let's suppose that a U5.1 client sends
out "aXc.com" ("a" and "c" are some particular strings, not the literal
U+0061 and U+0063). Before the registry upgrades to U6.0, the lookup will
fail, as expected -- the name wasn't (and couldn't have been) registered.

So let's look at the case where the registry has upgraded to U6.0. There are
a small number of cases, and I don't see that *any* of them cause a problem.


   1. X is illegal according to IDNA200X rules under U6.0. The registry
   can't register it, so it won't work. *Not a problem.*
   2. X is legal and unaffected by normalization. *This is true of the
   vast majority of characters.* Then if the registry adds "aXc.com",
   the old client will work, as expected. *Not a problem -- in fact,
   a positive benefit.*
   3. X is legal but affected by normalization -- but not in the context
   of "a...c". *This is true of the vast majority of those few characters
   remaining from case #2.* Then if the registry adds "aXc.com", the
   old resolver will work. *Not a problem -- in fact, a positive benefit.*
   4. X is legal, and affected by normalization, in the context of
   "a...c". For example, suppose that string a ends with a non-spacing mark
   that reorders with X in NFC. In that case, "aXc.com" would not be
   legal, and could not be registered. *So even in this rare case, not a
   problem.*

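To make the difference between cases 3 and 4 concrete, here is a minimal
Python sketch under stated assumptions. The helper name is_nfc_stable is
hypothetical, and U+0323 (a combining mark that is already assigned) stands
in for the newly assigned X, since a truly unassigned code point cannot be
demonstrated this way. It only checks whether the label as sent is unchanged
by NFC; it does not implement any IDNA200X rule.

```python
import unicodedata


def is_nfc_stable(a: str, x: str, c: str) -> bool:
    """Return True if the label a + x + c is already in NFC."""
    label = a + x + c
    return unicodedata.normalize("NFC", label) == label


# Stand-in for X (an assumption): U+0323 COMBINING DOT BELOW, combining class 220.
X = "\u0323"

# Case 3: X is a combining mark, but nothing in this context reorders with it.
case3 = is_nfc_stable("q", X, "c")

# Case 4: string a ends with U+0301 COMBINING ACUTE ACCENT (combining class 230),
# so NFC moves X in front of it and the label as sent is not in NFC.
case4 = is_nfc_stable("q\u0301", X, "c")

print(case3, case4)
```
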
John, if you think this situation is untenable, which of the above cases
causes a problem, and exactly what would that problem be?
