Unregistered code points and new prefixes (was: Re: sharp s
mark.davis at icu-project.org
Sat Mar 8 00:39:10 CET 2008
> My conclusions are:
> (1) Looking up unregistered code points is untenable because it
> makes moving to future versions of Unicode impossible. That
> conclusion is already reflected in IDNA200X, but IDNA2003
> requires such lookups.
I disagree. While I'm willing to live with John, Harald, and Patrik's
decision to disallow resolution with unassigned characters -- just so we
can get this thing out the door -- we should not be basing any *other*
decisions on the idea that it is "untenable".
Consider a character X that was unassigned in Unicode 5.1, but assigned in
Unicode 6.0, and see what happens. Let's suppose that a U5.1 client sends
out "aXc.com" ("a" and "c" are some particular strings, not the literal
U+0061 and U+0063). Before the registry upgrades to U6.0, it will fail, as
expected -- it wasn't (and couldn't have been) registered.
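To make "unassigned" concrete, here is a minimal sketch of how a client could ask its own Unicode tables whether a code point is assigned. It assumes Python's standard `unicodedata` module as a stand-in for the client's tables; the helper name `is_unassigned` is hypothetical. Note that general category Cn also covers noncharacters, so this is only an approximation.

```python
import unicodedata

def is_unassigned(ch: str) -> bool:
    # General category "Cn" means "not assigned" in the Unicode version
    # this runtime was built with (it also covers noncharacters).
    return unicodedata.category(ch) == "Cn"

# The answer depends on the Unicode version the client ships with.
print("Unicode version of this client:", unicodedata.unidata_version)
print(is_unassigned("\u0378"))  # a reserved code point in the Greek block
print(is_unassigned("a"))
```

An older client and a newer registry can give different answers for the same X, which is exactly the situation discussed here.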
So let's look at the case where the registry has upgraded to U6.0. There are
a small number of cases, and I don't see that *any* of them cause a problem.
1. X is illegal according to IDNA200X rules under U6.0. The registry
can't register it, so it won't work. *Not a problem.*
2. X is legal and unaffected by normalization. *This is true of the
vast majority of characters.* Then if the registry adds "aXc.com",
the old client will work, as expected. *Not a problem -- in fact,
a positive benefit.*
3. X is legal but affected by normalization -- but not in the context
of "a...c". *This is true of the vast majority of those few characters
remaining from case #2.* Then if the registry adds "aXc.com", the
old resolver will work. *Not a problem -- in fact, a positive benefit.*
4. X is legal, and affected by normalization, in the context of
"a...c". For example, suppose that string a ends with a non-spacing mark
that reorders with X in NFC. In that case, "aXc.com" would not be
legal, and could not be registered. *So even in this rare case, not a
problem.*
John, if you think this situation is untenable, which of the above cases
causes a problem, and exactly what would that problem be?
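The four cases above can be sketched roughly as follows. This is only an illustration under stated assumptions: `classify` and `is_nfc_stable` are hypothetical names, the `is_legal` predicate is a placeholder for the IDNA200X rules (which are not modeled here), and "unaffected by normalization" is approximated as "X is a starter that is unchanged by NFC".

```python
import unicodedata

def is_nfc_stable(s: str) -> bool:
    # True if NFC leaves the string unchanged.
    return unicodedata.normalize("NFC", s) == s

def classify(a: str, x: str, c: str, is_legal) -> int:
    """Return the case number (1-4) for the label a + x + c.

    is_legal is a caller-supplied placeholder for the IDNA200X
    legality check on X; it is an assumption, not a real API.
    """
    if not is_legal(x):
        return 1  # cannot be registered
    if not is_nfc_stable(a + x + c):
        return 4  # X interacts with its context; label is not in NFC
    # Rough approximation of "unaffected by normalization":
    # X is a starter (combining class 0) and unchanged by NFC.
    if unicodedata.combining(x) == 0 and is_nfc_stable(x):
        return 2
    return 3  # affected in general, but not in this context

legal = lambda ch: ch != "\u2665"  # toy rule for illustration only

print(classify("q", "\u2665", "c", legal))        # case 1
print(classify("q", "\u4e00", "c", legal))        # case 2
print(classify("q", "\u0323", "c", legal))        # case 3
print(classify("q\u0301", "\u0323", "c", legal))  # case 4
```

In the last call, string a ends with U+0301 (combining class 230) and X is U+0323 (combining class 220), so NFC reorders the two marks and the label as sent is not in normalized form.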