Unregistered code points and new prefixes (was: Re: sharp s(Eszett))

Martin Duerst duerst at it.aoyama.ac.jp
Sun Mar 9 03:11:38 CET 2008


I very strongly agree with what Mark is saying below.
I wanted to write very much the same thing, but Mark has
done an excellent job, most probably better than what I'd
have done.

I can't see any serious reason why the "client sends unknown"
rule of IDNA2003 should be changed, at all. The benefits
(listed below by Mark) are obvious. Also, as Mark says,
this aspect is independent of other changes that we are
looking at.

Regards,   Martin.

At 08:39 08/03/08, Mark Davis wrote:
>...
>>My conclusions are:
>>
>>(1) Looking up unregistered code points is untenable because it
>>makes moving to future versions of Unicode impossible.  That
>>conclusion is already reflected in IDNA200X, but IDNA2003
>>requires such lookups.
>
>I disagree. While I'm willing to live with John, Harald, and Patrik's decision to disallow resolution of names containing unassigned characters -- just so we can get this thing out the door -- we should not be basing any other decisions on the belief that it is "untenable".
>
>Consider a character X that was unassigned in Unicode 5.1 but assigned in Unicode 6.0, and see what happens. Let's suppose that a U5.1 client sends out "aXc.com" ("a" and "c" are some particular strings, not the literal U+0061 and U+0063). Before the registry upgrades to U6.0, the lookup will fail, as expected -- the name wasn't (and couldn't have been) registered.
>
>So let's look at the case where the registry has upgraded to U6.0. There are a small number of cases, and I don't see that *any* of them cause a problem.
>
>Cases: 
>    * X is illegal according to IDNA200X rules under U6.0. The registry can't register it, so it won't work. Not a problem.
>    * X is legal and unaffected by normalization. This is true of the vast majority of characters. If the registry adds "aXc.com", the old client will work, as expected. Not a problem -- in fact, a positive benefit.
>    * X is legal but affected by normalization -- but not in the context of "a...c". This is true of the vast majority of the few characters not covered by the previous case. If the registry adds "aXc.com", the old resolver will work. Not a problem -- in fact, a positive benefit.
>    * X is legal, and affected by normalization, in the context of "a...c". For example, suppose that string a ends with a non-spacing mark that reorders with X in NFC. In that case, "aXc.com" would not be legal, and could not be registered. So even in this rare case, not a problem.
>John, if you think this situation is untenable, which of the above cases causes a problem, and exactly what would that problem be?
>
>Mark
>
>
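
The normalization cases Mark lists can be tried out with a small Python
sketch. It is an illustration only, under assumptions that are not in the
original mail: the "old" client is modelled with the Unicode 3.2 database
that CPython ships as unicodedata.ucd_3_2_0 (standing in for U5.1), the
"new" registry uses whatever Unicode version the interpreter bundles
(standing in for U6.0), and the old client is assumed to pass unassigned
code points through untouched. The helper names are hypothetical, and the
check for whether X is legal under IDNA200X (the first case) is left out.

```python
import unicodedata
from unicodedata import ucd_3_2_0 as old_ucd  # Unicode 3.2 data, stand-in for the old client


def is_unknown_to_old_client(ch):
    """True if the code point is unassigned (category Cn) in the old database."""
    return old_ucd.category(ch) == "Cn"


def old_client_normalize(label):
    """Assumed old-client behaviour: NFC-normalize runs of known characters
    and pass unknown code points through unchanged, as opaque units."""
    out, run = [], []
    for ch in label:
        if is_unknown_to_old_client(ch):
            out.append(unicodedata.normalize("NFC", "".join(run)))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.append(unicodedata.normalize("NFC", "".join(run)))
    return "".join(out)


def registrable(label):
    """Assumed registry rule: only labels already in NFC (new version) are accepted."""
    return unicodedata.normalize("NFC", label) == label


if __name__ == "__main__":
    # X unaffected by normalization: U+0242 (unassigned in Unicode 3.2).
    plain = "a\u0242c"
    # X reorders with a preceding mark: U+0301 (ccc 230) followed by
    # X = U+0353 (ccc 220, unassigned in Unicode 3.2).
    reordering = "q\u0301\u0353c"
    for label in (plain, reordering):
        sent = old_client_normalize(label)
        print(ascii(sent), "registrable:", registrable(sent))
```

In the first example the old client sends exactly the string a registry
could have registered, so the lookup can succeed. In the second, the string
the old client sends is not in NFC under the newer version, so no registry
could hold it and the lookup simply fails, which is the outcome Mark
describes for the last case.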


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp     


