Comments on draft-ietf-idnabis-defs-10
Vint Cerf
vint at google.com
Mon Aug 31 07:33:19 CEST 2009
I went to:
http://www.charset.org/punycode.php
and tried using it for conversions.
On input of Unicode characters, regardless of case, the output was
uniformly lower case.
However, the following test illustrates Wil's assertion and Paul's
confirmation:
punycode(Väter) = punycode(väter) = xn--vter-loa
punycode(VÄter) = xn--vter-loa
inverse-punycode(xn--vter-loa) = väter
but inverse-punycode(xn--Vter-loa) = Väter
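The round trip above can be reproduced locally with a minimal sketch, assuming Python 3 and its built-in "punycode" codec. That codec does no case mapping and does not add the ACE prefix, so the helper names raw_encode and raw_decode below are hypothetical wrappers that only add and strip "xn--"; they are not what the web converter runs.

```python
ACE_PREFIX = "xn--"

def raw_encode(label):
    # Hypothetical helper: bare Punycode plus the ACE prefix, no case mapping.
    return ACE_PREFIX + label.encode("punycode").decode("ascii")

def raw_decode(ace):
    # Hypothetical helper: strip the ACE prefix and decode bare Punycode.
    if not ace.startswith(ACE_PREFIX):
        raise ValueError("not an ACE label: " + ace)
    return ace[len(ACE_PREFIX):].encode("ascii").decode("punycode")

print(raw_encode("väter"))         # xn--vter-loa
print(raw_encode("Väter"))         # xn--Vter-loa
print(raw_decode("xn--vter-loa"))  # väter
print(raw_decode("xn--Vter-loa"))  # Väter
```

Without a mapping step the basic (ASCII) code points are copied through unchanged, so the case of the "V" survives in both directions.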
It would appear that the only way for us to be sure of getting the
same thing back from a Punycode round trip of a Unicode string that
has upper case (ASCII?) characters in it is to require that any
undecorated ASCII characters in the string be lowercased prior to
lookup and registration.
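A minimal sketch of that rule, assuming Python 3 and its built-in "punycode" codec: only the undecorated ASCII letters are lowercased before encoding, and non-ASCII characters are left alone. The names lower_ascii and encode_label are hypothetical, and this is not the mapping defined by any IDNA document.

```python
def lower_ascii(label):
    # Lowercase only ASCII characters; leave everything else untouched.
    return "".join(c.lower() if c.isascii() else c for c in label)

def encode_label(label):
    # Hypothetical helper: lowercase ASCII first, then Punycode-encode
    # and add the ACE prefix.
    folded = lower_ascii(label)
    return "xn--" + folded.encode("punycode").decode("ascii")

print(encode_label("Väter"))  # xn--vter-loa
print(encode_label("väter"))  # xn--vter-loa
```

With the ASCII letters lowercased first, both spellings produce the same A-label, and decoding it gives back väter.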
Suggestions for clarifying these effects, and for reasonable responses
to them, would be helpful.
vint
On Aug 30, 2009, at 5:37 PM, Paul Hoffman wrote:
> Wil is correct, and this is clearly called out in RFC 3492:
> Punycode can also support an additional feature that is not used by
> the ToASCII and ToUnicode operations of [IDNA]. When extended
> strings are case-folded prior to encoding, the basic string can use
> mixed case to tell how to convert the folded string into a
> mixed-case string. See appendix A "Mixed-case annotation".
> We hated it at the time, but Adam insisted on leaving it in.
>
> If we want the output of Punycode to be all lower-case, we have to
> convert it after the conversion ourselves.