Comments on draft-ietf-idnabis-defs-10

Mon Aug 31 07:33:19 CEST 2009

I went to:

http://www.charset.org/punycode.php

and tried using it for conversions:

On input of Unicode characters, regardless of case, the output was
uniformly lower case.

However, the following test illustrates Wil's assertion and Paul's
confirmation:

punycode(Väter) = punycode(väter) = xn--vter-loa

punycode(VÄter) = xn--vter-loa

inverse-punycode(xn--vter-loa) = väter

but inverse-punycode(xn--Vter-loa) = Väter
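
The decoding side of this can be reproduced offline. What follows is only a
minimal sketch, assuming Python 3 and its built-in "punycode" codec rather
than the web tool above; the helper names raw_encode and raw_decode are
made up for illustration, and the "xn--" prefix is added and stripped by
hand because the codec handles only the raw RFC 3492 encoding. Unlike the
web tool, this codec does not lowercase its input, so the case of the
basic (ASCII) characters passes straight through in both directions.

```python
# Sketch only: Python's "punycode" codec does no case folding and
# knows nothing about the "xn--" ACE prefix.
ACE_PREFIX = "xn--"

def raw_encode(label: str) -> str:
    """Punycode-encode a label as-is and prepend the ACE prefix."""
    return ACE_PREFIX + label.encode("punycode").decode("ascii")

def raw_decode(a_label: str) -> str:
    """Strip the ACE prefix (matched case-insensitively) and decode."""
    if a_label[:len(ACE_PREFIX)].lower() != ACE_PREFIX:
        raise ValueError("not an ACE-prefixed label")
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")

print(raw_encode("väter"))         # xn--vter-loa
print(raw_encode("Väter"))         # xn--Vter-loa
print(raw_decode("xn--vter-loa"))  # väter
print(raw_decode("xn--Vter-loa"))  # Väter
```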

It would appear that the only way for us to be sure of getting the
same thing back from Punycode-encoding a Unicode string that has upper
case (ASCII?) characters in it is to require that any undecorated
ASCII characters in the Unicode string be lowercased prior to lookup
and registration.
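
As a rough illustration of that requirement, here is a sketch in Python 3
using the built-in "punycode" codec. The function names lower_ascii,
to_a_label and to_u_label are hypothetical, and lowercasing only A-Z is an
assumption taken from the wording above; it is not the full mapping that
any IDNA specification defines.

```python
ACE_PREFIX = "xn--"

def lower_ascii(s: str) -> str:
    """Lowercase only the ASCII letters; leave all other characters alone."""
    return "".join(c.lower() if c.isascii() else c for c in s)

def to_a_label(label: str) -> str:
    """Lowercase the ASCII characters, then Punycode-encode and prefix."""
    folded = lower_ascii(label)
    return ACE_PREFIX + folded.encode("punycode").decode("ascii")

def to_u_label(a_label: str) -> str:
    """Lowercase the ASCII form before decoding, so that mixed-case
    input such as xn--Vter-loa cannot leak case into the result."""
    folded = lower_ascii(a_label)
    if not folded.startswith(ACE_PREFIX):
        raise ValueError("not an ACE-prefixed label")
    return folded[len(ACE_PREFIX):].encode("ascii").decode("punycode")

print(to_a_label("Väter"))         # xn--vter-loa
print(to_a_label("väter"))         # xn--vter-loa
print(to_u_label("xn--Vter-loa"))  # väter
```

With the lowercasing applied on both sides, Väter and väter map to the same
encoded string, and that string decodes to the same lower-case form however
the encoded string itself was cased.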

Suggestions for clarifying these effects and reasonable responses
would be helpful.

vint

On Aug 30, 2009, at 5:37 PM, Paul Hoffman wrote:

> Wil is correct, and this is clearly called out in RFC 3492:
>   Punycode can also support an additional feature that is not used by
>   the ToASCII and ToUnicode operations of [IDNA].  When extended
>   strings are case-folded prior to encoding, the basic string can use
>   mixed case to tell how to convert the folded string into a mixed-case
>   string.  See appendix A "Mixed-case annotation".
> We hated it at the time, but Adam insisted on leaving it in.
>
> If we want the output of Punycode to be all lower-case, we have to
> convert it after the conversion ourselves.