Punycode Mixed-case annotation
    Vint Cerf 
    vint at google.com
       
    Sun Jun 28 17:34:43 CEST 2009
    
    
  
Casefold has a broad effect, as I understand it, beyond lowercasing, and this may have side effects that should be considered before coming to that general conclusion. I think one objective of this mapping-on-lookup-only approach is to preserve the case insensitivity that has been associated with DNS lookups. That was accomplished by the matching algorithm in the name servers. Since we seek a solution that is client-side only, to avoid any need to modify servers, we have to accomplish an approximation at the lookup client side. At the same time, we want to assure that the 1:1 conversion property of A-label and U-label is preserved. Sorry if I am being redundant here. I am just trying to keep straight the constraints within which we are looking to define a lookup-only mapping function.
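
To make the difference between case folding and lowercasing concrete, here is a small Python sketch. It relies only on the built-in str.lower() and str.casefold() methods; the sample strings are illustrative assumptions and are not taken from the drafts.

```python
# Compare plain lowercasing with full Unicode case folding.
# The samples are illustrative only.
samples = ["RSUM", "Straße", "ς"]

results = {}
for s in samples:
    results[s] = (s.lower(), s.casefold())
    print(repr(s), "lower:", repr(s.lower()), "casefold:", repr(s.casefold()))

# For pure ASCII input the two operations agree; for some non-ASCII
# code points (sharp s, final sigma) case folding changes more than
# lowercasing does.
```

For ASCII-only strings the two results are identical, which is why restricting any folding to ASCII code points would sidestep the broader effects.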
----- Original Message -----
From: Wil Tan <dready at gmail.com>
To: Vint Cerf
Cc: idna-update at alvestrand.no <idna-update at alvestrand.no>
Sent: Sun Jun 28 08:21:51 2009
Subject: Re: Punycode Mixed-case annotation
Yes. Punycode will encode "foobäRr" into "foobRr-eua". Simon
Josefsson's tool comes in handy:
<http://josefsson.org/idn.php?data=foobäRr&profile=Nameprep&mode=punyencode&charset=UTF-8&lastcharset=UTF-8>
It is a lossless algorithm so decoding back to Unicode will give you
the exact original.
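
The round trip can also be checked locally. The following is a minimal sketch using the "punycode" codec that ships with Python's standard library; it applies the raw RFC 3492 encoding with no nameprep step, which is an assumption about how one would want to test this rather than part of any draft.

```python
# Raw Punycode round trip, with no nameprep applied beforehand.
label = "foobäRr"

encoded = label.encode("punycode")    # basic code points are copied as-is
decoded = encoded.decode("punycode")  # decoding restores the original case

print(encoded)           # b'foobRr-eua', as stated in the message above
print(decoded == label)  # True
```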
As an alternative to lowercasing the XN-label before lookup, perhaps
we can specify an additional step to casefold any ASCII code points in
the punycode decoding process in section 5.4 "A-label Input" of
idnabis-protocol?
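
A rough sketch of what such a step might look like follows. The helper names decode_xn_label_folded and encode_label are hypothetical, the "xn--" prefix handling is simplified, and none of this is text from idnabis-protocol; it only illustrates folding the ASCII code points after Punycode decoding.

```python
ACE_PREFIX = "xn--"

def decode_xn_label_folded(label: str) -> str:
    """Decode an XN-label, then lowercase only its ASCII code points.

    Hypothetical helper for illustration; not taken from any draft.
    """
    if not label.lower().startswith(ACE_PREFIX):
        raise ValueError("not an XN-label: %r" % label)
    decoded = label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    # Fold ASCII letters only; non-ASCII code points are left untouched.
    return "".join(ch.lower() if ord(ch) < 128 else ch for ch in decoded)

def encode_label(ulabel: str) -> str:
    """Re-encode a Unicode label with raw Punycode plus the ACE prefix."""
    return ACE_PREFIX + ulabel.encode("punycode").decode("ascii")

folded = decode_xn_label_folded("xn--RSUM-bpad")
print(folded)                # résumé
print(encode_label(folded))  # xn--rsum-bpad
```

With the folding applied, re-encoding the decoded label yields the all-lowercase XN-label, so the decoded and re-encoded forms convert back and forth without loss.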
=wil
On Mon, Jun 29, 2009 at 1:05 AM, Vint Cerf<vint at google.com> wrote:
> So, absent nameprep, we would see upper- and lowercase output from punycode? And what about conversion back to Unicode form?
>
> ----- Original Message -----
> From: Wil Tan <dready at gmail.com>
> To: Vint Cerf
> Cc: idna-update at alvestrand.no <idna-update at alvestrand.no>
> Sent: Sun Jun 28 07:10:29 2009
> Subject: Re: Punycode Mixed-case annotation
>
> The algorithm treats them differently. Basic (ASCII) code points are
> copied verbatim to the output. We only see lowercase output because
> nameprep does the casefolding so in IDNA2003 only lowercase characters
> go in as input to the punycode encoding process.
>
> =wil
>
>
> On Sun, Jun 28, 2009 at 11:47 PM, Vint Cerf<vint at google.com> wrote:
>> Well, this is tricky, especially if we adopt a practice of mapping for
>> lookup.
>>
>> I think we want to preserve the definitional idea that punycode A form and
>> Unicode U form must be convertible. My understanding is that the punycode
>> algorithm treats upper and lower case ASCII letters as equivalent for
>> purposes of conversion (they have the same values in the algorithm).
>>
>> I hope someone with more facility with the coding algorithms will jump in at
>> this point.
>>
>> vint
>>
>>
>> On Jun 28, 2009, at 9:13 AM, Wil Tan wrote:
>>
>>> Yes, that would work. Should we also discourage the use of such
>>> labels, and explicitly say that XN-labels containing uppercase
>>> characters are not A-labels?
>>>
>>> =wil
>>>
>>> On Sun, Jun 28, 2009 at 9:26 PM, Vint Cerf<vint at google.com> wrote:
>>>>
>>>> Wil,
>>>>
>>>> If we adopt a policy of mapping prior to lookup, and if we map upper
>>>> case to lower case, it may be that xn--RSUM-bpad.com will be changed
>>>> to xn--rsum-bpad.com prior to lookup and it will work.
>>>>
>>>> vint
>>>>
>>>>
>>>> On Jun 28, 2009, at 7:20 AM, Wil Tan wrote:
>>>>
>>>>> Hi folks,
>>>>>
>>>>> RFC3492 contained a mixed-case annotation feature which, though not
>>>>> used in IDNA2003, may affect the IDNA2008 specs. In particular, basic
>>>>> code points ([a-z]) that are left unencoded in punycode may be
>>>>> substituted in upper case, and the result of ToUnicode operation will
>>>>> preserve them. For example,
>>>>>
>>>>>  ToUnicode("xn--RSUM-bpad.com") = "RéSUMé.com"
>>>>>
>>>>> From reading the rationale and protocol drafts, I'm not entirely sure
>>>>> if the input is considered an A-label. The output is certainly not a
>>>>> U-label since "RSUM" are disallowed codepoints.
>>>>>
>>>>> I don't know if this is a problem, but it may warrant at least some
>>>>> discussion in section 5.4 of idnabis-protocol?
>>>>>
>>>>> =wil
>>>>> _______________________________________________
>>>>> Idna-update mailing list
>>>>> Idna-update at alvestrand.no
>>>>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>>>
>>>>
>>
>>
>
    
    