Punycode Mixed-case annotation

Wil Tan dready at gmail.com
Sun Jun 28 17:21:51 CEST 2009


Yes. Punycode will encode "foobäRr" into "foobRr-eua". Simon
Josefsson's tool comes in handy:

<http://josefsson.org/idn.php?data=foobäRr&profile=Nameprep&mode=punyencode&charset=UTF-8&lastcharset=UTF-8>

It is a lossless algorithm so decoding back to Unicode will give you
the exact original.
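
As a quick check, here is a minimal sketch that round-trips the same
label through Python's built-in "punycode" codec. Python is only my
choice of tool for illustration, and the variable names are made up
for the example.

```python
# Round-trip the example label through Python's built-in "punycode" codec.
# "\u00e4" is the a-umlaut from the example; all other code points are ASCII.
label = "foob\u00e4Rr"

# Basic (ASCII) code points, including the uppercase "R", are copied
# verbatim in front of the "-" delimiter.
encoded = label.encode("punycode")

# Decoding restores the original string, case included.
decoded = encoded.decode("punycode")

print(encoded)
print(decoded == label)
```

Note that this calls the bare Punycode codec, so no nameprep or other
mapping happens before encoding.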

As an alternative to lowercasing the XN-label before lookup, perhaps
we can specify an additional step to casefold any ASCII code points in
the punycode decoding process in section 5.4 "A-label Input" of
idnabis-protocol?
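
One possible reading of that extra step, sketched in Python under my
own assumptions: decode_xn_label is a hypothetical helper, not text
from idnabis-protocol, and it folds only ASCII A-Z after the punycode
decoding has run.

```python
def decode_xn_label(label: str) -> str:
    """Hypothetical helper: decode an XN-label and lowercase ASCII letters."""
    # Accept the ACE prefix in any case, e.g. "xn--" or "XN--".
    if label[:4].lower() != "xn--":
        raise ValueError("not an XN-label: %r" % label)

    # Decode the part after the prefix with the built-in punycode codec.
    decoded = label[4:].encode("ascii").decode("punycode")

    # Fold only ASCII A-Z; non-ASCII code points are left untouched.
    return "".join(ch.lower() if "A" <= ch <= "Z" else ch for ch in decoded)


print(decode_xn_label("xn--RSUM-bpad"))
print(decode_xn_label("xn--rsum-bpad"))
```

With this step both spellings of the label decode to the same
all-lowercase string, so the mixed-case form no longer yields a
different Unicode result.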

=wil


On Mon, Jun 29, 2009 at 1:05 AM, Vint Cerf<vint at google.com> wrote:
> So, absent nameprep, we would see upper- and lowercase output from
> punycode? And what about conversion back to Unicode form?
>
> ----- Original Message -----
> From: Wil Tan <dready at gmail.com>
> To: Vint Cerf
> Cc: idna-update at alvestrand.no <idna-update at alvestrand.no>
> Sent: Sun Jun 28 07:10:29 2009
> Subject: Re: Punycode Mixed-case annotation
>
> The algorithm treats them differently. Basic (ASCII) code points are
> copied verbatim to the output. We only see lowercase output because
> nameprep does the casefolding so in IDNA2003 only lowercase characters
> go in as input to the punycode encoding process.
>
> =wil
>
>
> On Sun, Jun 28, 2009 at 11:47 PM, Vint Cerf<vint at google.com> wrote:
>> Well this is tricky especially if we adopt a practice, for look up,
>> of mapping.
>>
>> I think we want to preserve the definitional idea that punycode A form
>> and Unicode U form must be convertible. My understanding is that the
>> punycode algorithm treats upper and lower case ASCII letters as
>> equivalent for purposes of conversion (they have the same values in
>> the algorithm).
>>
>> I hope someone with more facility with the coding algorithms will jump
>> in at this point.
>>
>> vint
>>
>>
>> On Jun 28, 2009, at 9:13 AM, Wil Tan wrote:
>>
>>> Yes, that would work. Should we also discourage the use of such
>>> labels, and explicitly say that XN-labels containing uppercase
>>> characters are not A-labels?
>>>
>>> =wil
>>>
>>> On Sun, Jun 28, 2009 at 9:26 PM, Vint Cerf<vint at google.com> wrote:
>>>>
>>>> Wil,
>>>>
>>>> If we adopt a policy of mapping prior to look up, and if we map upper
>>>> case to lower case, it may be that xn--RSUM-bpad.com will be changed
>>>> to xn--rsum-bpad.com prior to lookup and it will work.
>>>>
>>>> vint
>>>>
>>>>
>>>> On Jun 28, 2009, at 7:20 AM, Wil Tan wrote:
>>>>
>>>>> Hi folks,
>>>>>
>>>>> RFC3492 contained a mixed-case annotation feature which, though not
>>>>> used in IDNA2003, may affect the IDNA2008 specs. In particular, basic
>>>>> code points ([a-z]) that are left unencoded in punycode may be
>>>>> substituted in upper case, and the result of ToUnicode operation will
>>>>> preserve them. For example,
>>>>>
>>>>>  ToUnicode("xn--RSUM-bpad.com") = "RéSUMé.com"
>>>>>
>>>>> From reading the rationale and protocol drafts, I'm not entirely sure
>>>>> if the input is considered an A-label. The output is certainly not a
>>>>> U-label since the uppercase letters "R", "S", "U" and "M" are
>>>>> disallowed code points.
>>>>>
>>>>> I don't know if this is a problem, but it may warrant at least some
>>>>> discussion in section 5.4 of idnabis-protocol?
>>>>>
>>>>> =wil
>>>>> _______________________________________________
>>>>> Idna-update mailing list
>>>>> Idna-update at alvestrand.no
>>>>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>>>
>>>>
>>
>>
>