Punycode Mixed-case annotation

Wil Tan dready at gmail.com
Sun Jun 28 16:10:29 CEST 2009


The algorithm does treat upper- and lowercase ASCII letters differently.
Basic (ASCII) code points are copied verbatim to the output, so their case
is preserved. We only see lowercase output because Nameprep does the case
folding, so in IDNA2003 only lowercase characters go in as input to the
Punycode encoding process.
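
A quick way to check this is a minimal sketch using Python's built-in
"punycode" codec. The codec is my choice for illustration (it is not
something the drafts reference), and decode_label is a hypothetical helper
name. The labels are the ones from this thread with the "xn--" prefix
stripped by hand.

```python
# Sketch only: uses Python's standard "punycode" codec (RFC 3492).
# decode_label is a hypothetical helper, not part of any IDNA library.

def decode_label(label: str) -> str:
    """Decode the Punycode part of a label (the text after 'xn--')."""
    return label.encode("ascii").decode("punycode")

mixed = decode_label("RSUM-bpad")  # uppercase basic code points
lower = decode_label("rsum-bpad")  # lowercase basic code points

print(mixed)                   # RéSUMé
print(lower)                   # résumé
print(mixed.lower() == lower)  # True

# Encoding the mixed-case string gives back the mixed-case label,
# because the basic code points are copied through unchanged.
print(mixed.encode("punycode"))  # b'RSUM-bpad'
```

So the case of the basic code points survives the round trip in both
directions, and the two forms only coincide after an explicit lowercase
mapping.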

=wil


On Sun, Jun 28, 2009 at 11:47 PM, Vint Cerf<vint at google.com> wrote:
> Well this is tricky especially if we adopt a practice, for look up, of
> mapping.
>
> I think we want to preserve the definitional idea that punycode A form and
> Unicode U form must be convertible.
> My understanding is that the punycode algorithm treats upper and lower case
> ASCII letters as equivalent for purposes of conversion (they have the same
> values in the algorithm).
>
> I hope someone with more facility with the coding algorithms will jump in at
> this point.
>
> vint
>
>
> On Jun 28, 2009, at 9:13 AM, Wil Tan wrote:
>
>> Yes, that would work. Should we also discourage the use of such
>> labels, and explicitly say that XN-labels containing uppercase
>> characters are not A-labels?
>>
>> =wil
>>
>> On Sun, Jun 28, 2009 at 9:26 PM, Vint Cerf<vint at google.com> wrote:
>>>
>>> Wil,
>>>
>>> If we adopt a policy of mapping prior to lookup, and if we map upper case
>>> to lower case, it may be that xn--RSUM-bpad.com will be changed to
>>> xn--rsum-bpad.com prior to lookup and it will work.
>>>
>>> vint
>>>
>>>
>>> On Jun 28, 2009, at 7:20 AM, Wil Tan wrote:
>>>
>>>> Hi folks,
>>>>
>>>> RFC 3492 contains a mixed-case annotation feature which, though not
>>>> used in IDNA2003, may affect the IDNA2008 specs. In particular, basic
>>>> code points ([a-z]) that are left unencoded in Punycode may be
>>>> substituted in upper case, and the result of the ToUnicode operation
>>>> will preserve them. For example,
>>>>
>>>>  ToUnicode("xn--RSUM-bpad.com") = "RéSUMé.com"
>>>>
>>>> From reading the rationale and protocol drafts, I'm not entirely sure
>>>> if the input is considered an A-label. The output is certainly not a
>>>> U-label, since the uppercase letters "R", "S", "U" and "M" are
>>>> disallowed code points.
>>>>
>>>> I don't know if this is a problem, but it may warrant at least some
>>>> discussion in section 5.4 of idnabis-protocol?
>>>>
>>>> =wil
>>>> _______________________________________________
>>>> Idna-update mailing list
>>>> Idna-update at alvestrand.no
>>>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>>
>>>
>
>

