Punycode Mixed-case annotation

Wil Tan dready at gmail.com
Sun Jun 28 17:48:41 CEST 2009


I do understand and agree with the design constraints within which we
are working.

Your proposal to case fold the XN-label prior to lookup works. The
only side effect I perceive is that XN-labels that are not
all-lowercase may not qualify as A-labels, since they do not decode to
a valid U-label.
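
To make the fold-before-lookup idea concrete, here is a rough sketch using
Python's built-in "punycode" codec. The helper names fold_xn_label and
decode_xn_label are made up for illustration and are not taken from any
draft; the sketch ignores every other IDNA validity check.

```python
ACE_PREFIX = "xn--"

def fold_xn_label(label: str) -> str:
    """Lowercase an XN-label before lookup (hypothetical helper).

    The label is pure ASCII, so str.lower() only touches A-Z.
    Labels without the ACE prefix are returned unchanged.
    """
    if label[:4].lower() == ACE_PREFIX:
        return label.lower()
    return label

def decode_xn_label(label: str) -> str:
    """Punycode-decode the part after the ACE prefix, with no validation."""
    if label[:4].lower() != ACE_PREFIX:
        raise ValueError("not an XN-label: %r" % label)
    return label[4:].encode("ascii").decode("punycode")

# Decoding the mixed-case label directly preserves the uppercase letters.
mixed = decode_xn_label("xn--RSUM-bpad")
# Folding first yields a label whose decoded form is all lowercase.
folded = decode_xn_label(fold_xn_label("xn--RSUM-bpad"))
```

Here the folded label looks up fine, but the original mixed-case string
still decodes to something containing "RSUM", which is the side effect
mentioned above.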

My proposal is to case fold only the ASCII codepoints in the Unicode
string obtained from Punycode decoding of the XN-label, prior to
checking the validity of the characters. I'm not aware of any
side-effects of ASCII lowercasing, but do appreciate that the protocol
steps must be very carefully considered.
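
As a rough illustration of what I mean (not proposed spec text), the
following sketch decodes with Python's built-in "punycode" codec and then
lowercases only A-Z in the result. The names ascii_lower and
decode_and_fold are hypothetical, and the later validity checks are
left out.

```python
def ascii_lower(text: str) -> str:
    """Lowercase only the ASCII letters A-Z; leave every other code point alone."""
    return "".join(
        chr(ord(ch) + 0x20) if "A" <= ch <= "Z" else ch
        for ch in text
    )

def decode_and_fold(xn_label: str) -> str:
    """Punycode-decode an XN-label, then fold ASCII code points (hypothetical helper).

    Character validity checks would run on the returned string.
    """
    if xn_label[:4].lower() != "xn--":
        raise ValueError("not an XN-label: %r" % xn_label)
    decoded = xn_label[4:].encode("ascii").decode("punycode")
    return ascii_lower(decoded)

candidate = decode_and_fold("xn--RSUM-bpad")
# Non-ASCII uppercase letters are deliberately left untouched.
untouched = ascii_lower("R\u00c9SUM\u00c9")
```

Unlike a full Unicode case fold, this never alters the non-ASCII code
points, so anything that arrives uppercase via the encoded part of the
label is still caught by the validity check.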

I'm hoping someone will jump in here too.

=wil

On Mon, Jun 29, 2009 at 1:34 AM, Vint Cerf<vint at google.com> wrote:
> Casefold has a broad effect as I understand it, beyond lowercasing, and this may have side effects that should be considered before coming to that general conclusion. I think one objective of this mapping-on-lookup-only approach is to preserve the case insensitivity that has been associated with DNS lookups. That was accomplished by the matching algorithm in the name servers. Since we seek a solution that is client-side only, to avoid any need to modify servers, we have to accomplish an approximation at the lookup client side; at the same time, we want to assure that the 1:1 conversion property of A-label and U-label is preserved. Sorry if I am being redundant here. Just trying to keep straight the constraints within which we are looking to define a lookup-only mapping function.
>
> ----- Original Message -----
> From: Wil Tan <dready at gmail.com>
> To: Vint Cerf
> Cc: idna-update at alvestrand.no <idna-update at alvestrand.no>
> Sent: Sun Jun 28 08:21:51 2009
> Subject: Re: Punycode Mixed-case annotation
>
> Yes. Punycode will encode "foobäRr" into "foobRr-eua". Simon
> Josefsson's tool comes in handy:
>
> <http://josefsson.org/idn.php?data=foobäRr&profile=Nameprep&mode=punyencode&charset=UTF-8&lastcharset=UTF-8>
>
> It is a lossless algorithm so decoding back to Unicode will give you
> the exact original.
>
> As an alternative to lowercasing the XN-label before lookup, perhaps
> we can specify an additional step to casefold any ASCII code points in
> the punycode decoding process in section 5.4 "A-label Input" of
> idnabis-protocol?
>
> =wil
>
>
> On Mon, Jun 29, 2009 at 1:05 AM, Vint Cerf<vint at google.com> wrote:
>> So, absent nameprep, we would see upper and lowercase output from punycode? And what about conversion back to Unicode form?
>>
>> ----- Original Message -----
>> From: Wil Tan <dready at gmail.com>
>> To: Vint Cerf
>> Cc: idna-update at alvestrand.no <idna-update at alvestrand.no>
>> Sent: Sun Jun 28 07:10:29 2009
>> Subject: Re: Punycode Mixed-case annotation
>>
>> The algorithm treats them differently. Basic (ASCII) code points are
>> copied verbatim to the output. We only see lowercase output because
>> nameprep does the casefolding so in IDNA2003 only lowercase characters
>> go in as input to the punycode encoding process.
>>
>> =wil
>>
>>
>> On Sun, Jun 28, 2009 at 11:47 PM, Vint Cerf<vint at google.com> wrote:
>>> Well this is tricky especially if we adopt a practice, for look up, of
>>> mapping.
>>>
>>> I think we want to preserve the definitional idea that punycode A form and
>>> Unicode U form must be convertible.
>>> My understanding is that the punycode algorithm treats upper and lower case
>>> ASCII letters as equivalent
>>> for purposes of conversion (they have the same values in the algorithm).
>>>
>>> I hope someone with more facility with the coding algorithms will jump in at
>>> this point.
>>>
>>> vint
>>>
>>>
>>> On Jun 28, 2009, at 9:13 AM, Wil Tan wrote:
>>>
>>>> Yes, that would work. Should we also discourage the use of such
>>>> labels, and explicitly say that XN-labels containing uppercase
>>>> characters are not A-labels?
>>>>
>>>> =wil
>>>>
>>>> On Sun, Jun 28, 2009 at 9:26 PM, Vint Cerf<vint at google.com> wrote:
>>>>>
>>>>> Wil,
>>>>>
>>>>> If we adopt a policy of mapping prior to look up, and if we map upper
>>>>> case
>>>>> to lower case,
>>>>> it may be that xn--RSUM-bpad.com will be changed to xn--rsum-bpad.com
>>>>> prior
>>>>> to lookup and it will work.
>>>>>
>>>>> vint
>>>>>
>>>>>
>>>>> On Jun 28, 2009, at 7:20 AM, Wil Tan wrote:
>>>>>
>>>>>> Hi folks,
>>>>>>
>>>>>> RFC3492 contained a mixed-case annotation feature which, though not
>>>>>> used in IDNA2003, may affect the IDNA2008 specs. In particular, basic
>>>>>> code points ([a-z]) that are left unencoded in punycode may be
>>>>>> substituted in upper case, and the result of ToUnicode operation will
>>>>>> preserve them. For example,
>>>>>>
>>>>>>  ToUnicode("xn--RSUM-bpad.com") = "RéSUMé.com"
>>>>>>
>>>>>> From reading the rationale and protocol drafts, I'm not entirely sure
>>>>>> if the input is considered an A-label. The output is certainly not a
>>>>>> U-label since "RSUM" are disallowed codepoints.
>>>>>>
>>>>>> I don't know if this is a problem, but it may warrant at least some
>>>>>> discussion in section 5.4 of idnabis-protocol?
>>>>>>
>>>>>> =wil
>>>>>> _______________________________________________
>>>>>> Idna-update mailing list
>>>>>> Idna-update at alvestrand.no
>>>>>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>>>>
>>>>>
>>>
>>>
>>
>

