Punycode Mixed-case annotation

Marie-France Berny mfberny at gmail.com
Mon Jun 29 00:34:47 CEST 2009


Dear William Tan,
I am afraid I am quite confused by this burst of technical ping-pong with
Vint, who wants mapping at the protocol level.

I just want to know how the following will be handled:

- ecole.fra
- école.fra
- Ecole.fra

These are three French orthotypographic forms with three different meanings,
which may relate to three different IP addresses. How do you propose to
support them?
Thank you.

Marie-France Berny
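For reference, a minimal Python sketch of what these three labels look like on the wire, assuming IDNA2003 processing through the standard library's "idna" codec (Nameprep plus Punycode, not the IDNA2008 rules under discussion). The helper names to_wire and same_dns_name are hypothetical. Pure-ASCII labels pass through unchanged, and name servers compare ASCII letters case-insensitively (RFC 4343), so "Ecole" and "ecole" reach the same DNS name, while "école" becomes a distinct xn-- label.

```python
def to_wire(label: str) -> str:
    """Return the ASCII form of a single label as sent in a DNS query.

    Sketch only: Python's built-in "idna" codec implements IDNA2003
    (Nameprep + Punycode), not IDNA2008.
    """
    return label.encode("idna").decode("ascii")


def same_dns_name(a: str, b: str) -> bool:
    """DNS matches ASCII letters case-insensitively (RFC 4343)."""
    return to_wire(a).lower() == to_wire(b).lower()


for label in ["ecole", "école", "Ecole"]:
    print(f"{label:6} -> {to_wire(label)}")

print(same_dns_name("Ecole", "ecole"))   # True
print(same_dns_name("école", "ecole"))   # False
```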

2009/6/28 Wil Tan <dready at gmail.com>

> I do understand and agree with the design constraints within which we
> are working.
>
> Your proposal to case fold the XN-label prior to lookup works. The
> only side-effect I perceive is that XN-labels that are not
> all-lowercase may not qualify as A-labels, since they do not decode
> to a valid U-label.
>
> My proposal is to case fold only the ASCII codepoints in the Unicode
> string obtained from Punycode decoding of the XN-label, prior to
> checking the validity of the characters. I'm not aware of any
> side-effects of ASCII lowercasing, but do appreciate that the protocol
> steps must be very carefully considered.
>
> I'm hoping someone would jump in here too.
>
> =wil
>
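The two proposals above can be compared with a short Python sketch built on the standard library's "punycode" codec. The function names are hypothetical, and only the case-handling step is modelled; no other IDNA2008 validity checks are performed.

```python
import string

# Map only A-Z to a-z; every other code point is left untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def _strip_prefix(xn_label: str) -> str:
    """Remove the ACE prefix, rejecting labels that do not carry it."""
    if xn_label[:4].lower() != "xn--":
        raise ValueError(f"not an XN-label: {xn_label!r}")
    return xn_label[4:]


def fold_then_decode(xn_label: str) -> str:
    """Lowercase the whole XN-label (ASCII only), then Punycode-decode it."""
    body = _strip_prefix(xn_label).translate(_ASCII_LOWER)
    return body.encode("ascii").decode("punycode")


def decode_then_fold(xn_label: str) -> str:
    """Punycode-decode the XN-label, then lowercase only its ASCII code points."""
    decoded = _strip_prefix(xn_label).encode("ascii").decode("punycode")
    return decoded.translate(_ASCII_LOWER)


for label in ["xn--RSUM-bpad", "xn--foobRr-eua"]:
    print(label, fold_then_decode(label), decode_then_fold(label))
```

With a decoder that ignores mixed-case annotations, as Python's does, both orders yield the same Unicode string: Punycode digits are case-insensitive, and every code point the decoder inserts is non-ASCII. The practical difference lies in which inputs count as valid, which is the A-label question raised above.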
> On Mon, Jun 29, 2009 at 1:34 AM, Vint Cerf <vint at google.com> wrote:
> > Casefold has broad effect as I understand it, beyond lowercasing, and
> > this may have side effects that should be considered before coming to
> > that general conclusion. I think one objective of this mapping, applied
> > on lookup only, is to preserve the case insensitivity that has been
> > associated with DNS lookups. That was accomplished by the matching
> > algorithm in the name servers. Since we seek a solution that is client
> > side only, to avoid any need to modify servers, we have to accomplish an
> > approximation on the lookup client side; at the same time we want to
> > assure that the 1:1 conversion property of A-label and U-label is
> > preserved. Sorry if I am being redundant here. Just trying to keep
> > straight the constraints within which we are looking to define a
> > lookup-only mapping function.
> >
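Vint's concern that casefolding reaches beyond lowercasing is easy to see in Python, where str.casefold() applies full Unicode case folding. The helper ascii_lower is hypothetical and stands in for the narrower ASCII-only lowercasing discussed in this thread; "Straße" is simply an illustrative word.

```python
import string

_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(s: str) -> str:
    """Lowercase A-Z only; all non-ASCII code points pass through unchanged."""
    return s.translate(_ASCII_LOWER)


word = "Straße"
print(ascii_lower(word))   # straße
print(word.casefold())     # strasse  (full Unicode case folding: ß -> ss)

# Once encoded for the DNS, the two results are different labels.
print(ascii_lower(word).encode("punycode"))   # b'strae-oqa'
print(word.casefold().isascii())              # True: no xn-- form at all
```

A full casefold at lookup time would therefore send a different label to the DNS than the one the U-label encodes to, which is exactly the 1:1 A-label/U-label property at stake.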
> > ----- Original Message -----
> > From: Wil Tan <dready at gmail.com>
> > To: Vint Cerf
> > Cc: idna-update at alvestrand.no <idna-update at alvestrand.no>
> > Sent: Sun Jun 28 08:21:51 2009
> > Subject: Re: Punycode Mixed-case annotation
> >
> > Yes. Punycode will encode "foobäRr" into "foobRr-eua". Simon
> > Josefsson's tool comes in handy:
> >
> > <http://josefsson.org/idn.php?data=foob%C3%A4Rr&profile=Nameprep&mode=punyencode&charset=UTF-8&lastcharset=UTF-8>
> >
> >
> > It is a lossless algorithm so decoding back to Unicode will give you
> > the exact original.
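The same check can be reproduced locally with Python's built-in "punycode" codec, which, like RFC 3492, copies basic code points verbatim. This is a sketch covering Punycode alone, with no "xn--" prefix or Nameprep step.

```python
original = "foobäRr"

encoded = original.encode("punycode")   # basic code points keep their case
decoded = encoded.decode("punycode")    # lossless: the exact original comes back

print(encoded)              # b'foobRr-eua'
print(decoded == original)  # True
```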
> >
> > As an alternative to lowercasing the XN-label before lookup, perhaps
> > we can specify an additional step to casefold any ASCII code points in
> > the punycode decoding process in section 5.4 "A-label Input" of
> > idnabis-protocol?
> >
> > =wil
> >
> >
> > On Mon, Jun 29, 2009 at 1:05 AM, Vint Cerf <vint at google.com> wrote:
> >> So, absent Nameprep, we would see upper- and lowercase output from
> >> Punycode? And what about conversion back to Unicode form?
> >>
> >> ----- Original Message -----
> >> From: Wil Tan <dready at gmail.com>
> >> To: Vint Cerf
> >> Cc: idna-update at alvestrand.no <idna-update at alvestrand.no>
> >> Sent: Sun Jun 28 07:10:29 2009
> >> Subject: Re: Punycode Mixed-case annotation
> >>
> >> The algorithm treats them differently. Basic (ASCII) code points are
> >> copied verbatim to the output. We only see lowercase output because
> >> Nameprep does the casefolding, so in IDNA2003 only lowercase characters
> >> go in as input to the Punycode encoding process.
> >>
> >> =wil
> >>
> >>
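Python's standard library "idna" codec implements IDNA2003, so the effect of Nameprep can be shown next to raw Punycode. This is a sketch: nameprep is an undocumented module-level helper in encodings.idna, and Nameprep does more than lowercase (it also applies NFKC normalization and prohibition checks).

```python
from encodings.idna import nameprep  # IDNA2003 Nameprep helper used by the codec

label = "RéSUMé"

print(label.encode("punycode"))   # b'RSUM-bpad': ASCII copied verbatim
print(nameprep(label))            # résumé: case folded before encoding
print(label.encode("idna"))       # b'xn--rsum-bpad': IDNA2003 ToASCII
```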
> >> On Sun, Jun 28, 2009 at 11:47 PM, Vint Cerf <vint at google.com> wrote:
> >>> Well, this is tricky, especially if we adopt a practice, for lookup, of
> >>> mapping.
> >>>
> >>> I think we want to preserve the definitional idea that the Punycode A
> >>> form and the Unicode U form must be convertible.
> >>> My understanding is that the Punycode algorithm treats upper- and
> >>> lowercase ASCII letters as equivalent for purposes of conversion (they
> >>> have the same values in the algorithm).
> >>>
> >>> I hope someone with more facility with the coding algorithms will jump
> >>> in at this point.
> >>>
> >>> vint
> >>>
> >>>
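A small Python check of where case does and does not matter in Punycode, sketched with the standard library codec (whose decoder ignores mixed-case annotations). Per RFC 3492, the digit letters after the last "-" are case-insensitive (a-z and A-Z both stand for 0-25), so changing their case leaves the result unchanged; the basic code points before the "-" are copied verbatim, so their case survives decoding.

```python
samples = [b"rsum-bpad", b"rsum-BPAD", b"RSUM-bpad"]

for ace in samples:
    print(ace, "->", ace.decode("punycode"))

# b'rsum-bpad' -> résumé   (canonical lowercase form)
# b'rsum-BPAD' -> résumé   (digit case ignored)
# b'RSUM-bpad' -> RéSUMé   (basic code point case preserved)
```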
> >>> On Jun 28, 2009, at 9:13 AM, Wil Tan wrote:
> >>>
> >>>> Yes, that would work. Should we also discourage the use of such
> >>>> labels, and explicitly say that XN-labels containing uppercase
> >>>> characters are not A-labels?
> >>>>
> >>>> =wil
> >>>>
> >>>> On Sun, Jun 28, 2009 at 9:26 PM, Vint Cerf <vint at google.com> wrote:
> >>>>>
> >>>>> Wil,
> >>>>>
> >>>>> If we adopt a policy of mapping prior to lookup, and if we map
> >>>>> uppercase to lowercase, it may be that xn--RSUM-bpad.com will be
> >>>>> changed to xn--rsum-bpad.com prior to lookup, and it will work.
> >>>>>
> >>>>> vint
> >>>>>
> >>>>>
> >>>>> On Jun 28, 2009, at 7:20 AM, Wil Tan wrote:
> >>>>>
> >>>>>> Hi folks,
> >>>>>>
> >>>>>> RFC 3492 contains a mixed-case annotation feature which, though not
> >>>>>> used in IDNA2003, may affect the IDNA2008 specs. In particular, basic
> >>>>>> code points ([a-z]) that are left unencoded in Punycode may be
> >>>>>> substituted in uppercase, and the result of the ToUnicode operation
> >>>>>> will preserve them. For example,
> >>>>>>
> >>>>>>  ToUnicode("xn--RSUM-bpad.com") = "RéSUMé.com"
> >>>>>>
> >>>>>> From reading the rationale and protocol drafts, I'm not entirely sure
> >>>>>> whether the input is considered an A-label. The output is certainly
> >>>>>> not a U-label, since the uppercase letters in "RSUM" are disallowed
> >>>>>> code points.
> >>>>>>
> >>>>>> I don't know if this is a problem, but it may warrant at least some
> >>>>>> discussion in section 5.4 of idnabis-protocol?
> >>>>>>
> >>>>>> =wil
> >>>>>> _______________________________________________
> >>>>>> Idna-update mailing list
> >>>>>> Idna-update at alvestrand.no
> >>>>>> http://www.alvestrand.no/mailman/listinfo/idna-update
> >>>>>
> >>>>>
> >>>
> >>>
> >>
> >
>