mappings-01 and the general procedure

Erik van der Poel erikv at google.com
Mon Jul 13 16:49:51 CEST 2009


Yes, that makes sense.

I am also waiting for Vint to say something about reaching consensus
on whether or not to keep the mappings document.

My own opinion is that there still seems to be a significant
disconnect between those who would prefer to keep the mappings
confined to "user input" and those who would map even strings that
have arrived "over the wire". While I have some sympathy for those
that strive to create "clean" protocols (without leniently mapping
strings that have arrived "over the wire"), I am also well aware of
the history of e.g. browsers, which have traditionally been lenient
and rarely become stricter. In other words, I consider the current
approach of IDNAbis to be "wishful thinking" and not likely to
actually happen.

Of course, I'd love to be wrong about this and see the browser
developers get strict for the currently small percentage of domain
names that are non-ASCII *and* need to be mapped. (That percentage is
small -- most non-ASCII domain names on the Web are already U-labels.)

Currently, you can input a superscript two (U+00B2) in HTML and the
browser will "helpfully" and inexplicably convert that to a normal two
(U+0032) before looking it up in DNS. Why on Earth do we need to be
that lenient?
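
To make the superscript-two case concrete, here is a minimal Python sketch. It assumes that the lenient browser behavior can be approximated by Unicode compatibility normalization (NFKC), which is the normalization form IDNA2003's Nameprep uses; the helper name compat_map and the sample label are hypothetical, and real browsers apply further steps that are not shown.

```python
import unicodedata

SUPERSCRIPT_TWO = "\u00b2"  # U+00B2 SUPERSCRIPT TWO


def compat_map(label: str) -> str:
    """Approximate IDNA2003-style lenient mapping with NFKC.

    Assumption: NFKC alone stands in for the full Nameprep profile.
    """
    return unicodedata.normalize("NFKC", label)


def nfc_only(label: str) -> str:
    """Canonical normalization only; compatibility characters are kept."""
    return unicodedata.normalize("NFC", label)


# Hypothetical label typed in an HTML document.
typed = "example" + SUPERSCRIPT_TWO

lenient = compat_map(typed)  # the superscript becomes U+0032
strict = nfc_only(typed)     # the superscript is left as typed
```

With NFKC the superscript is folded to an ordinary digit, so the lookup is for a different string than the one the author typed. With NFC alone the character survives, and a stricter implementation would then have to reject it or map it explicitly.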

We shall see what the browser developers decide to do. I will be
watching them carefully and then making my own recommendations within
Google.

Erik

On Sun, Jul 12, 2009 at 5:02 PM, John C Klensin<klensin at jck.com> wrote:
>
>
> --On Sunday, July 12, 2009 16:47 -0700 Erik van der Poel
> <erikv at google.com> wrote:
>
>> I think it would be a good idea to move the normalization and
>> NFC text from section 5.2 of the protocol draft to section 5.3.
>>
>> The order of the steps in the mappings draft should probably be
>> lower-casing, wide/narrow mapping, then NFC. These steps are
>> performed in this order in IDNA2003 too.
>
> I have, temporarily, done that move in a different way, but it
> is clear that, at the end of Section 5.3, the string has to
> [still] be in NFC form.  If and when we decide that we are
> keeping the mapping document, I want to restructure 5.2 and 5.3
> into one section and refer to Mapping for, e.g., the discussion
> about getting to Unicode.  IMO, that discussion in Mapping is
> better (and more comprehensive) than what is now in Section 5.2.
>
>
> So, if we keep Mapping, then the combined 5.2 and 5.3 would
> basically say:
>
>        * Get the string from whatever form you find it in to
>        Unicode, following the advice in [Mapping]
>
>        * That string MUST be in NFC form.
>
> Does that work for everyone, again, given that we don't discard
> the Mapping doc entirely?
>
> If we do drop the Mapping document (I'm waiting for Vint to say
> something about when he thinks consensus has been reached even
> though I thought the San Francisco output was fairly clear), I
> hope to enlist help from Paul and Pete to get the relevant
> explanations out of Mapping and put them in 5.2 and 5.3 of
> Protocol.   Again, does that make sense?
>
>    john
>
>
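
The step order proposed in the quoted text (lower-casing, wide/narrow mapping, then NFC) can be sketched in Python as follows. This is an illustration under stated assumptions, not the mappings draft itself: str.lower() stands in for whatever lower-casing the draft specifies, the width step is approximated by the <wide> and <narrow> compatibility decompositions in the Unicode Character Database, and the function names are hypothetical.

```python
import unicodedata


def map_width(text: str) -> str:
    """Replace fullwidth/halfwidth characters with their decompositions.

    Assumption: the <wide>/<narrow> tags in the Unicode Character
    Database define the wide/narrow mapping.
    """
    out = []
    for ch in text:
        decomp = unicodedata.decomposition(ch)
        if decomp.startswith(("<wide>", "<narrow>")):
            # The decomposition looks like "<wide> 0041": drop the tag
            # and convert the hexadecimal code points that follow.
            out.append("".join(chr(int(cp, 16)) for cp in decomp.split()[1:]))
        else:
            out.append(ch)
    return "".join(out)


def map_label(text: str) -> str:
    """Apply the three steps in the order given in the thread."""
    lowered = text.lower()                        # 1. lower-casing
    narrowed = map_width(lowered)                 # 2. wide/narrow mapping
    return unicodedata.normalize("NFC", narrowed) # 3. NFC


def is_nfc(text: str) -> bool:
    """Check the requirement that the resulting string be in NFC form."""
    return unicodedata.normalize("NFC", text) == text
```

Running NFC last matters because the width step can produce sequences that still need composing. For example, a halfwidth katakana letter followed by the halfwidth voiced sound mark maps to a base letter plus a combining mark, and only the final NFC pass joins them into one precomposed character.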

