looking up domain names with unassigned code points

Erik van der Poel erikv at google.com
Sat May 10 05:27:50 CEST 2008


Hi James,

Thanks for your input.

Just to be clear, there are now more than 2 camps:

"Old" client camp:

Client is permitted to look up domain names with unassigned code
points, whether the labels are already in Punycode or not.

Correct error message camp:

Client should update itself or warn the user when an unassigned code
point is encountered, whether the labels are already in Punycode or
not.

New compromise camp:

Client is permitted to look up any label that is already in Punycode,
even if it has an unassigned code point encoded inside it, but the
client should update itself or warn the user when an unassigned code
point is encountered in a label that is not already in Punycode.

Other camps:

Please propose other camps here.
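
As a rough sketch, the three camps above can be read as lookup
policies. The Python below is illustrative only: the function names,
the camp identifiers ("old", "error", "compromise") and the pluggable
is_unassigned predicate are assumptions made for this sketch, not
anything taken from the IDNA drafts. It also treats any label that
starts with "xn--" as "already in Punycode" and skips all other IDNA
processing (mapping, DISALLOWED, CONTEXT*).

```python
import unicodedata

ACE_PREFIX = "xn--"  # prefix that marks a label as already in Punycode


def default_is_unassigned(ch):
    # "Cn" means unassigned in the Unicode version bundled with this
    # Python build, so the answer changes as the client is updated.
    return unicodedata.category(ch) == "Cn"


def is_punycode_label(label):
    return label.lower().startswith(ACE_PREFIX)


def to_unicode(label):
    # Decode a Punycode label with the standard "punycode" codec;
    # labels without the prefix are returned unchanged.
    if is_punycode_label(label):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


def has_unassigned(label, is_unassigned=default_is_unassigned):
    return any(is_unassigned(ch) for ch in to_unicode(label))


def may_look_up(label, camp, is_unassigned=default_is_unassigned):
    """Return True if a client in the given (hypothetical) camp looks up label."""
    if camp == "old":
        # Always look up, Punycode or not.
        return True
    if camp == "error":
        # Refuse (update or warn) on any unassigned code point.
        return not has_unassigned(label, is_unassigned)
    if camp == "compromise":
        # Punycode labels pass through undecoded; other labels are checked.
        return is_punycode_label(label) or not has_unassigned(label, is_unassigned)
    raise ValueError("unknown camp: %r" % camp)
```

A client that returns False here would update itself or warn the
user rather than fail silently. Passing is_unassigned as a parameter
is only a convenience for experimenting with different Unicode
versions.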

-------

Note also that I *am* focusing on the bits on the wire, for both
Unicode 5.1 and future versions of Unicode, *and* future versions of
IDNA (I<heart>NY).

Erik

On Fri, May 9, 2008 at 7:32 PM, James Seng <james at seng.sg> wrote:
> There is a general belief (which I think still holds) in the IETF that
> we deal with interoperability of the bits on the wire.
>
> I would rather we remain focused on the bits on the wire, i.e. how to get
> Unicode 3.1 encoded into DNS. Whether a client gives an error or
> warning, or whether it requires an update to support a previously
> unassigned code point, should be the prerogative of
> the app developer.
>
> You can put me in the "old" camp.
>
> -James Seng
>
> On Sat, May 10, 2008 at 12:15 AM, Erik van der Poel <erikv at google.com> wrote:
>> I've changed the Subject. I'm not sure, but you (and Mark) may have
>> misinterpreted my email. In my opinion, it's a good thing that MSIE7
>> refuses to look up Unicode labels with unassigned code points, but
>> it's bad that it also refuses to look up Punycode labels that encode
>> unassigned code points.
>>
>> There seem to be at least 2 camps with regard to the unassigned issue:
>> those that want to allow such lookups, so that "old" clients continue
>> to "work" when newly assigned code points are used, and those that
>> want to nudge developers in the direction of providing the right error
>> message: "You are attempting to access a domain name using a new
>> character that is not supported by this version of the software.
>> Please enable automatic update or go to download.example.com to update
>> manually."
>>
>> In my opinion, the correct error message outweighs the "old" client
>> issue, especially if we allow clients to look up Punycode labels, no
>> matter what they might decode to.
>>
>> I don't think we have consensus on this issue. In the past, Mark, Ken
>> and maybe Martin were in the "old" client camp, while John and I
>> appeared to be in the correct error message camp.
>>
>> Now that I have proposed that clients should be allowed to look up
>> Punycode labels, we may have some individuals move from one camp to
>> the other, or to an entirely new camp?
>>
>> Erik
>>
>> On Thu, May 8, 2008 at 7:18 PM, Shawn Steele <Shawn.Steele at microsoft.com> wrote:
>>>
>>> Agreed pretty much.  Sorry I haven't followed the recommendations for the
>>> casing, etc. rules, however disambiguating the mapping and
>>> Unicode<->Punycode rules from the disallowed set would "solve" (some of) the
>>> problems that cause IE7 to not look up unassigned characters.
>>>
>>> - Shawn
>>>
>>> From: mark.edward.davis at gmail.com [mailto:mark.edward.davis at gmail.com] On
>>> Behalf Of Mark Davis
>>>  Sent: ,  08,  2008 18:30
>>>  To: Erik van der Poel
>>>  Cc: Shawn Steele; idna-update at alvestrand.no
>>>  Subject: Re: Archaic scripts
>>>
>>>
>>>
>>> I agree.
>>>
>>>
>>>
>>>
>>> On Thu, May 8, 2008 at 4:01 PM, Erik van der Poel <erikv at google.com> wrote:
>>>
>>> An unassigned codepoint may later be assigned to an uppercase letter. So a
>>>  piece of software that looks up purported U-labels must check whether
>>>  they contain any unassigned codepoints. So we should recommend that
>>>  such software be restricted (follow certain rules), in order to
>>>  achieve interoperability. (MSIE7 refuses to look up domain names
>>>  containing unassigned characters.)
>>>
>>>  If we lock down the DISALLOWED set too tightly, we may regret it
>>>  later. One way to avoid locking it down is to recommend that
>>>  burned-in-ROM and other unupgradable software only use protocols that
>>>  use LDH- and A-labels. All pieces of software, whether IDNA-aware or
>>>  not, are explicitly permitted to look up Punycode labels, without
>>>  decoding them to check for DISALLOWED, CONTEXT*, etc.
>>>
>>>  On the other hand, we should recommend that protocol and application
>>>  developers only use U-labels if they are willing to make their
>>>  software upgradable. They need to do this for unassigned codepoints
>>>  anyway. So we might as well allow for the possibility of moving some
>>>  characters from DISALLOWED to other categories (if and when we
>>>  determine that they should be moved, having come up with better
>>>  criteria for use in IDNs, more information, clamoring users, etc).
>>>
>>>  If we allow for this possibility, we don't need to fret so much about
>>>  historic scripts right now. Just dump them in DISALLOWED for now, and
>>>  deal with them later, if they ever need to be dealt with.
>>>
>>>  Erik
>>>
>>>
>>>
>>>
>>>  On Thu, May 8, 2008 at 1:59 PM, Shawn Steele <Shawn.Steele at microsoft.com>
>>> wrote:
>>>  > Erik wrote:
>>>  >
>>>  >  > This also neatly solves the problem of whether or not IDNA-unaware and
>>>  >  > IDNA-aware clients are allowed to look up labels with Punycode in
>>>  >  > them. They should always be allowed to do so. Only software that tries
>>>  >  > to convert from U-labels to A-labels needs to be restricted. This is
>>>  >  > how we can achieve the most reasonable level of interoperability, in
>>>  >  > my opinion.
>>>  >
>>>  >  I think that U to A conversion does NOT need restriction.
>>>  >  Assuming that the steps in conversion include NFKC or appropriate mappings,
>>>  >  then if a character moves from disallowed to allowed, the conversion is
>>>  >  already known.  So no change is required for lookup, even if conversion is
>>>  >  required. The only change would be the software that decides the legality of
>>>  >  the name, which, IMO could be at a different layer.
>>>  >
>>>  >  - Shawn
>>>  >
>>>  >
>>>  >
>>>  >  _______________________________________________
>>>  >  Idna-update mailing list
>>>  >  Idna-update at alvestrand.no
>>>  >  http://www.alvestrand.no/mailman/listinfo/idna-update
>>>  >
>>>
>>>
>>>
>>>
>>>  --
>>>  Mark
>>
>>
>>
>

