looking up domain names with unassigned code points

James Seng james at seng.sg
Sat May 10 04:32:50 CEST 2008


There is a general belief (which I think still holds) in the IETF that
we deal with interoperability of the bits on the wire.

I would rather we remain focused on the bits on the wire, i.e. how to
get >Unicode 3.1 encoded into DNS. Whether a client gives an error or a
warning, or whether it requires an update to support a previously
unassigned code point, should be a decision that is the prerogative of
the app developer.
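
A minimal sketch of that separation, assuming Python's built-in
"punycode" codec and the Unicode tables in its unicodedata module. The
function names (to_wire, unassigned_code_points, lookup_name) are
hypothetical, and the mapping steps that real IDNA processing requires
(NFKC, case folding, prohibited-character checks) are deliberately
left out. It is an illustration only, not a statement of what any
draft specifies.

```python
import unicodedata
from typing import List

ACE_PREFIX = "xn--"  # prefix that marks a Punycode-encoded label


def to_wire(label: str) -> str:
    """Bits on the wire: encode one label to ASCII, with no policy checks."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def unassigned_code_points(label: str) -> List[str]:
    """Code points that this runtime's Unicode tables report as unassigned (Cn)."""
    return [f"U+{ord(ch):04X}" for ch in label if unicodedata.category(ch) == "Cn"]


def lookup_name(label: str, strict: bool) -> str:
    """Application policy: refuse, or encode anyway, when the tables are too old.

    strict=True models the "correct error message" camp;
    strict=False models the "old client keeps working" camp.
    """
    unknown = unassigned_code_points(label)
    if unknown and strict:
        raise ValueError(
            "label uses code points unknown to this software: " + ", ".join(unknown)
        )
    return to_wire(label)
```

With this split, to_wire("bücher") gives "xn--bcher-kva" in either
camp. Only lookup_name differs between them: a label containing a code
point the local tables do not know is either rejected with an error
(strict) or encoded and looked up anyway (non-strict).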

You can label me as part of the "old" camp.

-James Seng

On Sat, May 10, 2008 at 12:15 AM, Erik van der Poel <erikv at google.com> wrote:
> I've changed the Subject. I'm not sure, but you (and Mark) may have
> misinterpreted my email. In my opinion, it's a good thing that MSIE7
> refuses to look up Unicode labels with unassigned code points, but
> it's bad that it also refuses to look up Punycode labels that encode
> unassigned code points.
>
> There seem to be at least 2 camps with regard to the unassigned issue:
> those that want to allow such lookups, so that "old" clients continue
> to "work" when newly assigned code points are used, and those that
> want to nudge developers in the direction of providing the right error
> message: "You are attempting to access a domain name using a new
> character that is not supported by this version of the software.
> Please enable automatic update or go to download.example.com to update
> manually."
>
> In my opinion, the correct error message outweighs the "old" client
> issue, especially if we allow clients to look up Punycode labels, no
> matter what they might decode to.
>
> I don't think we have consensus on this issue. In the past, Mark, Ken
> and maybe Martin were in the "old" client camp, while John and I
> appeared to be in the correct error message camp.
>
> Now that I have proposed that clients should be allowed to look up
> Punycode labels, we may have some individuals move from one camp to
> the other, or to an entirely new camp?
>
> Erik
>
> On Thu, May 8, 2008 at 7:18 PM, Shawn Steele <Shawn.Steele at microsoft.com> wrote:
>>
>> Agreed pretty much.  Sorry, I haven't followed the recommendations for the
>> casing, etc. rules; however, disambiguating the mapping and
>> Unicode<->Punycode rules from the disallowed set would "solve" (some of) the
>> problems that cause IE7 to not look up unassigned characters.
>>
>>
>>
>> -          Shawn
>>
>>
>>
>>
>> From: mark.edward.davis at gmail.com [mailto:mark.edward.davis at gmail.com] On
>> Behalf Of Mark Davis
>>  Sent: ,  08,  2008 18:30
>>  To: Erik van der Poel
>>  Cc: Shawn Steele; idna-update at alvestrand.no
>>  Subject: Re: Archaic scripts
>>
>>
>>
>> I agree.
>>
>>
>>
>>
>> On Thu, May 8, 2008 at 4:01 PM, Erik van der Poel <erikv at google.com> wrote:
>>
>> An unassigned codepoint may later be assigned to an uppercase letter. So a
>>  piece of software that looks up purported U-labels must check whether
>>  they contain any unassigned codepoints. So we should recommend that
>>  such software be restricted (follow certain rules), in order to
>>  achieve interoperability. (MSIE7 refuses to look up domain names
>>  containing unassigned characters.)
>>
>>  If we lock down the DISALLOWED set too tightly, we may regret it
>>  later. One way to avoid locking it down is to recommend that
>>  burned-in-ROM and other unupgradable software only use protocols that
>>  use LDH- and A-labels. All pieces of software, whether IDNA-aware or
>>  not, are explicitly permitted to look up Punycode labels, without
>>  decoding them to check for DISALLOWED, CONTEXT*, etc.
>>
>>  On the other hand, we should recommend that protocol and application
>>  developers only use U-labels if they are willing to make their
>>  software upgradable. They need to do this for unassigned codepoints
>>  anyway. So we might as well allow for the possibility of moving some
>>  characters from DISALLOWED to other categories (if and when we
>>  determine that they should be moved, having come up with better
>>  criteria for use in IDNs, more information, clamoring users, etc).
>>
>>  If we allow for this possibility, we don't need to fret so much about
>>  historic scripts right now. Just dump them in DISALLOWED for now, and
>>  deal with them later, if they ever need to be dealt with.
>>
>>  Erik
>>
>>
>>
>>
>>  On Thu, May 8, 2008 at 1:59 PM, Shawn Steele <Shawn.Steele at microsoft.com>
>> wrote:
>>  > Erik wrote:
>>  >
>>  >  > This also neatly solves the problem of whether or not IDNA-unaware and
>>  >  > IDNA-aware clients are allowed to look up labels with Punycode in
>>  >  > them. They should always be allowed to do so. Only software that tries
>>  >  > to convert from U-labels to A-labels needs to be restricted. This is
>>  >  > how we can achieve the most reasonable level of interoperability, in
>>  >  > my opinion.
>>  >
>>  >  I think that U to A conversion does NOT need restriction.
>> Assuming that the steps in conversion include NFKC or appropriate mappings,
>> then if a character moves from disallowed to allowed, the conversion is
>> already known.  So no change is required for lookup, even if conversion is
>> required. The only change would be in the software that decides the legality of
>> the name, which, IMO, could be at a different layer.
>>  >
>>  >  - Shawn
>>  >
>>  >
>>  >
>>  >  _______________________________________________
>>  >  Idna-update mailing list
>>  >  Idna-update at alvestrand.no
>>  >  http://www.alvestrand.no/mailman/listinfo/idna-update
>>  >
>>
>>
>>
>>
>>  --
>>  Mark
>
>
>

