looking up domain names with unassigned code points
Erik van der Poel
erikv at google.com
Fri May 9 18:15:45 CEST 2008
I've changed the Subject. I'm not sure, but you (and Mark) may have
misinterpreted my email. In my opinion, it's a good thing that MSIE7
refuses to look up Unicode labels with unassigned code points, but
it's bad that it also refuses to look up Punycode labels that encode
unassigned code points.
There seem to be at least 2 camps with regard to the unassigned issue:
those that want to allow such lookups, so that "old" clients continue
to "work" when newly assigned code points are used, and those that
want to nudge developers in the direction of providing the right error
message: "You are attempting to access a domain name using a new
character that is not supported by this version of the software.
Please enable automatic update or go to download.example.com to update."
In my opinion, the correct error message outweighs the "old" client
issue, especially if we allow clients to look up Punycode labels, no
matter what they might decode to.
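A minimal Python sketch of the client behaviour proposed above, under stated assumptions. ASCII labels, which include LDH and Punycode (xn--) labels, are looked up without being decoded. A Unicode label that contains a code point the client's Unicode tables treat as unassigned produces the update message instead of a lookup. The function names, the three-way return value, and the use of general category Cn as the test for "unassigned" are hypothetical illustrations. They do not come from any IDNA specification or from MSIE7.

```python
import unicodedata

# Hypothetical user-facing text, abridged from the message proposed above.
UPDATE_MESSAGE = (
    "You are attempting to access a domain name using a new character "
    "that is not supported by this version of the software."
)


def has_unassigned(label, ucd=unicodedata):
    """Return True if any code point in `label` has general category Cn.

    Cn marks unassigned code points (it also covers noncharacters) in the
    Unicode version that `ucd` describes.
    """
    return any(ucd.category(ch) == "Cn" for ch in label)


def lookup_decision(label, ucd=unicodedata):
    """Decide what a client does with one label (illustrative policy only).

    Returns ("lookup", label), ("convert", label) or ("error", message).
    """
    if label.isascii():
        # ASCII labels, including LDH and Punycode (xn--) labels, are looked
        # up as-is, without decoding, whatever they might decode to.
        return ("lookup", label)
    if has_unassigned(label, ucd):
        # A character newer than this client's tables: ask the user to
        # update instead of guessing how to map it.
        return ("error", UPDATE_MESSAGE)
    # Only assigned code points: continue with U-label -> A-label conversion.
    return ("convert", label)


print(lookup_decision("xn--bcher-kva"))  # ('lookup', 'xn--bcher-kva')
print(lookup_decision("bücher")[0])      # convert
```

Passing `unicodedata.ucd_3_2_0` as `ucd` models a client whose tables are frozen at Unicode 3.2, the version IDNA2003 is tied to.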
I don't think we have consensus on this issue. In the past, Mark, Ken
and maybe Martin were in the "old" client camp, while John and I
appeared to be in the correct error message camp.
Now that I have proposed that clients should be allowed to look up
Punycode labels, we may have some individuals move from one camp to
the other, or to an entirely new camp?
On Thu, May 8, 2008 at 7:18 PM, Shawn Steele <Shawn.Steele at microsoft.com> wrote:
> Agreed pretty much. Sorry, I haven't followed the recommendations for the
> casing, etc. rules; however, disambiguating the mapping and
> Unicode<->Punycode rules from the disallowed set would "solve" (some of) the
> problems that cause IE7 to not look up unassigned characters.
> - Shawn
> From: mark.edward.davis at gmail.com [mailto:mark.edward.davis at gmail.com] On
> Behalf Of Mark Davis
> Sent: Thursday, May 08, 2008 18:30
> To: Erik van der Poel
> Cc: Shawn Steele; idna-update at alvestrand.no
> Subject: Re: Archaic scripts
> I agree.
> On Thu, May 8, 2008 at 4:01 PM, Erik van der Poel <erikv at google.com> wrote:
> An unassigned codepoint may be assigned to an uppercase letter. So a
> piece of software that looks up a purported U-label must check whether
> it contains any unassigned codepoints. So we should recommend that
> such software be restricted (follow certain rules), in order to
> achieve interoperability. (MSIE7 refuses to look up domain names
> containing unassigned characters.)
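The point above, that an unassigned code point may later be assigned to an uppercase letter, can be checked with Python's `unicodedata` module. The module ships both the current Unicode tables and a table frozen at Unicode 3.2 (`unicodedata.ucd_3_2_0`). The sketch uses U+2C62 LATIN CAPITAL LETTER L WITH MIDDLE TILDE, added in Unicode 5.0. That character is chosen here for illustration and does not come from the thread.

```python
import unicodedata

# U+2C62 LATIN CAPITAL LETTER L WITH MIDDLE TILDE, assigned in Unicode 5.0.
# Its lowercase form, U+026B, already existed in Unicode 3.2.
ch = "\u2c62"

old_ucd = unicodedata.ucd_3_2_0  # tables frozen at Unicode 3.2 (IDNA2003)
new_ucd = unicodedata            # tables of the running Python version

print(old_ucd.category(ch))        # Cn: unassigned for a Unicode 3.2 client
print(new_ucd.category(ch))        # Lu: an uppercase letter for a newer client
print(f"U+{ord(ch.lower()):04X}")  # U+026B: the lowercase mapping a newer client applies
```

A client with only the Unicode 3.2 tables cannot know that this character should be lowercased before Punycode encoding. It would therefore derive a different A-label from the same input than a newer client would. This is why software converting purported U-labels needs to check for unassigned code points first.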
> If we lock down the DISALLOWED set too tightly, we may regret it
> later. One way to avoid locking it down is to recommend that
> burned-in-ROM and other unupgradable software only use protocols that
> use LDH- and A-labels. All pieces of software, whether IDNA-aware or
> not, are explicitly permitted to look up Punycode labels, without
> decoding them to check for DISALLOWED, CONTEXT*, etc.
> On the other hand, we should recommend that protocol and application
> developers only use U-labels if they are willing to make their
> software upgradable. They need to do this for unassigned codepoints
> anyway. So we might as well allow for the possibility of moving some
> characters from DISALLOWED to other categories (if and when we
> determine that they should be moved, having come up with better
> criteria for use in IDNs, more information, clamoring users, etc.).
> If we allow for this possibility, we don't need to fret so much about
> historic scripts right now. Just dump them in DISALLOWED for now, and
> deal with them later, if they ever need to be dealt with.
> On Thu, May 8, 2008 at 1:59 PM, Shawn Steele <Shawn.Steele at microsoft.com> wrote:
> > Erik wrote:
> > > This also neatly solves the problem of whether or not IDNA-unaware and
> > > IDNA-aware clients are allowed to look up labels with Punycode in
> > > them. They should always be allowed to do so. Only software that tries
> > > to convert from U-labels to A-labels needs to be restricted. This is
> > > how we can achieve the most reasonable level of interoperability, in
> > > my opinion.
> > I think that U to A conversion does NOT need restriction.
> > Assuming that the steps in conversion include NFKC or appropriate mappings,
> > then if a character moves from disallowed to allowed, the conversion is
> > already known. So no change is required for lookup, even if conversion is
> > required. The only change would be the software that decides the legality of
> > the name, which, IMO, could be at a different layer.
> > - Shawn
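Shawn's layering argument can be sketched by keeping the U-label to A-label conversion separate from the legality check. In this hypothetical sketch, NFKC followed by Python's `str.casefold()` stands in for IDNA's mapping step. That is an assumption, not the exact mapping of any IDNA version. The DISALLOWED set is passed in as data, so moving a character out of it changes `is_permitted()` but not `to_a_label()`.

```python
import unicodedata


def map_label(label):
    """Stand-in mapping step (an assumption): NFKC plus case folding."""
    return unicodedata.normalize("NFKC", label).casefold()


def to_a_label(label):
    """Hypothetical conversion layer: map, then Punycode-encode if needed."""
    mapped = map_label(label)
    if mapped.isascii():
        return mapped  # already ASCII, nothing to encode
    return "xn--" + mapped.encode("punycode").decode("ascii")


def is_permitted(label, disallowed):
    """Hypothetical legality layer: reject labels with DISALLOWED code points.

    It never feeds back into to_a_label(), so changing `disallowed`
    leaves the conversion result unchanged.
    """
    return not any(ch in disallowed for ch in map_label(label))


print(to_a_label("Bücher"))                    # xn--bcher-kva
print(is_permitted("Bücher", frozenset("ü")))  # False
print(is_permitted("Bücher", frozenset()))     # True
```

Whether the legality check lives in the same library as the conversion or at a different layer, as Shawn suggests, is a design choice this sketch leaves open.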
> > _______________________________________________
> > Idna-update mailing list
> > Idna-update at alvestrand.no
> > http://www.alvestrand.no/mailman/listinfo/idna-update