looking up domain names with unassigned code points

Sat May 10 21:59:20 CEST 2008

--On Saturday, May 10, 2008 12:17 PM -0700 Erik van der Poel
<erikv at google.com> wrote:

> John,
> 
> Thanks for responding. I'm not sure what the right answer is,
> either. Yes, I was referring to domain names and URIs that are
> already in Punycode form, and I agree that the situation in
> which the app receives the Unicode form is very different
> (primarily because of the unassigned code point issue).
> 
> I also agree with your suggestions below. My main concern with
> simply letting apps look up domain names that are already in
> Punycode form is that some apps may also blindly convert the
> Punycode to Unicode for display, without checking for
> dangerous characters like U+2044 FRACTION SLASH. The TLD
> registries are under a certain amount of pressure to only
> register "safe" names, but at lower levels of the DNS, there
> is very little pressure and practically zero enforcement.

Yes, and that is a major concern. But see below.

> However, I don't know how comfortable you and others in the
> working group are about writing advice regarding display
> issues in the IDNA200X RFCs.

I think we get into dangerous territory if we give more than
general advice about display, and I think some will argue that we
should not do even that.   But I don't see that as an issue in
this case.

There will be an issue in getting the wording right (for which I
will certainly need help).   But I think that a "MAY treat the
putative A-label as opaque" rule can be written to give the
implementation a choice between opaque or not.  So, e.g.,

	* If you decide to treat it as opaque, you look it up
	without inspecting its contents but don't, ever, convert
	it to a U-label.

	* If you do decide to convert it to a U-label, then it
	isn't opaque: it must be valid as a U-label (and hence
	as an A-label).  Obviously, if it contains DISALLOWED or
	UNASSIGNED characters, or even CONTEXT-required
	characters that don't follow whatever rules need to be
	followed for lookup, then you need to treat it as
	invalid for lookup and tell the user whatever you tell
	the user under such circumstances.

Note that the issue here isn't about display, it is about valid
conversions between things that are supposed to be A-labels and
the corresponding U-labels.
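
As a rough sketch of the two choices above, in Python: the function
names, the InvalidLabel exception, and the one-character DISALLOWED_SAMPLE
set are all hypothetical, made up for illustration.  The real IDNA200X
tables and the CONTEXT rules are not reproduced here, and the UNASSIGNED
test simply uses whatever Unicode version the Python runtime ships with.

```python
import unicodedata

ACE_PREFIX = "xn--"

# Hypothetical stand-in for the DISALLOWED category. The real tables are
# far larger; this holds only the character mentioned in the thread.
DISALLOWED_SAMPLE = {"\u2044"}  # FRACTION SLASH


class InvalidLabel(ValueError):
    """Raised when a putative A-label must be treated as invalid."""


def lookup_form_opaque(label):
    """Choice 1: treat the label as opaque.

    It is handed to the resolver unchanged and unexamined, and the
    caller must never convert it to a U-label.
    """
    return label


def to_u_label(label):
    """Choice 2: convert to a U-label, so the label is no longer opaque.

    The decoded form is checked and rejected if it contains characters
    from the (sample) DISALLOWED set or unassigned code points.
    CONTEXT rules are deliberately left out of this sketch.
    """
    lowered = label.lower()
    if not lowered.startswith(ACE_PREFIX):
        raise InvalidLabel("no ACE prefix: %r" % label)
    try:
        u_label = lowered[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except UnicodeError as exc:
        raise InvalidLabel("not decodable as Punycode: %r" % label) from exc
    for ch in u_label:
        if ch in DISALLOWED_SAMPLE:
            raise InvalidLabel("DISALLOWED character U+%04X" % ord(ch))
        if unicodedata.category(ch) == "Cn":
            raise InvalidLabel("UNASSIGNED code point U+%04X" % ord(ch))
    return u_label
```

With this sketch, to_u_label("xn--bcher-kva") returns "bücher", while a
label whose decoded form contains U+2044 raises InvalidLabel instead of
being converted.  An application that only needs to resolve the name
would call lookup_form_opaque() and never call to_u_label() on the
result.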

Does that model help?

    john