looking up domain names with unassigned code points

Sun May 11 18:58:07 CEST 2008

Hi Vint,

On Sun, May 11, 2008 at 7:36 AM, Vint Cerf <vint at google.com> wrote:
> I think we should say nothing about display. John's focus is on whether and
> how to do the lookup.

I think it's fine for the protocol document to focus on lookup (and
registration). But let's not forget that the bidi draft has a heavy
focus on display issues (and that's OK).

> I agree with what I understand his two positions to be:
>
> 1. just put the punycode string into the DNS query opaquely.
>
> OR
>
> 2. do the conversion and handle as if the resulting Unicode had been
> submitted.

I think John's proposal is a good one. It might have other
ramifications, or we might need to tweak it further, but for now, I
don't see any problem.

> If someone generates an arbitrary string of the form "xn--<random sequence
> of lowercase a-z, 0-9 and hyphen>",
> does the algorithm ALWAYS produce a sequence of Unicode code points?

No. For example, xn--en32g would produce U+110000, which is outside
the range of valid code points. (The highest code point is U+10FFFF.)
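
To make the arithmetic concrete, here is a minimal Python sketch of the decoding loop from RFC 3492, using the parameter values given there. The function names (raw_punycode_decode, all_valid_code_points) are hypothetical, and the decoder deliberately returns plain integers without any range check, so the out-of-range value stays visible instead of raising an error. It takes the part of the label after the "xn--" prefix.

```python
# Punycode parameters as given in RFC 3492, section 5.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128
MAX_CODE_POINT = 0x10FFFF  # highest Unicode code point


def _digit(ch):
    """Map a Punycode digit character to its value (a-z = 0..25, 0-9 = 26..35)."""
    if "a" <= ch <= "z":
        return ord(ch) - ord("a")
    if "0" <= ch <= "9":
        return ord(ch) - ord("0") + 26
    raise ValueError("not a Punycode digit: %r" % ch)


def _adapt(delta, numpoints, first):
    """Bias adaptation, RFC 3492 section 6.1."""
    delta = delta // DAMP if first else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def raw_punycode_decode(encoded):
    """Decode the part after "xn--" into a list of integers.

    Illustration only: no check that each value is a real code point.
    """
    basic, _, extended = encoded.lower().rpartition("-")
    output = [ord(c) for c in basic]
    n, i, bias = INITIAL_N, 0, INITIAL_BIAS
    pos = 0
    while pos < len(extended):
        old_i, w, k = i, 1, BASE
        while True:
            if pos >= len(extended):
                raise ValueError("truncated input")
            d = _digit(extended[pos])
            pos += 1
            i += d * w
            t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
            if d < t:
                break
            w *= BASE - t
            k += BASE
        bias = _adapt(i - old_i, len(output) + 1, old_i == 0)
        n += i // (len(output) + 1)
        i %= len(output) + 1
        output.insert(i, n)
        i += 1
    return output


def all_valid_code_points(values):
    """True if every decoded value lies in 0..0x10FFFF."""
    return all(0 <= v <= MAX_CODE_POINT for v in values)


values = raw_punycode_decode("en32g")
print([hex(v) for v in values], all_valid_code_points(values))
```

For "en32g" the five digits are 4, 13, 29, 28 and 6 with weights 1, 35, 1225, 12250 and 122500, giving a delta of 1113984; adding the initial n of 128 yields 1114112, which is 0x110000.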

If an app receives such a punycode string, it should not attempt to
display the corresponding Unicode (since it is invalid). I'm guessing
that we can all agree on that. :-)
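
As a rough sketch of that rule on the application side, something like the following could gate display. It assumes Python's built-in "punycode" codec, which in its default strict mode raises a UnicodeError when a decoded value exceeds U+10FFFF; the function name displayable_label is hypothetical, and a real application would need further checks (for example, against the set of permitted code points) that are not shown here.

```python
def displayable_label(a_label):
    """Return the decoded text of an "xn--" label, or None if it must not be shown.

    Labels without the prefix are returned unchanged.
    """
    prefix = "xn--"
    if not a_label.lower().startswith(prefix):
        return a_label
    try:
        # Strict decoding: the codec rejects values above U+10FFFF.
        return a_label[len(prefix):].encode("ascii").decode("punycode")
    except UnicodeError:
        # Not decodable to valid code points: do not display as Unicode.
        return None


print(displayable_label("xn--en32g"))
```

When the function returns None, the application can fall back to showing the ASCII form of the label as received.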

Erik