looking up domain names with unassigned code points

Vint Cerf vint at google.com
Sun May 11 16:36:17 CEST 2008


I think we should say nothing about display. John's focus is on  
whether and how to do the lookup.

I agree with what I understand his two positions to be:

1. just put the punycode string into the DNS query opaquely.

OR

2. do the conversion and handle as if the resulting Unicode had been  
submitted.

Technical question:

If someone generates an arbitrary string of the form
"xn--<random sequence of lowercase a-z, 0-9 and hyphen>",
does the Punycode decoding algorithm ALWAYS produce a sequence of
Unicode code points? Note I did not say a PVALID set of code points
or even ASSIGNED.
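
One way to probe this question empirically is sketched below. It is only a
sketch under assumptions: it uses the "punycode" codec from Python's standard
library as a stand-in for the decoding step, and the helper name
try_to_unicode is hypothetical. It does not perform any of the other IDNA
processing (no PVALID or ASSIGNED checks), which matches the question as
posed.

```python
from typing import Optional

ACE_PREFIX = "xn--"


def try_to_unicode(label: str) -> Optional[str]:
    """Punycode-decode the part of `label` after "xn--".

    Returns the decoded string, or None when the label lacks the prefix
    or the decoder rejects the input. No IDNA validity checks are made.
    """
    if not label.startswith(ACE_PREFIX):
        return None
    try:
        # Python's built-in codec raises a UnicodeError subclass on
        # malformed input when used in the default "strict" mode.
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except UnicodeError:
        return None


if __name__ == "__main__":
    for candidate in ("xn--bcher-kva", "xn--z"):
        decoded = try_to_unicode(candidate)
        if decoded is None:
            print(candidate, "-> decoder rejected the input")
        else:
            print(candidate, "->", [f"U+{ord(c):04X}" for c in decoded])
```

With this codec at least, the answer appears to be "not always": a label
such as "xn--z" ends in the middle of a variable-length integer and is
rejected rather than decoded.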

I am asking because I am wondering how a relatively simple-minded  
implementation might look from the UI perspective.

If we always get a sequence of code points regardless of the sequence
of LDH characters, the simple-minded implementation could easily
produce gibberish when it converts a random LDH string (with letters
confined to lowercase) to Unicode.

Is the following correct?

Let s be a random string of <lowercase a-z, 0-9, hyphen> prefixed by
"xn--".

Let ToUnicode be a function that maps s into Unicode.

Let ToASCII be a function that maps Unicode into Punycode.

s is valid Punycode if and only if s = ToASCII ( ToUnicode (s) )
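
The round-trip condition above can be written down directly. The following
is a minimal sketch, not the IDNA operations themselves: to_unicode and
to_ascii are hypothetical stand-ins built on the "punycode" codec in Python's
standard library, and they handle only the encoding step, with none of the
other processing or checks that the full protocol applies.

```python
ACE_PREFIX = "xn--"


def to_unicode(s: str) -> str:
    """Stand-in for the decoding function: strip "xn--", Punycode-decode."""
    if not s.startswith(ACE_PREFIX):
        raise UnicodeError("missing ACE prefix")
    return s[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def to_ascii(u: str) -> str:
    """Stand-in for the encoding function: Punycode-encode, add "xn--"."""
    return ACE_PREFIX + u.encode("punycode").decode("ascii")


def is_valid_punycode(s: str) -> bool:
    """True only if s decodes and re-encodes to exactly the same string."""
    try:
        return to_ascii(to_unicode(s)) == s
    except UnicodeError:
        # The decoder (or encoder) rejected the input outright.
        return False
```

Under these assumptions, is_valid_punycode("xn--bcher-kva") holds, while
"xn--z" fails because the decoder rejects it. The comparison is kept in
addition to the decode step so that any input which decodes but is not the
form the encoder would produce is also reported as invalid.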

I hope I haven't mangled the question too badly.

v





More information about the Idna-update mailing list