my digression on UTF-8 - was Changing the xn-- prefix -

Shawn Steele Shawn.Steele at microsoft.com
Tue Mar 25 00:22:08 CET 2008


Sorry, my digression is a bit randomizing, so I split the thread to follow up on John's other comments.

> (i) The phishing and race condition problems with the DNS are
well-known today.  Switching from punycode to UTF-8, or having
made a UTF-8 choice in the first place, would not change them in
any qualitative way.

My hope would be that phishing and other problems could be separated from the encoding problems.  In fact, IDN does that now, since Punycode and stringprep are specified in separate RFCs.
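
That separation can be sketched in a few lines of Python, running the preparation step and the encoding step independently. This is only an illustration under some assumptions: it uses CPython's `encodings.idna.nameprep` helper (an undocumented function inside the standard library's IDNA 2003 codec) and the built-in "punycode" codec, `to_ace` is a hypothetical name, and the label is assumed to contain at least one non-ASCII character.

```python
from encodings import idna  # CPython's IDNA 2003 codec module


def to_ace(label):
    """Return (prepared_label, ace_bytes) for one non-ASCII label.

    Hypothetical helper for illustration; assumes the label is non-ASCII,
    since pure-ASCII labels are not given an ACE prefix.
    """
    # Step 1: preparation (nameprep, a stringprep profile):
    # case folding, NFKC normalization, prohibited-character checks.
    prepared = idna.nameprep(label)
    # Step 2: encoding (Punycode), which knows nothing about step 1.
    body = prepared.encode("punycode")
    # The ACE prefix is attached only after both independent steps.
    return prepared, b"xn--" + body


prepared, ace = to_ace("Bücher")
print(prepared, ace)
```

The point is only that step 2 could be swapped for a different encoding without touching step 1.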

> (ii) .. and significant work on the server.

The Internationalized Mail for Applications effort also requires significant work, but it's more compatible and forward-looking.

> (iii)  While it is efficient for ASCII and most western/northern
alphabets, UTF-8 is arguably pathological for East Asian scripts

That's a common argument, but it doesn't stand up.  Certainly the strings are short enough that the difference between UTF-8 and the xn-- encoding is minor.  German arguably has long words, and they are probably smaller in UTF-8 than in Punycode for many forms.  IRIs and various other representations support Unicode names and, as you pointed out, it'd require significant server-side work anyway, so if we had to extend the size, then we could do so.
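
The size question is easy to check for any particular label. Here is a minimal Python sketch, assuming the standard-library "punycode" codec; `encoded_sizes` is a hypothetical helper name, the sample labels are my own, and nameprep is skipped, so labels are assumed to be already lowercase, normalized, and non-ASCII. It is not a claim about which encoding is smaller in general.

```python
def encoded_sizes(label):
    """Return (utf8_bytes, ace_bytes) for a single non-ASCII DNS label.

    Hypothetical helper for illustration only.
    """
    utf8_len = len(label.encode("utf-8"))
    # The "punycode" codec yields only the Punycode body, so the
    # 4-byte "xn--" ACE prefix is added to get the on-the-wire length.
    ace_len = len(b"xn--" + label.encode("punycode"))
    return utf8_len, ace_len


# Sample labels chosen for illustration: two German, one Japanese.
for label in ["bücher", "münchen", "日本語"]:
    utf8_len, ace_len = encoded_sizes(label)
    print(f"{label}: UTF-8 {utf8_len} bytes, ACE {ace_len} bytes")
```

Running it over a realistic list of names for a given script would give a better picture than any single label.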

> (iv) (validation is required anyway,) so just changing the encoding from punycode to UTF-8
really would not accomplish much.

I wasn't trying to say it would remove the validation steps, but it would solve the encoding problem without further versions asking about a new prefix again.

> Finally, if your argument against a change in prefix is to avoid
heuristics about what to do to find a given string, changing to
UTF-8 encoding would be at least as bad, and probably worse,
than going to a different ACE prefix.

But then we'd be done with it :-)  I'm saying that if we're going to do something that breaking, then do it right; don't just extend a hack.  At the very least we wouldn't have the UI-facing problem of whether or not to expose the Punycode to the user.

- Shawn


More information about the Idna-update mailing list