UTF-8

Shawn Steele Shawn.Steele at microsoft.com
Fri Jun 18 00:00:33 CEST 2010


> On the other hand I disagree: non-A-label leakage into IDN-unaware
> domainname slots (in APIs, protocols, on-disk formats) is a bad thing.

That's nearly impossible to guarantee one way or the other.  Take HTTP, for example.  Shoving UTF-8 into an HTTP request where it wasn't expected might not "work"; however, shoving Punycode into the slot pretty much requires that someone be able to compare the Punycode with the U-label and see whether they are the same name.  Both approaches are likely broken in some cases, and both might work in some cases.
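To make the comparison problem concrete, here is a minimal sketch in Python.  It assumes the standard library's "idna" codec, which implements the older IDNA 2003 rules rather than IDNA2008, and the helper names to_ace and same_domain are made up for illustration.  A plain string compare sees two different names; only a component that knows about ACE can tell they match.

```python
def to_ace(name: str) -> str:
    """Return the ASCII-compatible (ACE) form of a domain name.

    Assumption: Python's built-in "idna" codec (IDNA 2003) is good
    enough for illustration; a real IDNA2008 implementation may differ.
    """
    return name.encode("idna").decode("ascii").lower()


def same_domain(a: str, b: str) -> bool:
    """Compare two domain names after converting both to ACE form."""
    return to_ace(a) == to_ace(b)


u_label_name = "b\u00fccher.example"   # "bücher.example" as a U-label name
ace_name = "xn--bcher-kva.example"     # the same name in ACE form

# An IDN-unaware slot that compares raw strings sees two different names.
naive_match = (u_label_name == ace_name)

# An IDN-aware comparison has to convert before comparing.
aware_match = same_domain(u_label_name, ace_name)
```

Here naive_match is False and aware_match is True, which is exactly the extra work that the ACE form pushes onto whoever receives it.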

> in which case which is the lesser evil: ACE leakage into UIs or non-ASCII leakage into IDN-unaware domainname slots?

It's not just UI.  ACE sets off a cascading reaction: it leaks into places that were already UTF-8/Unicode aware, so a component that already worked just fine with Unicode names has to change to recognize that the ACE form is the same name, even though it would not otherwise have needed any change.
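A sketch of that cascade, again in Python and again assuming the standard library's "idna" codec (IDNA 2003).  The table known_hosts, its address, and the function names are hypothetical.  The first lookup is the code that already worked with Unicode names; the second is the change it needs once ACE forms start arriving.

```python
# Hypothetical table that was always keyed by Unicode names.
known_hosts = {"b\u00fccher.example": "192.0.2.10"}  # key is "bücher.example"


def lookup_unicode_only(name: str):
    """The original code: fine for Unicode names, blind to ACE."""
    return known_hosts.get(name.lower())


def to_u_label(name: str) -> str:
    """Decode any xn-- labels back to Unicode.

    Assumption: the stdlib "idna" codec is used; non-ASCII input is
    taken to be in Unicode form already and is returned unchanged.
    """
    if name.isascii():
        return name.encode("ascii").decode("idna")
    return name


def lookup_ace_aware(name: str):
    """The changed code: convert a leaked ACE form before the lookup."""
    return known_hosts.get(to_u_label(name.lower()))


before = lookup_unicode_only("xn--bcher-kva.example")  # ACE form is not found
after = lookup_ace_aware("xn--bcher-kva.example")      # found after decoding
```

The Unicode-only lookup was correct until an ACE name reached it; nothing about the Unicode handling itself was wrong.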

ACE has "broken" almost everything here, even though ACE nominally shouldn't be a problem.  Those breaks are all the more ironic because most of the broken pieces already worked with Unicode.

Take RFC 5280, for example.  It had to be updated to support ACE, which was convenient, but now what do you do about the email local parts?  There's no Punycode for email, so the ACE workaround in RFC 5280 is temporary at best.  It'll have to either:  A) allow UTF-8, B) allow some special variant of Punycode that works for email, or C) use or invent some other encoding.  So now everything that uses RFC 5280 has to be updated twice :(

- Shawn


More information about the Idna-update mailing list