UTF-8

Nick Teint nick.teint at googlemail.com
Sat Jun 26 11:57:38 CEST 2010


2010/6/18 Shawn Steele <Shawn.Steele at microsoft.com>:
> That's nearly impossible to guarantee one way or the other.  http for example.  In http, shoving UTF-8 where it wasn't expected in an http request might not "work", however shoving punycode into the slot pretty much requires that someone be able to compare the punycode with the U-label and see if it’s the same.  Both approaches are likely broken in some cases, both might work in some cases.

Using UTF-8 also requires that someone be able to compare the U-label
with a Unicode string and see if it's the same. With Unicode,
equivalence is not as trivial as with ASCII: the same visible string
can be encoded as different sequences of code points.
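A minimal Python illustration of the equivalence problem. It uses the
standard library only, and the label "café" and the helper name
`same_label` are just examples. The NFC and NFD spellings look
identical but differ code point by code point, so a plain string
comparison fails, while a comparison after NFC normalisation succeeds.
This sketch covers canonical equivalence only.

```python
import unicodedata

# "café" with a precomposed U+00E9 (NFC) and with "e" + U+0301 COMBINING ACUTE ACCENT (NFD)
nfc = "caf\u00e9"
nfd = "cafe\u0301"

def same_label(a: str, b: str) -> bool:
    """Compare two labels after normalising both to NFC (canonical equivalence only)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(nfc == nfd)            # False: different code point sequences
print(len(nfc), len(nfd))    # 4 5
print(same_label(nfc, nfd))  # True: canonically equivalent
```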

IDNAbis moves most of this problem into the mapping step and thus to
the user agent. However, section 5.4 of the protocol document does
mandate some checks, which imply some minimum standards for the mapping
step.
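A rough Python sketch of the kind of lookup-side check meant here,
under simplifying assumptions. The function `to_a_label` is
hypothetical and performs only a small subset of the checks (label in
NFC, no leading combining mark). It omits the IDNA2008 code point
tables and the CONTEXTJ/CONTEXTO and bidi rules that a real
implementation needs. Python's built-in `idna` codec implements IDNA
2003, not IDNAbis, so the sketch calls the lower-level `punycode` codec
directly. `unicodedata.is_normalized` requires Python 3.8 or later.

```python
import unicodedata

def to_a_label(u_label: str) -> str:
    """Hypothetical, heavily simplified lookup-side conversion of one label.

    Performs only a few of the checks (NFC, no leading combining mark) and
    omits the IDNA2008 code point tables, CONTEXTJ/CONTEXTO and bidi rules.
    """
    if not u_label:
        raise ValueError("empty label")
    if u_label.isascii():
        return u_label  # plain ASCII label: no ACE conversion needed
    if not unicodedata.is_normalized("NFC", u_label):
        raise ValueError(f"label {u_label!r} is not in NFC")
    if unicodedata.category(u_label[0]).startswith("M"):
        raise ValueError(f"label {u_label!r} starts with a combining mark")
    # Python's "punycode" codec implements RFC 3492 Punycode (not full IDNA)
    return "xn--" + u_label.encode("punycode").decode("ascii")

print(to_a_label("caf\u00e9"))  # xn--caf-dma
```

Passing the NFD spelling `"cafe\u0301"` raises `ValueError`. Without
the check, the same-looking label would be Punycode-encoded to a
different A-label.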

The just-use-UTF-8 approach has no such minimum standards.

> It's not just UI, ACE is a cascading reaction, and then it leaks into places that were UTF-8/Unicode aware, so some place that already worked just fine with Unicode names has to make a change to realize that ACE is the same form, even though they may not have needed a change.
> ACE has "broken" almost everything here, even though ACE nominally shouldn't be a problem.  Those breaks are more ironic as most of those broken pieces already worked with Unicode.

Unfortunately, "already worked with Unicode" often means: well, it's
8-bit clean, so you can shove UTF-8 through. Such an approach is prone
to failure in edge cases.
For example, a user's input method might produce NFD instead of NFC,
or a compatibility equivalent (e.g., U+00B5 MICRO SIGN instead of
U+03BC GREEK SMALL LETTER MU). Suddenly, it does not work, and
debugging these cases is hard because you can't see the difference.
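A small Python sketch of how an "8-bit clean" system fails on the MICRO
SIGN case. The stored name "μs" is a made-up example. The two spellings
look the same but produce different UTF-8 bytes. NFC does not unify
them, because U+00B5 is only a compatibility equivalent of U+03BC. Only
the compatibility normalisation NFKC maps one to the other.

```python
import unicodedata

stored = "\u03bcs".encode("utf-8")  # hypothetical stored name: GREEK SMALL LETTER MU + "s"
typed = "\u00b5s".encode("utf-8")   # same-looking input typed with U+00B5 MICRO SIGN

print(stored == typed)  # False: b'\xce\xbcs' vs b'\xc2\xb5s'

# NFC leaves MICRO SIGN alone (it has only a compatibility decomposition) ...
print(unicodedata.normalize("NFC", "\u00b5") == "\u03bc")   # False
# ... whereas NFKC maps it to GREEK SMALL LETTER MU
print(unicodedata.normalize("NFKC", "\u00b5") == "\u03bc")  # True
```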

Even worse are systems that do support UTF-8 (or another Unicode
encoding) but handle it slightly differently. They might "just work"
95% of the time.

With ACE, you are forced to think about correct normalisation as a
software engineer, and chances are that you get it right instead of
getting away with a solution that works "most of the time".

NT

