Solving the UTF-8 problem; was Language Tag Modification

Doug Ewell dewell at
Tue Jul 3 16:00:15 CEST 2007

Stephane Bortzmeyer <bortzmeyer at nic dot fr> wrote:

> But allow me a little troll: if we choose UTF-8, what about 
> normalization?
> 1) Do not mention it (this would mean that IANA would be free to 
> suddenly canonicalize the registry, thus making it different in a 
> byte-to-byte comparison)
> 2) Mandate NFC or NFD (which means an automatic registry checker would 
> have to check it)

There's actually nothing new here: the Registry is already Unicode, 
with hex NCRs (numeric character references) as the encoding scheme, 
and we would just be changing it to Unicode with UTF-8 as the encoding 
scheme.
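
To make the "same characters, different encoding scheme" point concrete, 
here is a minimal Python sketch of the conversion. The function names 
(decode_hex_ncrs, to_utf8) and the sample Description line are my own 
illustrations, not anything taken from the draft or from the Registry 
tooling, and the sketch assumes every NCR is in the hex form &#xHHHH;.

```python
import re

# Matches hex numeric character references such as "&#xE5;".
# Assumption: the registry text uses only this hex form of NCR.
_HEX_NCR = re.compile(r"&#x([0-9A-Fa-f]+);")


def decode_hex_ncrs(text: str) -> str:
    """Replace each hex NCR with the Unicode character it denotes."""
    return _HEX_NCR.sub(lambda m: chr(int(m.group(1), 16)), text)


def to_utf8(text: str) -> bytes:
    """Decode hex NCRs, then serialize the result as UTF-8 bytes."""
    return decode_hex_ncrs(text).encode("utf-8")


if __name__ == "__main__":
    # Illustrative line only; not copied from the actual registry file.
    line = "Description: Norwegian Bokm&#xE5;l"
    print(decode_hex_ncrs(line))
    print(to_utf8(line))
```

The character sequence is identical before and after; only the bytes on 
disk change.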

However, it wouldn't hurt to specify NFC somewhere in the draft.  This 
is what we are already using and what the IETF and W3C seem to prefer. 
Descriptions and comments are supposed to be non-normative, so I'm not 
sure any user's tools would *have* to do any checking or correcting, 
though of course ours should.
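
For what it's worth, the check itself is small. The following is only a 
sketch of what an automatic checker might do, assuming the registry text 
has already been decoded to Unicode strings; is_nfc and non_nfc_lines 
are hypothetical names, not part of any existing tool.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """True if the text is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", text) == text


def non_nfc_lines(lines):
    """Return (line_number, line) pairs for lines that are not in NFC.

    Line numbers are 1-based, as an editor would report them.
    """
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if not is_nfc(line)
    ]


if __name__ == "__main__":
    sample = [
        "Description: Bokm\u00e5l",    # precomposed a-ring: already NFC
        "Description: Bokma\u030al",   # "a" + combining ring: not NFC
    ]
    for number, line in non_nfc_lines(sample):
        print(f"line {number} is not in NFC: {line!r}")
```

A correcting tool would write back unicodedata.normalize("NFC", line) 
instead of just reporting the line.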

