Solving the UTF-8 problem; was Language Tag Modification 1694acad;

Peter Constable petercon at microsoft.com
Tue Jul 3 17:07:18 CEST 2007


+1 to specifying NFC (whether we use UTF-8 or NCRs).


Peter



-----Original Message-----
From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Doug Ewell
Sent: Tuesday, July 03, 2007 7:00 AM
To: ietf-languages at iana.org; LTRU Working Group
Subject: Re: Solving the UTF-8 problem; was Language Tag Modification 1694acad;

Stephane Bortzmeyer <bortzmeyer at nic dot fr> wrote:

> But allow me a little troll: if we choose UTF-8, what about
> normalization?
>
> 1) Do not mention it (this would mean that IANA would be free to
> suddenly canonicalize the registry, thus making it different in a
> byte-to-byte comparison)
>
> 2) Mandate NFC or NFD (which means an automatic registry checker would
> have to check it)

There's actually nothing new here, since the Registry is already using
Unicode with hex NCRs as the encoding scheme, and we would just be
changing it to Unicode with UTF-8 as the encoding scheme.
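To illustrate why the change affects only the encoding scheme and not the character content, here is a minimal sketch. It assumes the hex NCRs take the form `&#xHHHH;`; the helper name `decode_hex_ncrs` and the sample line are hypothetical, not taken from the Registry tooling.

```python
import re

# Matches hex numeric character references such as "&#xE5;" (assumed form).
HEX_NCR = re.compile(r"&#x([0-9A-Fa-f]+);")

def decode_hex_ncrs(text):
    """Replace each hex NCR with the Unicode character it denotes."""
    return HEX_NCR.sub(lambda match: chr(int(match.group(1), 16)), text)

# Hypothetical record line using an NCR for U+00E5.
ncr_line = "Description: Norwegian Bokm&#xE5;l"
decoded = decode_hex_ncrs(ncr_line)

# Same characters, now serialized as UTF-8 bytes instead of ASCII plus NCRs.
utf8_bytes = decoded.encode("utf-8")
```

The decoded string holds the same code points either way; only the bytes on disk differ.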

However, it wouldn't hurt to specify NFC somewhere in the draft.  This
is what we are already using and what the IETF and W3C seem to prefer.
Descriptions and comments are supposed to be non-normative, so I'm not
sure any user's tools would *have* to do any checking or correcting,
though of course ours should.
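As a rough illustration of the kind of check an automatic registry checker might run, here is a minimal sketch using Python's standard `unicodedata` module. The function name `non_nfc_lines` and the sample text are hypothetical, and this is not the checker actually used for the Registry.

```python
import unicodedata

def non_nfc_lines(text):
    """Return (line_number, line) pairs for lines that are not already in NFC."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if unicodedata.normalize("NFC", line) != line
    ]

# Hypothetical sample: line 1 uses precomposed U+00E5 (already NFC),
# line 2 uses "a" followed by U+030A COMBINING RING ABOVE (not NFC).
sample = (
    "Description: Norwegian Bokm\u00e5l\n"
    "Description: Norwegian Bokma\u030al\n"
)

flagged = non_nfc_lines(sample)
```

The two sample lines display identically but differ byte for byte once encoded as UTF-8, which is the comparison problem raised in the quoted message.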

--
Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
http://users.adelphia.net/~dewell/
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages


