Update: language-subtag-registry still broken

Tue Feb 28 06:40:52 CET 2012

On Fri Feb 17 16:08:52 CET 2012, I wrote:

> Akira TAGOH <akira at tagoh dot org> wrote:
>
>> I'm assuming that language-subtag-registry is encoded by utf-8
>> though, it's now broken because of the nulik subtag that has been
>> added recently.
>>
>> I hope it will be corrected shortly.
>
> Both 'nulik' and 'rigik' were apparently encoded in MacRoman instead
> of UTF-8. IANA has been notified.  Thanks for spotting this.

IANA "corrected" this by transcoding the entire Registry—not just the
new entries—from MacRoman to UTF-8, so that previously valid fields
like:

Description: Norwegian Bokmål

where the 'm' is followed by U+00E5 (C3, A5 in UTF-8), have been turned
into:

Description: Norwegian Bokm√•l

where the UTF-8 byte sequence (C3, A5) was misread as the two MacRoman
characters U+221A and U+2022, which were then re-encoded in UTF-8 as
(E2, 88, 9A, E2, 80, A2).
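
For anyone who wants to reproduce the effect, here is a minimal Python
sketch of the round trip described above. It assumes Python 3 and its
standard "mac_roman" codec; the variable names are mine and purely
illustrative, and this is not necessarily how the conversion was
actually performed at IANA.

```python
# Reproduce the corruption: correct UTF-8 bytes are misread as MacRoman,
# and the resulting characters are then written out again as UTF-8.
original = "Norwegian Bokm\u00e5l"            # U+00E5 is a-ring

utf8_bytes = original.encode("utf-8")          # a-ring becomes C3 A5
misread = utf8_bytes.decode("mac_roman")       # C3 -> U+221A, A5 -> U+2022
corrupted_bytes = misread.encode("utf-8")      # E2 88 9A E2 80 A2

print(misread)
print(" ".join(f"{b:02X}" for b in corrupted_bytes))
```

The first line printed should match the corrupted Description field
shown above, and the second shows the bytes as they now appear in the
file.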

*Every* non-ASCII sequence in the Registry has been corrupted in this
way, except the Description fields for the 'nulik' and 'rigik' records,
which were originally encoded in MacRoman and are now fine.
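
Because the damage is a clean round trip, it can in principle be
reversed line by line. The following is only a sketch under that
assumption, again using Python 3 and its "mac_roman" codec. The helper
name undo_double_encoding is hypothetical. Lines that were already
correct, like the two exceptions just mentioned, are expected to fail
the reverse conversion and are left untouched, though a correct line
could in theory survive it by coincidence, so any output should be
checked by eye.

```python
def undo_double_encoding(line: str) -> str:
    """Hypothetical helper: reverse a UTF-8 -> MacRoman -> UTF-8 round trip."""
    try:
        # Map each character back to its MacRoman byte, then read the
        # resulting bytes as the UTF-8 they originally were.
        return line.encode("mac_roman").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The line was not double-encoded (for example, it is already
        # correct UTF-8 text), so keep it unchanged.
        return line


damaged = "Description: Norwegian Bokm\u221a\u2022l"
print(undo_double_encoding(damaged))

# An already-correct line containing U+00FC (u-umlaut) is left alone,
# because its MacRoman byte 9F is not valid UTF-8 on its own.
print(undo_double_encoding("Volap\u00fck"))
```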

The Registry has been like this for over a week now, and we have been
unable to contact IANA to get it corrected. Until this problem is
resolved, I've posted a corrected version at
http://ewellic.org/language-subtag-registry.txt . Please be aware that
this is not the official Registry, and I'll be taking it down as soon as
IANA makes the necessary corrections.

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell