Sample IANA language subtag registry

Mark Davis mark.davis at
Thu Jul 8 16:56:58 CEST 2004

I would agree with you IF the respective authorities

(a) maintained version control,
(b) published the data in a form that allows simple mechanical enumeration
of all the codes that were valid in a particular version,
(c) and provided stable codes.

But they don't, so I don't. As it stands, I don't know of anyone who
validates RFC 3066 codes in any reasonable fashion, because it is currently
both difficult and pointless. Even if I go through the effort of assembling
the codes from the different standards, and use them to validate a code, I
don't know if that is going to match the validation that someone else uses,
because they might gather the data on a different date. And because the ISO
codes are unstable, a code that validates now might not validate in the
future -- or worse yet, might validate but mean a different thing!

If you try to maintain stability, there is not a mechanical way to gather
the data; you have to take the current snapshot, and then walk back through
the change history to figure out what changed -- and when; you can't just
download a file of all the codes valid on a given date (or for a specified
version -- that would be even better). Then you have to put this together
from multiple sources: the two registration authorities (three, once scripts
are added) plus the IANA registry. You finally get a chunk of data that you
can use to validate, through a clumsy process with many possible points of

For these reasons and others, we followed the path suggested by Doug's
earlier comments; it is by far best to simply provide all of the data in
*one* place that:

(a) allows any user of RFC 3066bis to determine precisely whether a tag is
valid or not,

(b) provides for absolute stability
 - once a tag is valid, it is valid from that point on
 - the canonical form is also stable: if he-IL is the canonical form of
iw-IL now, it will always be in the future
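
As a rough illustration of (a) and (b), here is a minimal sketch of validating
and canonicalizing a tag against a single table. The REGISTRY contents and the
canonicalize() helper are hypothetical stand-ins, not the registry format from
the draft; only the iw -> he mapping comes from the example above.

```python
# Hypothetical in-memory stand-in for a single registry. Each subtag maps to
# its canonical replacement, or to None when it is already canonical.
# The entries are illustrative only, not a copy of any real code list.
REGISTRY = {
    "language": {"he": None, "iw": "he", "en": None},
    "region": {"IL": None, "US": None},
}


def canonicalize(tag):
    """Return the canonical form of a language[-region] tag.

    Returns None when the tag has an unexpected shape or when any subtag
    is missing from REGISTRY, i.e. when the tag is not valid.
    """
    parts = tag.split("-")
    if not 1 <= len(parts) <= 2:
        return None

    language = parts[0].lower()
    if language not in REGISTRY["language"]:
        return None
    # Follow the canonical mapping if one is recorded (e.g. iw -> he).
    language = REGISTRY["language"][language] or language

    if len(parts) == 1:
        return language

    region = parts[1].upper()
    if region not in REGISTRY["region"]:
        return None
    return f"{language}-{region}"
```

Because entries are only ever added to such a table and never removed or
repurposed, a tag that validates once keeps validating, and the canonical
form it maps to does not change.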

We considered the work involved in synchronization, and decided that since
we needed the registry to be updated in any event with each change in the
ISO standards (for stability), it would be far better to go ahead and
assemble the complete list from the start. Once that is done, the
incremental work is the same in either case.

The other advantage that RFC 3066bis brings is 'future-proofing'; because
the structure is more clearly defined, when I get a tag from some up-version
of the standard, I can still parse out the structure to determine whether it
is well-formed or not, and determine what the individual pieces are supposed
to designate.
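
A small sketch of what parsing by structure alone could look like. The
pattern below is a deliberate simplification and an assumption on my part: it
covers only a 2-3 letter language, an optional 4-letter script, and an
optional region of 2 letters or 3 digits, and it ignores the other
productions in the draft's grammar. SHAPE and parse_shape() are hypothetical
names.

```python
import re

# Simplified shape: language, optional script, optional region.
# This checks form only; it does not look anything up in a registry.
SHAPE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def parse_shape(tag):
    """Split a tag into its pieces by length and character class alone.

    Returns a dict with 'language', 'script' and 'region' keys (absent
    pieces are None), or None when the tag does not fit the simplified
    shape.
    """
    match = SHAPE.match(tag)
    return match.groupdict() if match else None
```

A tag such as zz-Zzzz-999 still splits cleanly into language, script, and
region even if none of those subtags is known to the receiving
implementation, which is the point of being able to tell well-formed apart
from valid.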


----- Original Message ----- 
From: "Peter Constable" <petercon at>
To: <ietf-languages at>
Sent: Wednesday, July 07, 2004 17:02
Subject: RE: Sample IANA language subtag registry

> From: ietf-languages-bounces at [mailto:ietf-languages-
> bounces at] On Behalf Of Doug Ewell

> Section 3.2 of draft-phillips-langtags-04 describes the format of the
> IANA language subtag registry, which would be a normative part of RFC
> 3066bis.
> This registry is to be assembled by the language subtag reviewer, but to
> get a jump on implementation and to solidify my understanding of
> conformance issues, I've gone ahead and created my own copy of what the
> registry might look like...

> This is intended as a sample of the final registry, and at present
> contains only the standard codes:
> * ISO 639 alpha-2 and alpha-3 language codes
> * ISO 15924 alpha-4 script codes
> * ISO 3166 alpha-2 and UN M.49 numeric region codes
> and no entries yet for registered subtags or grandfathered whole-tags.

Whoa! Back up the truck!

In the past, it has not been necessary to register entire tags that were
considered as given because they consisted of combinations of ISO 639
and ISO 3166 IDs. It has only been necessary (and expected) to register
complete tags (other than x-...) in case they do *not* consist of
combinations of existing ISO 639 and ISO 3166 IDs.

Under the terms of the new RFC the only subtags that should need to be
registered are the language IDs that are not in ISO 639, or variant IDs.

There is no need to create a registry of existing ISO 639 IDs, ISO 3166
IDs, or ISO 15924 IDs.

We should not be providing lists that mirror the code tables for those
standards. The *only* reason to provide a list of ISO 639 IDs or ISO
3166 IDs, etc., would be if we explicitly wanted to limit the accepted
values from one of those sources (such as I suggested at one point that
we do for ISO 639-1). If we simply mirror what is published elsewhere,
then we will inevitably create synchronization problems. And there is no
reason to provide lists of IDs along with names that have been
normalized to some constraint, such as using only ASCII letters. We are
not providing a standard set of names for language, countries and

Sorry, Doug, but this time I think the work you have done is badly
misguided (a rare occurrence).


Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
