Region subtags under 3066 and 3066bis (long)

John Cowan cowan at
Mon Feb 21 19:48:45 CET 2005

Frank Ellermann scripsit:

> That "nobody" includes me, I didn't know that 3166 "started"
> 1974.  The public list has codes like RA, 2nd RB, RC, etc.
> with the year 1949 for the reservation, or at least that's
> how I interpreted "R49".

Which public list is this, please?

> If they say BUMM, CSHH, DDDE, GEHH, VDVN, YDYE, and YUCS, then
> they must have a reason to do so.  Obviously they use XXHH for
> 1:n relationships, and XXYY for 1:1 or m:1 relationships.  And
> BYAA is something else.
> If you insist on DD (278) without canonical DE, you would have
> to add 280 for the old DE, both parts of the new DE (276).  

This confusion is fairly easy to straighten out, once one grasps that ISO
3166 encodes the names of countries, whereas UN M.49 encodes countries
as physical-economic entities.  Thus, when Burma officially changed its
(Latin-alphabet) name to Myanmar, the ISO 3166 code changed too (BU
became MM), but the UN code remained the same, as it still denoted the
same entity.

But when the former East German provinces acceded to the Federal Republic,
the name "German Democratic Republic", encoded DD, ceased to exist, and
the name "Federal Republic of Germany", encoded DE, continued to exist.
So the code DE was left alone.  But UNSD necessarily assigned a new
numeric code (276 in place of 280), as the substantive entity bearing
that name was now fundamentally different, in such a way that
statistical information (gross domestic product, per capita income,
hectares under cultivation, you name it) was not continuously
interpretable across the divide.
The Yemen story is exactly the same, mutatis mutandis.
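
A minimal Python sketch of the distinction, for illustration only.  The
German numeric codes (280, 276) are the ones quoted above; the numeric
codes for Burma/Myanmar and for Yemen are not given in this thread and
are filled in from memory, so check them against the UN M.49 list.  The
names Change, CHANGES and what_changed are hypothetical.

```python
from typing import NamedTuple, Tuple


class Change(NamedTuple):
    """One historical event, with the codes before and after it."""
    label: str
    alpha2_before: str   # ISO 3166 alpha-2: follows the *name*
    alpha2_after: str
    numeric_before: int  # UN M.49 numeric: follows the *entity*
    numeric_after: int


# Assumption: 104, 886 and 887 are recalled, not taken from this thread.
CHANGES = [
    Change("Burma renamed Myanmar", "BU", "MM", 104, 104),
    Change("Federal Republic of Germany after accession", "DE", "DE", 280, 276),
    Change("Yemen after unification", "YE", "YE", 886, 887),
]


def what_changed(c: Change) -> Tuple[bool, bool]:
    """Return (name_code_changed, entity_code_changed) for one event."""
    return (c.alpha2_before != c.alpha2_after,
            c.numeric_before != c.numeric_after)


for c in CHANGES:
    name_changed, entity_changed = what_changed(c)
    print(f"{c.label}: alpha-2 changed={name_changed}, "
          f"numeric changed={entity_changed}")
```

A rename alone changes only the alpha-2 code; a change in the territory
being measured changes only the numeric code.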

> The proposed matching algorithm failed to match en-boont with
> en-US-boont etc.  Maybe en-*-*-boont would work, I'm not sure.

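No, the language-range "*" must stand alone.  A range matches a tag only
if it equals the tag or is a prefix of it ending just before a "-", so
"en-boont" cannot match "en-US-boont".  Note that the so-called
"proposed algorithm" is really the de facto algorithm, and its true
home is RFC 2616 (HTTP/1.1), section 14.4 (the Accept-Language header).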
It appeared in RFC 3066 only as a non-normative explanation; and in
draft-phillips-langtags-09 (the last undivided RFC 3066bis draft) the
significant phrase "The most common implementation follows this pattern"
is used.
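
For concreteness, a short Python sketch of the prefix-matching rule as
RFC 2616 section 14.4 describes it: a range matches a tag if the two
are equal, or if the range is a prefix of the tag and the next
character of the tag is "-"; the range "*" matches every tag.  The
function name range_matches is hypothetical, and the comparison is
assumed to be case-insensitive.

```python
def range_matches(language_range: str, language_tag: str) -> bool:
    """Prefix matching of a language-range against a language-tag."""
    r = language_range.lower()
    t = language_tag.lower()
    if r == "*":
        # The wildcard is only special when it is the whole range.
        return True
    # Exact match, or a prefix that ends on a subtag boundary.
    return t == r or t.startswith(r + "-")


if __name__ == "__main__":
    for rng in ("en", "en-US", "en-boont", "en-*-*-boont", "*"):
        print(rng, "matches en-US-boont:", range_matches(rng, "en-US-boont"))
```

Under this rule "en" and "en-US" match "en-US-boont", while "en-boont"
and "en-*-*-boont" do not, because an embedded "*" is compared as an
ordinary character.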

In my personal opinion, the algorithm should never have been put into
RFC 3066 at all, as it is productive of nothing but procedural confusion.
RFC 2616 is a sufficient reference for those who wish to implement it.

John Cowan   cowan at
Most languages are dramatically underdescribed, and at least one is 
dramatically overdescribed.  Still other languages are simultaneously 
overdescribed and underdescribed.  Welsh pertains to the third category.
        --Alan King

More information about the Ietf-languages mailing list