Region subtags under 3066 and 3066bis

Frank Ellermann nobody at
Wed Feb 23 16:19:53 CET 2005

Doug Ewell wrote:

>> Excl. the private use codes allowed by the 3066bis draft -
>> not necessarily a good idea.

> Why not?  What problems do you see with using them?

An "old" RFC 3066 implementation that tries to recognize the
parts of a language tag could treat AA, QM, etc. as "erroneous",
not necessarily with the same result as for "unknown" codes.
I don't like the concept of "country codes" in language tags,
and these "private use country codes" make a bad idea worse.
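To make the AA / QM point concrete, here is a minimal Python sketch,
assuming the ISO 3166-1 user-assigned alpha-2 ranges (AA, QM to QZ,
XA to XZ, ZZ). The ASSIGNED set is a hypothetical, tiny subset that
stands in for whatever list an "old" RFC 3066 implementation might
carry; it is not the real ISO 3166-1 list.

```python
import re

# ISO 3166-1 user-assigned ("private use") alpha-2 codes:
# AA, QM..QZ, XA..XZ, ZZ
def is_private_use_region(code: str) -> bool:
    if not re.fullmatch(r"[A-Za-z]{2}", code):
        return False
    c = code.upper()
    return c in ("AA", "ZZ") or (c[0] == "Q" and c[1] >= "M") or c[0] == "X"

# Hypothetical, illustrative subset of assigned codes (not the real ISO list).
ASSIGNED = {"CS", "GE", "KI", "KP", "KR", "TV"}

def classify_region(code: str) -> str:
    """Classify a region subtag the way a naive checker might."""
    if is_private_use_region(code):
        return "private-use"   # accepted by the 3066bis draft
    if code.upper() in ASSIGNED:
        return "assigned"
    return "unknown"           # an old implementation may call this an error

for tag in ["de-AA", "en-QM", "sr-CS", "sr-YU"]:
    region = tag.split("-")[1]
    print(tag, classify_region(region))
```

Without the is_private_use_region branch, AA and QM simply fall
through to "unknown" together with YU; whether that is then treated
as an error is up to each implementation.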

>> But it's not strictly necessary for the affected languages.
> I assume you mean the entries based on UN numeric codes; all
> of the ISO 3166-based thingies we are talking about are also
> "region" subtags.

Yes, some regions like the former YU automagically vanish from
your registry, because YU was used for what is now CS for about
11 years.  You have the old CS as 200, but not the old YU as 890.
You have the old YD as 720, but not the old YE as 886, etc.

> Removing 200 is one of the issues I submitted to the authors.

It's difficult to see that it's only a pseudo-random result.

What would happen if CS splits into Serbia and Montenegro, and
Serbia picks a name allowing it to _keep_ CS?  Even without adding
Kosovo to this equation, the 3066bis result could be messy.  Do
you keep the then-old CS, add a new ?? for Montenegro, and leave
regional languages in the new Serbia with only a UN number?

Or do you add a UN number for the then-old CS, so that all old
language tags using CS for places now in Montenegro have to be
changed?  The latter can't be the idea of a persistent registry,
so you probably add a number for Serbia even if its code is CS.

Unions are also messy.  If some language tags use ??-KP and
??-KR today, and a new KR includes KP and the old KR, you get
a deprecated KP, apparently no number for the old KR, and some
old ??-KR tags lose their intended meaning, unlike old ??-KP.

Country codes in language tags considered harmful.

>> whatever GEHH 296 (KI+TV) instead of say GEKI 296 means,
>> you have the new AI, CS, GE, and SK.

> I don't understand this at all.

The new GE allows you to ignore GEHH: there's no GE region row
in your table where you could put KI in the "canonical" column.

Without the new GE, your "exactly the same plot of land" rule,
as determined by UN numbers, would apparently fail: GE 296 is now
KI 296, but ISO 3166-3 GEHH claims that TV 798 belonged to GE 296.

> If you choose to ignore, or be "semi-compliant" with, a
> language tagging standard (small "s") that is as widespread
> as RFC 3066 is, and as any successor is likely to be, you do
> so at your own risk.

Sure, I could say that I found these codes in the ISO 3166 FAQ
#QS11 and #QS12, or, better still, on the ISO 3166 page about ccTLDs.

In another article you said:

> We are not in the business of deciding what is and is not a
> country.

Of course you are.  You'll decide that all countries recognized
by ISO and/or the UN at some timestamp yyyy-mm-dd are valid
forever in language tags.  For those who are in this business,
that has side effects after yyyy-mm-dd, even if it's yyyy + 50.

                      Bye, Frank

More information about the Ietf-languages mailing list