Region subtags under 3066 and 3066bis (long)

Sun Feb 20 02:20:31 CET 2005

Frank Ellermann <nobody at xyzzy dot claranet dot de> wrote:

> Whenever you say "clear" and "consistent" I read "some fancy
> rules made up as needed using at least two cut-off dates".  The
> second date is clear, it's the day of the publication of a new
> 3066bis as RfC.

I'm not sure which "two cut-off dates" you have in mind.  2005-01-01 is
the only cutoff date I see.  That could certainly be changed to the date
of publication (hopefully 2005-xx-xx), as long as no ISO codes are
reused between now and then.

But what "first" date are you referring to?  ISO 3166-1 began in 1974.
The UN site lists codes going back to 1982, so I assume that's when that
standard began.  RFC 1766 was published in March 1995, RFC 3066 in
January 2001.  All of these are "the dawn of time" as far as each
publication is concerned.

Which rules do you see as "fancy" or arbitrary?

> The first date is less clear, you want apparently "3166:3" with
> country codes like BQ, BU, CT, DD, FQ, FX, JT, MI, NQ, NT, PC,
> PU, PZ, RH, SU, VD, and YD.  The literal interpretation of...
>
> | All 2-letter subtags are interpreted as ISO 3166 alpha-2
> | country codes from [ISO 3166], or subsequently assigned by
> | the ISO 3166 maintenance agency
>
> ...together with...
>
> | [ISO 3166]  ISO 3166:1988 (E/F)
> [...]
> | Standardization, 3rd edition, 1988-08-15.
>
> Two questions, is it really necessary to stick to an _obsolete_
> edition of ISO 3166 for 1766/3066-compatibility ?

As I said before, the draft should contain an updated reference to ISO
3166, and should explicitly mention the use of withdrawn codes.

> And if your
> source says, that RHZW was changed 1980 eight years before the
> third edition of ISO 3166, why add it to a _new_ registry about
> language tags in 2005 ?  RH was never allowed under 1766/3066.

As I said before, for consistency with the principle of allowing more
recently withdrawn codes such as TP and YU.

You are saying that the draft should disallow all codes withdrawn before
a specified date, perhaps 1988-08-15 (publication date of ISO 3166 3rd
edition) or 1995-03-01 (publication date of RFC 1766).  That is the
first of two cutoff dates you were referring to, I guess.

In any case, that seems a legitimate topic for discussion.  Should we
set some starting date, such that ISO codes withdrawn before that date
don't go into the registry?

If we did this, there might be a problem with ISO 639 language codes.
Even though "in" for Indonesian, "iw" for Hebrew, and "ji" for Yiddish were
deprecated way back in 1988, users were advised for YEARS AND YEARS
afterward to use the old codes in tagging their content rather than the
new ones, because "software would be more likely to recognize them."
This seems silly to me, but it is a fact, and it would bear on the
question of whether these codes could be assumed not to exist in
language tags simply because they were deprecated in ISO 639.

> You have 200 for the former CS, is that a third cut-off date ?
> Otherwise the UN part is clear and I have no problem with it.

200 is used for Czechoslovakia because CS was taken by Serbia and
Montenegro.  There is no date associated with this, other than the one
and only 2005-01-01 cutoff date that says CS has its new meaning and not
its old one.

Question:
French Afars and Issas (AI), Gilbert and Ellice Islands (GE), and Sikkim
(SK) all had their codes re-used for other countries, just like CS.  Why
does Czechoslovakia get a numeric code and the others don't?  Is it
because of a cutoff date?

Answer:
No, it is because these former entities have no numeric code.
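
To make the distinction concrete, here is a rough sketch in Python.  It
is only an illustration, not anything taken from the draft: the table
layout and the function name are hypothetical, while the codes and the
numeric value 200 are the ones discussed above.

```python
# Hypothetical sketch: which subtag is left for the former holder of an
# alpha-2 code that ISO 3166/MA later reassigned. Names are illustrative.

# Reused alpha-2 codes carry only their current meaning.
CURRENT_MEANING = {
    "AI": "Anguilla",
    "CS": "Serbia and Montenegro",
    "GE": "Georgia",
    "SK": "Slovakia",
}

# Former holder of each reused code, with its UN numeric code
# (None when the former entity never had one).
FORMER_HOLDER = {
    "AI": ("French Afars and Issas", None),
    "CS": ("Czechoslovakia", "200"),
    "GE": ("Gilbert and Ellice Islands", None),
    "SK": ("Sikkim", None),
}


def subtag_for_former_holder(alpha2):
    """Return the numeric subtag for the former holder, or None if it has none."""
    _name, numeric = FORMER_HOLDER[alpha2]
    return numeric
```

Under these assumptions, subtag_for_former_holder("CS") gives "200",
while the same call for AI, GE, or SK gives None.  No cutoff date is
involved anywhere in the lookup.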

> Okay, I didn't know that ISO promised to never add an alpha-2
> code to an existing alpha-3 code.

There is no entity that has one code but not the other.  This is
different from ISO 639.

> Makes sense, so you essentially copy all alpha-3 codes without
> alpha-2 alias to your alpha-3 section of the registry.

There are no such codes.

>> I got my historical data from Clive Feather's page at
>> http://www.davros.org/misc/iso3166.html
>
> Thanks, that's a nice page, we should be able to fix your list
> with this data.  It says BQAQ -v with v = changed 1979.  That
> was long before 1988 and should kill BQ.  In that list you also
> find NQAQ -x (1983) killing NQ.  For FQ it's FQHH 1979 killing
> FQ.  ISO 3166-3 uses HH if there is more than one new code, in
> that case FQ is covered by both TF and AQ.

Time out.  First we all need to sit down (figuratively) and talk about
this, and decide if it is the right thing to do.  That may not be
obvious.  If this is decided, changing the list to remove BQ and friends
is trivial.

> BQ, CT, DY, FQ, HV, JT, MI, NH, NQ, PC, PU, PZ, RH, VD, and WK
> are dead.  Like AIDJ, GEHH, SKIN, and you already have the new
> AI, GE, and SK, that's okay.  Maybe we can agree on this part ?

Certainly we have the new AI, GE, and SK (and we also have the new CS).
If BQ had been assigned a new meaning, of course we would be using that
as well.  But whether we should disallow BQ and friends has not yet been
decided.

> The same source also says BUMM, DDDE, TPTL, YDYE, YUCS, ZRCD.
> You have MM for BU etc. but not DE for DD and YE for YD, that's
> not yet consistent.  Please ignore the BYAA, BY is okay.

The rule, as I said before, is whether the new code corresponds exactly
to the same plot of land as the old code.  For BUMM it does.  For DDDE
it does not.  Do you agree?
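
A rough Python sketch of that rule, for illustration only.  The
SAME_TERRITORY table and both function names are hypothetical, and the
table lists only the two cases spelled out above; the same-territory
judgment itself has to be made by people, not derived from the codes.

```python
# Hypothetical sketch of the "same plot of land" rule applied to
# ISO 3166-3 four-letter codes (old alpha-2 followed by the new part).

# Judgments stated in this message only; anything else is undecided.
SAME_TERRITORY = {"BUMM": True, "DDDE": False}


def split_3166_3(code):
    """Split a four-letter ISO 3166-3 code into (old alpha-2, new part)."""
    return code[:2], code[2:]


def equivalent_new_code(code):
    """Return the new alpha-2 code if it may stand in for the old one, else None."""
    _old, new = split_3166_3(code)
    if new == "HH":
        # HH marks a split into more than one new code, so no single
        # replacement exists.
        return None
    if SAME_TERRITORY.get(code, False):
        return new
    return None
```

With this table, equivalent_new_code("BUMM") returns "MM", while "DDDE"
and "FQHH" both return None, so DD and FQ would stay as subtags in their
own right.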

>> the rationale for using NH in examples was precisely to
>> demonstrate the use of no-longer-active ISO 3166 codes in
>> language tags.
>
> Good idea, bad example, if we agree on killing the old 1980 NH.

Which we haven't yet.

> A better example could be TP (especially for me, because I tend
> to mention existing ccTLDs after the ISO-3166-CS-mess... ;-)

TP is an excellent example, I agree.

> If ISO 3166-3 has XXYY (minus XXAA or XXHH) then that should be
> good enough for your list.  Otherwise you have the problem that
> YU used to be more than only CS for some decades, including
> some years after 1988.  It only affects DDDE and YDYE after
> removing all obsolete codes like the three ??UM from your list.

There is no way around the fact that an entity can change boundaries
while its ISO 3166 code remains the same.  To solve this problem
ourselves, using YU as an example, we would have had to discontinue use
of YU back in 1991 or whatever when Slovenia broke away, and start using
a different code for the new, smaller Yugoslavia.  Then what would we do
if other republics broke away at different times?  (Which is exactly
what happened.)  How many new codes would we have had to invent?

It cannot be said that DD is equivalent to DE.

> Later you said that we should follow ISO 3166 where possible,
> and they have DDDE and YDYE.

To me, that is not an instance of "where possible."  But again, this can
be discussed.

>> FQ is not exactly the same as TF.
>
> As far as languages are concerned FQHH with a comment AQ and TF
> _is_ the same as TF, because there are no "regional languages"
> in AQ.  But FQ is one of the obsolete codes, we don't need to
> discuss the details if you just delete it.

As far as language usage is concerned, many language + country tags are
essentially the same as many others.  Trying to decide which language +
country combinations truly represent different languages is a real
challenge (ask Tex Texin) and out of scope.

> Sure, and therefore it's an extremely bad idea to "block" codes
> like FQ and NH with obsolete stuff, when they could be used for
> something else in the future.  See the old AI, CS, GE, and SK.

We would really like to have ISO 3166/MA solve this problem by not
reassigning codes in the first place.

> No idea.  And it's not your fault that the RfC 1766 concept of
> adding country codes to language tags for "regional languages"
> is FUBAR.  Some RfCs are worse than others or even dead ends.

RFC 1766, and 3066 after it, have been extremely successful.  Even if
you have a better solution than ISO 3166 for distinguishing German
German from Austrian German from Swiss German, such as UN numeric codes,
the toothpaste is out of the tube now.  We can't deprecate RFC 1766/3066
tags that use ISO 3166 codes.  They are everywhere.

> The problems of matching en-boont with en-US-boont would just
> go away if 3066bis would deprecate this "country code" madness.
> If I'd want en-"TX" then I need "en-texan" and not en-US, let
> alone en-UM.

There isn't a problem anyway, if you disassemble en-US-boont into its
component parts and perform a match based on the parts.  Admittedly,
existing (non-3066bis-aware) software doesn't do this, but that does not
make it the intractable problem that some have claimed.
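
A minimal sketch of what I mean, in Python.  This is an illustration
under simplifying assumptions, not an algorithm from the draft: the
function names are hypothetical, and subtags are classified by shape
alone (4 letters = script, 2 letters or 3 digits = region, anything
after that = variant), ignoring extensions and private-use subtags.

```python
def parse_tag(tag):
    """Split a language tag into language, script, region, and variants."""
    subtags = tag.lower().split("-")
    parts = {"language": subtags[0], "script": None, "region": None,
             "variants": []}
    for sub in subtags[1:]:
        nothing_later = parts["region"] is None and not parts["variants"]
        is_script = len(sub) == 4 and sub.isalpha()
        is_region = (len(sub) == 2 and sub.isalpha()) or \
                    (len(sub) == 3 and sub.isdigit())
        if is_script and parts["script"] is None and nothing_later:
            parts["script"] = sub
        elif is_region and nothing_later:
            parts["region"] = sub
        else:
            parts["variants"].append(sub)
    return parts


def matches(wanted, tag):
    """True if every component named in `wanted` is also present in `tag`."""
    w, t = parse_tag(wanted), parse_tag(tag)
    if w["language"] != t["language"]:
        return False
    for key in ("script", "region"):
        # A component left out of `wanted` places no constraint on `tag`.
        if w[key] is not None and w[key] != t[key]:
            return False
    return all(v in t["variants"] for v in w["variants"])
```

Here matches("en-boont", "en-US-boont") is true because the region is
simply not constrained by the request, whereas a plain prefix comparison
of the two strings would fail.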

>> They should not pose any problems to anyone, as long as
>> people Tag Content Wisely.
>
> Again, look at AI, CS, GE, and SK.  Keeping obsolete codes for
> obscure consistency reasons would have caused major trouble for
> the new AI, CS, GE, and SK.

They would have been assigned numeric codes 660, 891, 268, and 703
respectively.  Whether that constitutes "major trouble" is left as an
exercise.

>> If you have content tagged as "fr-FQ", and don't get the
>> Google hits that you would have gotten if you had used
>> "fr-TF" instead, the sky probably will not fall.
>
> The sky will fall on a _future_ FQ if they can't use their own
> country code like almost everybody else in the world, because
> you decided that it stands for an uninhabited territorial claim
> in AQ acknowledged by neither the UN nor the US.

If ISO 3166/MA disregards the will of the world and reassigns FQ, it
will be to a new entity defined by the UN.  That entity will have been
assigned a UN M.49 numeric code, RFC 3066bis will use that, and the sky
will stay right where it is.

>> The draft uses the codes that it uses, consistently.
>
> Adding the few additional ccTLDs would be also consistent.  It
> would be just a different rule, IMHO better than 830 and 833.

I have to disagree quite strongly about UK, at least.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/