Region subtags under 3066 and 3066bis (long)

Sun Feb 20 19:02:26 CET 2005

Frank Ellermann <nobody at xyzzy dot claranet dot de> wrote:

> The 3066bis draft-08 just failed in its "last call", now it's
> split into two drafts.  So that's not yet ready.  You probably
> want to use the date of the publication as one of the future
> cut-off dates, because that's essential for compatibility with
> RfC 3066.  2005-01-01 isn't one of the relevant cut-off dates.
> ...
> _Especially_ if some ISO codes are changed, because the changes
> affect users of RfC 3066 immediately.  Let's say 200x-xx-xx to
> be sure.

I don't want to quibble about this.  Let's suppose, for the sake of
argument, that RFC 3066bis (in whatever form it takes, probably two or
more documents) is approved as of 2005-08-01.  Fine, I'm open to
discussing changing the date to 2005-08-01.  I'll talk to the authors.
Next.

> Whatever you do, don't pick 1974.  A very conservative choice
> justified by RfC 3066 is apparently 1988-08-15.

Nobody "picked" 1974 for anything.  That is the date of birth of ISO
3166.  The principle was to allow all ISO 3166 codes that were ever
assigned, since the beginning of time, except those that had been reused
(because that would be impossible).  "The beginning of time" for ISO
3166 happens to be December 1974.

If you want to discuss setting up a cutoff date, such that ISO 3166
codes withdrawn before that date are not valid in language tags, then
let's talk about it.  This is a philosophical change and not just a
matter of this arbitrary date versus that one.

>>> RH was never allowed under 1766/3066.
>>
>> As I said before, for consistency with the principle of
>> allowing more recently withdrawn codes such as TP and YU.
>
> These cases are bad enough, adding obsolete codes like RH only
> makes it worse for a future RH etc.

Are you reading what I'm writing?  The great likelihood is that there
will not BE a future RH, or a future BQ, or any of these.

>  [3166:1988]
>> If we did this, there might be a problem with ISO 639
>> language codes.
>
> No, they promised to be sensible, it's a different standard.

ISO 3166/MA seems to be trying.  The latest draft revision proposes
reserving withdrawn codes for 50 years before reusing them (cf. 5 years
in the current standard).

>> users were advised for YEARS AND YEARS afterward to use the
>> old codes in tagging their content rather than the new ones,
>> because "software would be more likely to recognize them."
>> This seems silly to me, but it is a fact
>
> Backwards compatibility isn't silly.  Changing codes without a
> compelling reason is silly.  And it's the point of your future
> registry with its persistent entries.

The registry guards against future code changes, which we otherwise have
no control over.

> But you don't need backwards compatibility with something which
> was never valid under RfC 1766 and RfC 3066 like the former RH.

OK, I've agreed that we can talk about setting a starting point.  Let's
move forward.

>> 200 is used for Czechoslovakia because CS was taken by Serbia
>> and Montenegro.  There is no date associated with this, other
>> than the one and only 2005-01-01 cutoff date that says CS has
>> its new meaning and not its old one.
>
> But on 2005-01-01 there was no valid code 200 in the UN list,
> it was removed 1993-01-01, as stated on:
> <http://unstats.un.org/unsd/methods/m49/m49chang.htm>

You're right.  200 is a "previously used code" in UN M.49.  It was the
only alternative for encoding the former Czechoslovakia (which there is
a perceived need to encode) short of inventing our own code, which we
most certainly don't want to do.

> It's not that I'm completely against it.  I only want to know
> why you have 200 (the former CS, now CZ and SK) but not 582
> (the former PC, now PW, FM, MH, and MP).  I proposed to remove
> the obsolete PC, depending on your reasons that could justify
> to add 582.  Or to get rid of 200, because it's old like 582.

We have the old PC because there is no new PC.
We *don't* have the old CS because there *is* a new CS.
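
To make the rule concrete, here is a minimal sketch in Python.  The
table is a hypothetical two-entry excerpt built only from the codes
discussed in this thread, not the actual registry, and the names
Withdrawn, WITHDRAWN and region_subtag are invented for illustration.

```python
from typing import NamedTuple, Optional


class Withdrawn(NamedTuple):
    name: str            # what the alpha-2 code used to mean
    reused: bool         # has ISO 3166/MA reassigned the alpha-2 code?
    m49: Optional[str]   # UN M.49 number of the old entity, if it has one


# Hypothetical excerpt; only the two cases argued about above.
WITHDRAWN = {
    "PC": Withdrawn("Pacific Islands (Trust Territory)", False, "582"),
    "CS": Withdrawn("Czechoslovakia", True, "200"),
}


def region_subtag(old_code: str) -> Optional[str]:
    """Return the region subtag that denotes the *old* meaning of old_code."""
    entry = WITHDRAWN[old_code]
    if not entry.reused:
        # No new meaning exists, so the old alpha-2 code is unambiguous.
        return old_code
    # The alpha-2 code now means something else; fall back to UN M.49.
    return entry.m49


print(region_subtag("PC"))  # PC
print(region_subtag("CS"))  # 200
```

Under these assumptions 582 is never needed, because PC still has only
one meaning, while 200 is needed precisely because CS has two.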

> The old AI is now DJ, therefore you don't need 262.
> The old GE is now KI, therefore you don't need 296.

And the old CS is two countries, so the canonical/alias relationship
doesn't apply.  Just as I wrote earlier.

> The old SK is now a part of India (356).  Apparently that was
> before they started with their numbers, because there is no
> old number for Sikkim.  Your SKIN source says 1975.

According to the UN page... as I wrote earlier... that standard seems to
begin in 1982.  So no, they would not have had a code for Sikkim.

> Oops, and that
> source says GEHH 296 claiming that the successors are KI
> (296) _and_ TV (798).  Beats me.  But I proposed to remove GE.

The old GE is not in the registry.  The GE that is in the registry means
Georgia, and I do not think you propose to remove that.

> No.  CS is not exactly the same plot of land as YU before 1992.

The code was changed in 2003, not in 1992.  The code change from YU to
CS had nothing to do with the plot of land.  It had to do with the
*name* of the country.  ISO 3166 is a standard for encoding the *names*
of countries, not for encoding the countries themselves (unlike UN
M.49).  Perhaps it can be perceived as a flaw that RFC 1766 used it to
denote countries, but it is a fact of life.  We do not propose to undo
the entire RFC 1766/3066 mechanism and start over.

> You accepted YUCS but not DDDE, VDVN, or YDYE.  Either follow
> ISO 3166 or trash it completely.  At least for DD there's no
> problem if you say that all their languages are now used in DE
> (de, nds, dsb, hsb, de-1901, de-1996). OTOH fy-DE is ambiguous
> (one of two fy which are not fy-NL), fy-DD would make no sense.

Fine, I will propose that we reopen this subject.  Moving on.

>> We can't deprecate RFC 1766/3066 tags that use ISO 3166
>> codes.  They are everywhere.
>
> The problems start if you want to add another dimension like
> scripts.  Two dimensions language + region somehow work, but
> 4 dimensions language + script + region + variant are a PITA.
>
> As in en-boont, en-Latn-boont, en-US-boont, en-Latn-US-boont.

I've seen this claim before.  What exactly is the problem?
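
For what it's worth, the four tags above can be told apart by the shape
of each subtag alone.  The following is a rough sketch in Python,
assuming length-based shapes along the lines of the 3066bis draft
(4 letters for a script, 2 letters or 3 digits for a region, 5 to 8
alphanumerics or a digit plus 3 alphanumerics for a variant).  The
function name parse_tag is made up, and the sketch ignores subtag
order, extended language subtags, extensions and private use.

```python
import re


def parse_tag(tag: str) -> dict:
    """Split a language tag into language/script/region/variant by shape."""
    subtags = tag.split("-")
    result = {"language": subtags[0].lower()}
    for subtag in subtags[1:]:
        if re.fullmatch(r"[A-Za-z]{4}", subtag):
            result["script"] = subtag.title()      # e.g. Latn
        elif re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", subtag):
            result["region"] = subtag.upper()      # e.g. US or 200
        elif re.fullmatch(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}", subtag):
            result["variant"] = subtag.lower()     # e.g. boont
        else:
            raise ValueError(f"unrecognized subtag: {subtag!r}")
    return result


for tag in ("en-boont", "en-Latn-boont", "en-US-boont", "en-Latn-US-boont"):
    print(tag, parse_tag(tag))
```

Each tag yields the same language and variant, with script and region
present only when they were written.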

> A concept of regions restricted to "country codes" also isn't
> very convincing for some bigger countries like the US or GB, if
> you (ab)use it for languages.

It is well understood that language variations within a country occur,
and conversely, that the exact same language (close enough for tagging
purposes) may exist in multiple countries.  RFC 1766 and 3066 and
3066bis are not perfect linguistic tools for documenting this.  But
language variations within a country are one of the motivations for
adding variant subtags, which you didn't seem to like.

>> If ISO 3166/MA disregards the will of the world and reassigns
>> FQ, it will be to a new entity defined by UN.  That entity
>> will have been assigned a UN M.49 numeric code, RFC 3066bis
>> will use that, and the sky will stay right where it is.
>
> Wait a moment, I certainly love to bash ISO 3166, but it's not
> their problem if _you_ revive obsolete codes which have been
> removed decades ago like FQ.  And the population of a future FQ
> will hate the obscure UN number and you, when everybody else
> (maybe minus GG, IM, and JE) has "real" alpha-2 country codes.

I have no interest in bashing ISO 3166.  I trust that the public outcry
over the reuse of CS has sent a clear message to the MA, that reusing
codes -- especially one previously assigned to an economically
significant country, after only 10 years -- is a bad idea.  I trust that
they hear the message.

I'll accept the possibility that the population of a future FQ will end
up hating me.  As they say in Hollywood, there's no such thing as bad
publicity.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/