Region subtags under 3066 and 3066bis (long)

Sun Feb 20 00:38:20 CET 2005

Doug Ewell wrote:

> The rules for using a UN numeric code are clearly stated in
> the draft

Whenever you say "clear" and "consistent" I read "some fancy
rules made up as needed using at least two cut-off dates".  The
second date is clear, it's the day of the publication of a new
3066bis as RfC.

The first date is less clear, apparently you want "3166:3" with
country codes like BQ, BU, CT, DD, FQ, FX, JT, MI, NQ, NT, PC,
PU, PZ, RH, SU, VD, and YD.  The literal interpretation of...

| All 2-letter subtags are interpreted as ISO 3166 alpha-2
| country codes from [ISO 3166], or subsequently assigned by
| the ISO 3166 maintenance agency

...together with...

| [ISO 3166]  ISO 3166:1988 (E/F)
[...]
| Standardization, 3rd edition, 1988-08-15.

Two questions: is it really necessary to stick to an _obsolete_
edition of ISO 3166 for 1766/3066-compatibility ?  And if your
source says that RHZW was changed in 1980, eight years before
the third edition of ISO 3166, why add it to a _new_ registry
about language tags in 2005 ?  RH was never allowed under
1766/3066.

 [UN codes]
> the 30 UN numeric codes that refer to geographical regions
> are in the registry, but the ones that denote economic
> groupings are not.  This is by rule in the draft (Section
> 2.2.4 in draft-09).

You have 200 for the former CS, is that a third cut-off date ?
Otherwise the UN part is clear and I have no problem with it.

 [alpha-3 vs. alpha-2]
> Again, the draft is very clear on this point.

Okay, I didn't know that ISO promised to never add an alpha-2
code to an existing alpha-3 code.

> This concept was actually introduced with RFC 3066

Makes sense, so you essentially copy all alpha-3 codes without
alpha-2 alias to your alpha-3 section of the registry.

> I got my historical data from Clive Feather's page at
> http://www.davros.org/misc/iso3166.html

Thanks, that's a nice page, we should be able to fix your list
with this data.  It says BQAQ -v with v = changed 1979.  That
was long before 1988 and should kill BQ.  In that list you also
find NQAQ -x (1983) killing NQ.  For FQ it's FQHH 1979 killing
FQ.  ISO 3166-3 uses HH if there is more than one new code, in
that case FQ is covered by both TF and AQ.

You could simply remove all codes from your list changed before
1988 (-b and -t up to -z in your source).  In other words:

BQ, CT, DY, FQ, HV, JT, MI, NH, NQ, PC, PU, PZ, RH, VD, and WK
are dead.  So are AIDJ, GEHH, and SKIN, but you already have the
new AI, GE, and SK, so that's okay.  Maybe we can agree on this
part ?
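
A minimal sketch of that cut-off rule in Python, only to make the
proposal concrete.  The names (WITHDRAWN, split_by_cutoff) are
made up for illustration.  The years for BQAQ, FQHH, RHZW, and
NQAQ are the ones quoted in this mail; the years for BUMM, DDDE,
and YDYE are not stated here and are filled in as assumptions.

```python
# Hypothetical sketch: split withdrawn ISO 3166 codes at a cut-off year.
CUTOFF_YEAR = 1988  # edition year of the ISO 3166 reference in the draft

# (four-letter ISO 3166-3 code, year of the change)
WITHDRAWN = [
    ("BQAQ", 1979),  # year quoted in this mail
    ("FQHH", 1979),  # year quoted in this mail
    ("RHZW", 1980),  # year quoted in this mail
    ("NQAQ", 1983),  # year quoted in this mail
    ("BUMM", 1989),  # assumption, not stated in this mail
    ("DDDE", 1990),  # assumption, not stated in this mail
    ("YDYE", 1990),  # assumption, not stated in this mail
]


def split_by_cutoff(entries, cutoff=CUTOFF_YEAR):
    """Return (dead, kept) lists of old alpha-2 codes.

    A code is 'dead' if it was changed before the cut-off year,
    otherwise it stays a candidate for the registry.
    """
    dead = sorted(code[:2] for code, year in entries if year < cutoff)
    kept = sorted(code[:2] for code, year in entries if year >= cutoff)
    return dead, kept


dead, kept = split_by_cutoff(WITHDRAWN)
print("dead:", ", ".join(dead))
print("kept:", ", ".join(kept))
```

With a single cut-off year the result depends only on the table,
not on case-by-case decisions, which is the point of the proposal.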

The same source also says BUMM, DDDE, TPTL, YDYE, YUCS, ZRCD.
You have MM for BU etc. but not DE for DD and YE for YD, that's
not yet consistent.  Please ignore the BYAA, BY is okay.

> the rationale for using NH in examples was precisely to
> demonstrate the use of no-longer-active ISO 3166 codes in
> language tags.

Good idea, bad example, if we agree on killing the old 1980 NH.

A better example could be TP (especially for me, because I tend
to mention existing ccTLDs after the ISO-3166-CS-mess... ;-)

> code X is considered a canonical equivalent of code Y
[...]
> if they represent the exact same entity.

If ISO 3166-3 has XXYY (minus XXAA or XXHH) then that should be
good enough for your list.  Otherwise you have the problem that
YU used to be more than only CS for some decades, including
some years after 1988.  It only affects DDDE and YDYE after
removing all obsolete codes like the three ??UM from your list.
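
The XXYY rule can be sketched in a few lines of Python.  This is
only an illustration, the function name is invented, and the
reading of AA (name changed, alpha-2 code kept, as with BYAA) is
my understanding of ISO 3166-3 rather than something the draft
says.  HH is taken as "more than one successor", as noted above
for FQHH.

```python
# Hypothetical sketch: derive a canonical successor from an
# ISO 3166-3 four-letter code of the form XXYY.
PLACEHOLDERS = {"AA", "HH"}  # no single new alpha-2 code to map to


def successor(code4):
    """Split XXYY into (old, new); new is None for XXAA and XXHH."""
    if len(code4) != 4 or not code4.isalpha():
        raise ValueError("expected a four-letter code: %r" % (code4,))
    code4 = code4.upper()
    old, new = code4[:2], code4[2:]
    if new in PLACEHOLDERS:
        return old, None
    return old, new


for code in ["DDDE", "YDYE", "FQHH", "BYAA"]:
    print(code, successor(code))
```

Under this rule DD maps to DE and YD to YE, while FQ and BY get
no alias at all.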

> Region subtags do not have the canonical/alias relationship
> if they do not represent the same plot of land.

Later you said that we should follow ISO 3166 where possible,
and they have DDDE and YDYE.

> FQ is not exactly the same as TF.

As far as languages are concerned FQHH with a comment AQ and TF
_is_ the same as TF, because there are no "regional languages"
in AQ.  But FQ is one of the obsolete codes, we don't need to
discuss the details if you just delete it.

> It's important that we have explicit rules to determine what
> codes are used, and why

Sure, and therefore it's an extremely bad idea to "block" codes
like FQ and NH with obsolete stuff, when they could be used for
something else in the future.  See the old AI, CS, GE, and SK.

> what would happen if we actually *did* decide arbitrarily,
> case by case, which regions should have codes and which
> should not?

No idea.  And it's not your fault that the RfC 1766 concept of
adding country codes to language tags for "regional languages"
is FUBAR.  Some RfCs are worse than others or even dead ends.

The problems of matching en-boont with en-US-boont would just
go away if 3066bis would deprecate this "country code" madness.
If I wanted en-"TX" then I'd need "en-texan" and not en-US, let
alone en-UM.

> They should not pose any problems to anyone, as long as
> people Tag Content Wisely.

Again, look at AI, CS, GE, and SK.  Keeping obsolete codes for
obscure consistency reasons would have caused major trouble for
the new AI, CS, GE, and SK.

> If you have content tagged as "fr-FQ", and don't get the
> Google hits that you would have gotten if you had used
> "fr-TF" instead, the sky probably will not fall.

The sky will fall on a _future_ FQ if they can't use their own
country code like almost everybody else in the world, because
you decided that it stands for an uninhabited territorial claim
in AQ acknowledged by neither the UN nor the US.

>> The deprecated NT is useless in a registry about languages.
> Lots of region subtags have no bearing on languages.

Yes, I've now seen it, it's not just "a" neutral territory, it
was "the" neutral territory between Iraq and Saudi Arabia.

>> Is i-default different from "und" (alpha-3) ?
> I assume so.  See the registration form for i-default.

It sounds like the same idea, a dummy if you're forced to use a
tag.  I skip the other i-grandfather issues, because we agree.

>> AC / BM / CP / DG / FK / GB / GG / GI / GS / IM / IO / JE /
>> KY / MS / PN / SH / TA / TC / UK / VG.
[...]
> I'm lost here.

The British empire and the duchy of Normandy enumerated by all
country codes plus AC, CP, DG, GG, IM, JE, TA, and UK, only a
joke. ;-)

> The draft uses the codes that it uses, consistently.

Adding the few additional ccTLDs would also be consistent.  It
would just be a different rule, IMHO better than 830 and 833.

> say "we should (or shouldn't) allow non-ISO-3166 ccTLDs"

Yes, that's the idea.

> "we should (or shouldn't) allow ISO 3166 codes withdrawn
> before 1995,"

That's also fine.  Or for literal 1766/3066-compatibility maybe
1988, because the reference mentions ISO-3166:1988.  Bye, Frank