A bunch of CS (was: Re: New Last Call: 'Tags for Identifying Languages' to BCP)

Doug Ewell dewell at adelphia.net
Thu Dec 16 09:32:20 CET 2004


Anyone who saw the classic movie "American Graffiti" knows exactly what
"CS" stands for.  But for those of us working with language tags, the
answer might not be so obvious.

Bruce Lilly <blilly at erols dot com> wrote:

>> That's what's now being fixed.
>
> No the problem will remain. Currently sr-CS has a specific
> meaning under RFC 3066; it has had for some time.  For that
> meaning to remain stable, it will be necessary to take any
> change in the (current) meaning of the "-CS" part into
> account. I.e. for a future parse of language tags to do the
> right thing, it will have to recognize sr-CS generated under
> the RFC 3066 rules per the 3066/639 definitions.

and later:

>> This is a situation we do not intend to repeat.
>
> That is precisely what would be repeated, and the problem
> would remain.  "CS" currently means "Serbia and Montenegro",
> and its use in accordance with RFC 3066 has precisely that
> meaning.  Changing "CS" to mean something else at some
> future time (if/when the proposed draft goes into effect)
> would result in at least as many different definitions as
> exist at present, and adds yet another time epoch that
> needs to be considered in order to determine the meaning
> of "CS".

Peter Constable <petercon at microsoft dot com> responded:

> The meaning "Serbia and Montenegro" was introduced relatively recently
> (a little more than a year ago), was immediately received with alarm
> by many in the IT sector. There were vain attempts to get it reversed,
> and that failure was an impetus to introduce protection against such
> changes in the revision of RFC 3066. I am not aware of "CS" being used
> in the IT sector with the new meaning, though cannot guarantee that.

All right, so the basic problem we have is figuring out what "CS" means.
We know that it *used* to mean Czechoslovakia, at least until 1993 as
far as ISO 3166/MA is concerned and probably longer in legacy data, and
that it now *means* Serbia and Montenegro, although some people (exactly
how many, we don't know) have resisted adopting that reassignment.

CS is not like the other codes that ISO 3166/MA has reassigned:

    AI -- formerly French Afars and Issas -- now Anguilla
    GE -- formerly Gilbert and Ellice Islands -- now Georgia
    SK -- formerly Sikkim -- now Slovakia

These codes were withdrawn by ISO 3166/MA *long* before that standard
was widely used for tagging data.  There was no domain naming system
that used these codes with their original meanings, and of course no
language-plus-country tagging of the type we deal with on this list.
Additionally, these countries (or "country-like objects") were
relatively small and less important in the data-processing environment.
(That's not intended as a value judgment, just a fact of numbers.)

In contrast, Czechoslovakia by the time of its breakup was a
comparatively significant nation, with a non-trivial Internet presence.
The ".cs" TLD was not deleted until January 1995, just two months before
the publication of RFC 1766.  According to Wikipedia, ".cs is the most
heavily used top-level domain ever to be deleted. Statistics from the
RIPE Network Coordination Centre show that even in June 1994, after much
of the conversion to .cz and .sk had been done, .cs still had over 2,300
hosts. By comparison, other deleted TLDs (.nato and .zr) may never have
reached double figures."

Because of the scope of the change, the reassignment of CS was in many
ways an unprecedented event for ISO 3166/MA.  Starting in February 2003,
the MA published an information page and four "interim reports" on their
Web site, mostly to report continuing delays in the process of selecting
the new code, until it was finally announced on 2003-07-23.  (The
alpha-3 code SCG was never a major source of controversy and is not
relevant to this discussion.)  The repeated delays suggest that
significant opposition to reusing the CS code existed within the MA as
well as outside it.  It is worth noting that the IANA ccTLD administration has
*not* adopted ".cs" for Serbia and Montenegro, but has continued to use
the old ".yu" TLD.

CS (for Czechoslovakia) is also the only one of the four reassigned
codes to have its own corresponding UN M.49 numeric code, 200.  It is no
longer in use, of course.

Now, Bruce says that CS currently means Serbia and Montenegro, and
technically he is correct.  But Peter is also correct that the
information technology community has not exactly greeted the
redefinition of "CS" with wild enthusiasm.  It is not at all clear that
the new use of CS in computer implementations to mean Serbia and
Montenegro is more widespread than the old use to mean Czechoslovakia,
and in fact it may be much less so.  It is probably impossible to know
for sure.  It is certain, however, that both meanings have been used in
one implementation or another.

Bruce says that if RFC 3066bis defines CS to mean Czechoslovakia, which
is the current plan, then that is an incompatibility with RFC 3066, and
points out that compatibility is one of the explicit goals of 3066bis.
However, in a way, RFC 3066 is actually inconsistent with itself, by
allowing CS to have two meanings depending on when the tag containing CS
was created.  (The only alternative is that there is only one meaning,
which has changed over the years, so that a tag that once referred to
Czechoslovakia now refers to Serbia and Montenegro.  This seems
implausible.)
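
To make the "two meanings depending on when" reading concrete, here is a
minimal Python sketch.  The function name and the idea of handing it a
creation date are my own illustrative assumptions; nothing in RFC 3066
provides such a date, which is exactly the problem.  The only date taken
from the record is the 2003-07-23 announcement of the reassignment.

```python
from datetime import date
from typing import List, Optional

# Date on which ISO 3166/MA announced CS for Serbia and Montenegro.
CS_REASSIGNED = date(2003, 7, 23)


def cs_candidates(tag_created: Optional[date]) -> List[str]:
    """List the plausible meanings of the region code CS.

    Hypothetical helper: a bare tag such as "sr-CS" carries no creation
    date, so callers will usually have to pass None.
    """
    if tag_created is None:
        # Without a date, both readings remain possible.
        return ["Czechoslovakia", "Serbia and Montenegro"]
    if tag_created < CS_REASSIGNED:
        return ["Czechoslovakia"]
    return ["Serbia and Montenegro"]
```

With a known date the answer is unique, but for the usual case of an
undated tag the function can only return both candidates.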

The tag "sr-CS" comes up often in this discussion.  Obviously the intent
is that this is "Serbian as used in Serbia and Montenegro," not "as used
in (the former) Czechoslovakia."  But then what is "cs-CS" or "sk-CS"?
Those tags would probably refer to Czech or Slovak as spoken in
Czechoslovakia, not in Serbia and Montenegro.  So the "human-obvious"
meaning of CS may depend on the language in question.  Not so obvious
after all, is it?  And what about "de-CS"?  German could plausibly be
spoken in either of these regions.
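
The language-dependent guess described above could be sketched roughly
as follows.  This is only an illustration of the "human-obvious"
heuristic, not anything a real parser is known to do; the table and the
function name are hypothetical.

```python
# Illustrative, non-normative table: which meaning of CS a human reader
# would most likely assume for a given language subtag.
LIKELY_REGION = {
    "sr": "Serbia and Montenegro",  # Serbian
    "cs": "Czechoslovakia",         # Czech
    "sk": "Czechoslovakia",         # Slovak
}


def guess_cs_region(tag: str) -> str:
    """Return a best guess for the meaning of CS in a tag like 'sr-CS'."""
    language, _, region = tag.partition("-")
    if region.upper() != "CS":
        raise ValueError(f"not a -CS tag: {tag!r}")
    # Languages missing from the table (e.g. German) cannot be resolved.
    return LIKELY_REGION.get(language.lower(), "ambiguous")
```

Here guess_cs_region("sr-CS") and guess_cs_region("cs-CS") give
different regions for the same code, and guess_cs_region("de-CS") can
only report "ambiguous".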

There is no perfect solution to the CS problem.  Neither meaning exactly
reflects "the use of CS in RFC 3066," because there have been two such
uses.

There are two possibilities:

(1)  The current plan: use CS to refer to historic Czechoslovakia and YU
to refer to Serbia and Montenegro.  This is consistent with most
historic data, but not with the current ISO 3166 definitions.

(2)  An alternative plan: use CS to refer to Serbia and Montenegro and
200 (the former UN M.49 code) to refer to Czechoslovakia.  This protects
the controversial ISO 3166 reassignment of CS, and saddles historic
Czechoslovakia with a numeric code nobody has ever heard of.

Option (2) could be implemented by changing the cutoff date in Appendix
C of RFC 3066bis from 2003-01-01 to 2005-01-01 (and praying that no
other ill-advised reassignments take place in the next few weeks).
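
The effect of the cutoff date on the two options can be modelled with a
small Python sketch.  This is a deliberately simplified model of my own,
not text from the draft: it covers only the codes discussed here, and the
function name is hypothetical.

```python
from datetime import date
from typing import Dict

# Date on which ISO 3166/MA announced CS for Serbia and Montenegro.
CS_REASSIGNED = date(2003, 7, 23)


def region_subtags(cutoff: date) -> Dict[str, str]:
    """Map region subtags to meanings for a given cutoff date.

    Simplified assumption: a reassignment made after the cutoff is
    ignored, so the code keeps its earlier meaning.
    """
    if cutoff < CS_REASSIGNED:
        # Option (1): CS keeps its historic meaning.
        return {"CS": "Czechoslovakia", "YU": "Serbia and Montenegro"}
    # Option (2): CS follows ISO 3166; Czechoslovakia falls back to its
    # former UN M.49 numeric code.
    return {"CS": "Serbia and Montenegro", "200": "Czechoslovakia"}
```

Calling region_subtags(date(2003, 1, 1)) yields option (1), while
region_subtags(date(2005, 1, 1)) yields option (2).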

There are different viewpoints on how CS should be handled, but it
must be understood that no solution can be perfect, because CS has
already been given ambiguous meanings.  The solution that is chosen
should be the one that minimizes the negative impact.  Trying to use the
isolated and unique CS issue to cast a bad light on RFC 3066bis and
obstruct its approval would be, well, just a bunch of CS.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/



