New Last Call: 'Tags for Identifying Languages' to BCP

Mon Dec 13 06:14:03 CET 2004

>  Date: 2004-12-12 20:57
>  From: "Peter Constable" <petercon at microsoft.com>
>  To: ietf-languages at alvestrand.no, ietf at ietf.org
>  
> > From: ietf-languages-bounces at alvestrand.no
> > [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Bruce Lilly
> 
> 
> > > That is not at all the aim here wrt stability; rather, the aim is that a
> > > symbolic identifier used for metadata in IT systems not change because
> > > some government on a whim says, "We would now prefer to use 'yz' rather
> > > than 'xy' to designate our country."
> > 
> > If by international agreement, 'yz' becomes the designation
> > for that country, then it is rather silly to stick one's
> > fingers in one's ears and shout "NA-NA-NA-NA-NA I don't want
> > to hear you".
> 
> That misses the point entirely. The point is that IDs used by political
> administrations may change for any number of reasons, and those
> administrations may have no qualms with such changes;

For such changes to become enshrined in an ISO standard
requires a bit more than a mere whim on the part of one
party; in the case of the particular ISO standards under
discussion, it requires convincing the duly appointed
maintenance authority to make the change.

> but in IT 
> systems, we cannot afford changes that break existing implementations
> and data.

Any implementations that depend on country/language codes
never changing are by definition broken implementations,
since there was never any guarantee that codes would never
change.  Change happens, and IT knows how to cope; it's a
versioning problem, and that's not a particularly difficult
problem.  Now I fully agree that in hindsight the ISO and
its appointed MAs could have provided a better record of
changes.

> If for whatever reason ISO and the UN decided that "US" should 
> be used to designate the country of France, I doubt you'd expect every
> software vendor to update all of their deployed installations to use
> "fr-US" instead of "fr-FR", and for every user to go through every data
> repository they manage to make such changes in their data.

The only way that would be likely to happen would be if
there were no longer a "US" *and* if the ISO and UN
representatives of France were to initiate a request for
such a change.  One would presume that they would have
good reason to do so, and could explain said reasons in
order to convince their ISO and UN counterparts to agree
to the change.  Under those hypothetical circumstances, I
can only assume that software vendors who care about such
matters would either agree with the hypothetical reasons
or would have acted to convince those in favor of the change
of reasons to avoid the change.  And while I would not
expect users to retroactively change documents any more than
I would expect coins and paper money to be reissued with old
dates but new designations of country name, I would expect
that as of the agreed-upon effective date of the change,
new documents would be prepared in accordance with the new
standard.  It's difficult to be more precise about such a
wild hypothetical, but consider similar changes made to
time zones...

> The people that maintain time zone definitions may have their means for
> changing times; that's fine for them. They are not dealing with the same
> concerns as we are dealing with.

Sure they are; it's another instance of the same sort of
versioning problem, with the same root causes, viz. items
which are changed (more frequently than some would like)
by politicians.

> The group here that has focused 
> specifically on language-tagging issues for several years has evaluated
> issues that affect language tags and the impact of changes and has
> decided what is best practice for *this* domain, and it is to maintain
> stability of data rather than cater to whims of political
> administrations.

Now that the horses have all run away, you'd better make
sure the stable doors are locked. :-)  There was never
any guarantee of stability of country codes or of language
codes.  Declaring at some time in the future that today's
meaning of sr-CS never meant what it in fact does mean
doesn't create stability; it creates instability -- it
doesn't make the versioning problem go away; it adds yet
a third version to the existing two.

> > "Designed" or not, country codes *are* read by humans; they
> > appear in top-level domain names.  Currently the ISO 639
> > 2-letter codes mean the same thing as the last component of
> > a domain name
> 
> I think you mean ISO 3166 2-letter codes.

Yes, my error.

> > and as the second component of a language-tag.
> > It's rather silly to change that correspondence simply because
> > a few people are piqued that international agreement has been
> > reached to change a few 2-letter codes.
> 
> The usability flaw in treating ISO 639 and ISO 3166 as human-readable is
> evident in the confusion between ja and JP (or is it jp and JA?), and GB
> vs UK.

Without looking I can easily tell that jp and uk are country
codes precisely *because* they are well-known as TLDs.

> As for what is silly, if the UN country ID for Canada changed to 
> CN (and that for PRC changed to something else), I'm sure it would cause
> far greater problems for users to have to change the last two letters in
> domain names than for them to keep doing what they always did.

And it is precisely because of such problems that it is
as unlikely to happen as your hypothetical FR->US change.

> In fact, 
> I would have thought it would create a rather significant problem on the
> Internet if such a change were made. (URIs don't come with versioning
> dates for domain names, so how would a DNS server know what the "cn"
> meant?)

URIs aren't guaranteed to be persistent; the Foo company buys
the Bar company and after a while bar.com URIs stop working
because they've been changed to foo.com URIs.  That sort of
thing happens all the time, and people adapt.

> > > Neither RFC 1766 nor RFC 3066 has ever presented "official"
> > > translations;
> > 
> > Both defer to the ISO lists for definitions (not "translations")
> > of the various codes.
> 
> Definitions; not language names for display use.

Feh. Whatever. The human-readable stuff that corresponds
to the code which you say shouldn't be read.   The stuff
without which codes are meaningless.  The stuff without
which two communicating parties cannot agree on the meaning
of "XX".

> > > this is no different for RFC 3066bis.
> > 
> > It is very different; under the proposed draft, there is only
> > an English definition, somebody wishing to provide a French
> > definition finds that he has none and must resort to an
> > unofficial translation.
> 
> The more you press this, the more silly it seems. RFC 3066 does not
> anywhere discuss display names

If I have used the term "display" in conjunction with the
language/country names it is only incidental.  Effective
communication, including language tagging, is an end-to-end
process (RFC 1958).  Without agreement on what a code *means*
there is no effective communication.  And the ISO lists
provide the definitions that attach meaning to otherwise
meaningless combinations of letters.  The stuff that the
humans at each end of the communications channel use to
identify the language.

> The source ISO standards are every bit as accessible as they ever
> were, and just as RFC 3066 gave the user no option but to refer to the
> source ISO standard, so users should and can continue to do so.

So, you're saying that the ISO definition of "CS" as
"Serbia and Montenegro" will continue to be valid, with
that meaning, in a language-tag?

> After this response, I will not waste my time any further on this
> foolishness.

The foolishness is your insistence on trying to tie
the definitions to a localization issue.  While definitions
in some language can be localized, the definitions can
also be used directly, and indeed must be used in some form
in order to make sense of the shorthand codes that represent
the definitions.  The issue isn't about localization at all;
it's about having continued accessibility to the definitions
in at least as many languages as is currently the case.

> > I'm willing to postpone the discussion
> > (other problems with the proposed registry format dictate
> > a broader solution which could easily have provision for
> > an arbitrary number of descriptions).
> 
> I strongly object to the suggestion that progress on this draft be
> delayed to deal with this non issue that caters to implementation issues
> that are well beyond the scope of either RFC 3066 or its proposed
> replacement.

Your characterization of the issue is inaccurate.  No matter;
let's deal instead with the issue of retaining the current
meaning of sr-CS (Serbian as used in Serbia and Montenegro).
How does the draft proposal retain the meaning of that valid
RFC 3066/ISO 639/ISO 3166 language tag?

> > No, you are overlooking the fact that a set of codes with
> > no corresponding definitions is useless.  RFC 3066 defers
> > the code/definition pairs to ISO, which provides multilingual
> > definitions. The proposed draft would remove that multilingual
> > characteristic.
> 
> What if the registry provided no name, just the ID? Then people would
> have to refer to the source ISO standard as they did in the past, and we
> would be able to specify which ISO IDs were or were not valid.

You'd still have a versioning problem because of the multiple
meanings of "CS". Trying to sweep any of those multiple
meanings under the rug (by declaring them "not valid") isn't
going to make them go away; what about existing data using the
current meanings? I doubt you'd expect every software vendor...
etc. The only way to solve a versioning problem is with an
appropriate versioning solution.
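
As a rough illustration of what such a versioning solution might look
like, here is a minimal Python sketch that resolves a code to its
meaning as of a given date. The names CODE_HISTORY and meaning_of are
hypothetical, invented for this example; they come from no RFC or ISO
registry. The date ranges are approximate placeholders, not an
authoritative change record.

```python
from datetime import date

# Illustrative history table (an assumption for this sketch). Each entry is
# (valid_from, valid_until, meaning); valid_until of None means "still current".
# The dates are approximate placeholders, not the official ISO 3166 record.
CODE_HISTORY = {
    "CS": [
        (date(1974, 1, 1), date(1993, 1, 1), "Czechoslovakia"),
        (date(2003, 7, 1), None, "Serbia and Montenegro"),
    ],
}


def meaning_of(code, as_of):
    """Return the meaning a code had on the given date, or None if unassigned."""
    for valid_from, valid_until, meaning in CODE_HISTORY.get(code.upper(), []):
        if valid_from <= as_of and (valid_until is None or as_of < valid_until):
            return meaning
    return None
```

With a table of this kind, data tagged at a known date keeps its
original meaning, and a later reassignment adds a new row rather than
rewriting an old one.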

> > > Display names for languages and countries are not within the scope of
> > > RFC 1766 or RFC 3066. It is preposterous to suggest that this draft is
> > > not compatible with existing implementations of RFC 3066 on that basis.
> > 
> > On the contrary, it is preposterous to suggest that codes
> > will be attached to text by magic; some human somewhere,
> > somehow is going to have to indicate the language to
> > something, and it certainly isn't going to be by way of
> > a 2- or 3-letter code without some reference to what those
> > codes *mean*.  And at the present time, the meaning of
> > those codes is defined -- bilingually -- in the ISO
> > lists.
> 
> RFC 3066 did not even discuss let alone provide a means for attaching
> display text to codes. It *is* preposterous to suggest that this draft
> is incompatible with RFC 3066 on that basis. Again, the more you press
> this, the more silly it seems.

I haven't specifically discussed "display names"; that is your
assertion, and not my basis for objection.  I refer to the
definitions and the need to map to and from those definitions
at either end of the communications channel.  Whether or not
that happens by "display" is incidental to the issue of the
number of languages that the definitions are provided in.

> > No, I am complaining about removal of internationalized
> > definitions associated with language tag components.
> 
> No definitions are removed. The draft points to the source ISO standards
> just as RFC 3066 does.

So, again I ask for confirmation; the ISO definition of "CS"
as "Serbie et Monténégro" will not be removed from use with
that meaning in language tags by the draft proposal?

> > "Localization" would be translation of the French definition
> > into some other language.  That is not my concern. My concern
> > is the elimination of the French definition in the first place.
> 
> No, you have not commented on definitions; you have repeatedly commented
> on strings to present to users.

No, presentation is incidental.  I have repeatedly referred to
the definitions, and the draft proposal's effect of removing
50% of those definitions.

> Please accept that your arguments on this 
> matter are empty.

Please accept that your empty caricature is not the same as my
argument.

> > As mentioned, under RFC 1766/3066 review/registration rules,
> > excessively long tags would certainly raise objections. That's
> > no coincidence -- it's an intentional design feature.
> 
> But excessive is not defined anywhere in RFC 1766/3066, and if a very
> good reason were presented why a tag x characters long was needed,
> it would have to be considered.

Considered, yes, of course.  But not rubber-stamp approved.
And if not approved, not usable.  And unlikely to be approved
if incompatible with core Internet protocols.