LANGUAGE TAG REGISTRATION FORM: iu-Cans

Peter Constable petercon at microsoft.com
Fri Feb 4 07:34:11 CET 2005


> From: John Cowan [mailto:jcowan at reutershealth.com]

> > Both iu and iu-CA are valid tags. A script qualification needs to be
> > made for Cans vs Latn. So, iu-Cans is the Cans qualification for iu, and
> > iu-Cans-CA is the Cans qualification for iu-CA.
> 
> In the current RFC 3066 regime, I quite understand Michael's reluctance
> to register longer codes that aren't provably distinguishable from
> shorter codes.  So unless iu is used with more than one script in more
> than one country, I don't see the justification for iu-Cans-CA.  In our
> hoped-for RFC 3066bis regime, things will of course be otherwise.

I understand why there may be some reluctance. As a platform vendor,
though, we face a significant dilemma: I cannot predict when a user will
determine that iu-Cans-CA needs to be distinguished from (say)
iu-Cans-US. And I face a maintenance problem if it becomes clear in the
future that such a distinction is needed and we end up changing the tags
that we return to apps -- I have no idea what apps that might break. I
also face a significant analysis problem, which is the very one you and
Tex were looking at and, I think, found to be somewhat intractable: how
do we know when a country ID is useful or not? Add to that the potential
to get users offended by geo-political issues: "you consider that
community to exist independent of any nation, yet you tie us to country
X". For many reasons, we really cannot take on making judgments about
precisely what the correct tags for users' needs will be.

Keep in mind that users' needs aren't consistently the same: if someone
asks for an RFC3066 tag that would apply to (e.g.) the "French (France)"
locale, we have no way of knowing whether, in the particular context
they want to use the tag, the best tag would be "fr" or "fr-FR".
Sometimes it will be one, but other times it will be the other.

Now, we could choose never to provide users with tags of any shape or
form -- they're on their own. We think that's not the best thing for us
to do, however. For one thing, our infrastructure includes things that
look similar to RFC 3066 tags but are not, and we don't want to lead
people to using the wrong thing in RFC3066 contexts. Also, returning
data of this sort is precisely the kind of service developers expect
from a platform, and it is better for us to provide something useful,
even if it is not the perfect tag for the situation, than nothing at
all. Keep in mind what I mentioned above: what is the most suitable tag
may vary and we have no way to predict it. The best we can do, then, is
to provide the most fully-qualified tag that might be appropriate for
the given locale, and if that locale is region-specific (as opposed to
our "neutral" locales, which have no region), then it will always
include a region component.
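
As a rough illustration of that policy (a minimal sketch only, not our
actual implementation; the function name build_tag and its parameters
are hypothetical), the tag for a locale could be assembled from
whichever components the locale defines, so that a region-specific
locale always yields a tag with a region:

```python
from typing import Optional


def build_tag(language: str,
              script: Optional[str] = None,
              region: Optional[str] = None) -> str:
    """Return the most fully-qualified tag for the given locale parts.

    Hypothetical helper: a "neutral" locale passes no region, while a
    region-specific locale always passes one, so its tag always ends
    with a region component.
    """
    parts = [language.lower()]        # language code, e.g. "iu"
    if script:
        parts.append(script.title())  # script code, e.g. "Cans"
    if region:
        parts.append(region.upper())  # country code, e.g. "CA"
    return "-".join(parts)


print(build_tag("iu", "Cans", "CA"))  # region-specific locale
print(build_tag("iu", "Cans"))        # neutral locale, no region
```

A caller who finds the result more specific than needed can drop
trailing components; the reverse (guessing a missing region) is not
possible, which is why the sketch always returns the longest form.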

Also, I think the reluctance to register a tag like iu-Cans-CA is
mistaken on other grounds: we are not obligated to determine that every
valid tag denotes something distinct from every other valid tag. That is
already impossible, since RFC 3066 defines many things as valid that
would not correspond to actual linguistic distinctions. I suspect that
there's no distinction between fr-CI and fr-GH, but both are valid tags,
and probably in use somewhere. And note that, while I think there's no
distinction, someone else may determine, for whatever reason, that they
think they need to distinguish something in this way. (Of course,
there's also the issue that some people will include country IDs whether
useful or not just because they think tags always have a country
element.) 

The important thing for us is not to establish precisely what every
distinction is (an endless task involving an ever-changing domain over
which different interpretations are possible), but rather to ensure that
the intended meaning of any tag is understood by all and that it is
clear, to some minimal level, how to utilize it. That's the whole point
in defining a more restrictive syntax in RFC 3066: so that we can take a
tag we have never seen before and get some useful information out of it.
In a case like "iu-Cans-CA", it is clear what the intended meaning is
(even if it wasn't stated explicitly in the registration form), and there
are sufficiently clear ways in which it can be used.
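
To make the "never seen before" point concrete, here is a minimal
sketch of how an application might pull the pieces out of such a tag.
It assumes a deliberately simplified shape (a 2-3 letter language, an
optional 4-letter script, an optional 2-letter or 3-digit region) and
ignores variants, extensions, private-use and irregular registered
tags; the name split_tag is hypothetical.

```python
import re
from typing import Dict, Optional

# Simplified pattern: language[-script][-region]. This is an assumption
# made for illustration, not the full grammar of any RFC.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def split_tag(tag: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a tag into language, script and region subtags.

    Returns None when the tag does not fit the simplified pattern;
    absent optional subtags are reported as None.
    """
    match = _TAG.match(tag)
    if match is None:
        return None
    return match.groupdict()


print(split_tag("iu-Cans-CA"))
print(split_tag("fr-CI"))
```

Because the script and region subtags differ in length, the position
of each piece is unambiguous even when one of them is omitted.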



Peter Constable

