How many subtags is ideal? [RE: LANGUAGE TAG REGISTRATION FORM: iu-Cans]

John Clews scripts20 at
Fri Feb 4 10:34:54 CET 2005

Just as an additional general discussion point, should there be rules or
at least guidance about when 3-element tags (language-script-country)
should be _used_?

Should they be used only exceptionally? In many cases, for most users (and
developers?) 2-element tags (language-script) will be sufficient, and the
third element will not add much differentiation, while the coexistence of
2-element and 3-element tags may cause some interoperability problems.

On the other hand, I suspect even the 3-element tag may not be enough. Indeed, I
understand that different spelling conventions are used in writing
Inuktitut in syllabics in the Eastern and Western parts of Canada.

That could be solved by using 4-element codes, with the 4th element taken
from ISO 3166-2 subdivision codes (or from elsewhere).

Or in the medium term (and better still?) the same situation may be
solved once it is decided that codes from the forthcoming ISO 639-3 may
be used, where the differentiation would almost certainly be achieved by
using a different 3-letter language subtag as the first subtag in the
cases of:
East Canadian Inuktitut written in Syllabics, and
West Canadian Inuktitut written in Syllabics.

I don't have the relevant Ethnologue or ISO 639-3 information to hand, so
the last bit is based on what I think is the case, but I'll check it out.

John Clews

---------------------------- Original Message ----------------------------
From:    "Peter Constable" <petercon at>
Date:    Fri, February 4, 2005 6:34 am
To:      ietf-languages at

> From: John Cowan [mailto:jcowan at]

> > Both iu and iu-CA are valid tags. A script qualification needs to be
> > made for Cans vs Latn. So, iu-Cans is the Cans qualification for iu,
> > and iu-Cans-CA is the Cans qualification for iu-CA.
> In the current RFC 3066 regime, I quite understand Michael's reluctance
> to register longer codes that aren't provably distinguishable from
> shorter codes.  So unless iu is used with more than one script in more
> than one country, I don't see the justification for iu-Cans-CA.  In the
> hoped-for RFC 3066bis regime, things will of course be otherwise.

I understand why there may be some reluctance. As a platform vendor,
though, we face a significant dilemma: I cannot predict when a user will
determine that iu-Cans-CA needs to be distinguished from (say)
iu-Cans-US. And I face a maintenance problem if it becomes clear in the
future that such a distinction is needed and we end up changing the tags
that we return to apps: I have no idea which apps that might break. I also
face a significant analysis problem, which is the very one you and Tex
were looking at and, I think, found to be somewhat intractable: how do we
know when a country ID is useful or not? Add to that the potential for
offending users over geo-political issues: "we consider our community to
exist independently of any nation, yet you tie us to country X". For many
reasons, we really cannot take on making judgments about precisely what
the correct tags for users' needs will be.

Keep in mind that users' needs vary: if someone asks for an RFC 3066 tag
that would apply to (e.g.) the "French (France)" locale, we have no way of
knowing whether, in the particular context in which they want to use the
tag, the best tag would be "fr" or "fr-FR". Sometimes it will be one, and
other times the other.

Now, we could choose never to provide users with tags of any shape or form,
leaving them on their own. We think that's not the best thing for us to do,
however. For one thing, our infrastructure includes things that look
similar to RFC 3066 tags but are not, and we don't want to lead people into
using the wrong thing in RFC 3066 contexts. Also, returning data of this
sort is precisely the kind of service developers expect from a platform,
and it is better for us to provide something useful, even if it is not the
perfect tag for the situation, than nothing at all. Keep in mind what I
mentioned above: the most suitable tag may vary, and we have no way to
predict it. The best we can do, then, is to provide the most
fully-qualified tag that might be appropriate for the given locale; if
that locale is region-specific (as opposed to our "neutral" locales, which
have no region), the tag will always include a region component.
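The point that a fully-qualified tag still serves consumers who only understand shorter tags can be illustrated with truncation-based fallback, similar in spirit to the "Lookup" scheme later standardized in RFC 4647. The sketch below is a simplified, hypothetical `lookup` helper: it drops subtags one at a time from the right until it finds a tag in a set of available resources. Unlike RFC 4647, it does not treat singleton subtags (such as `x`) specially, and the sample resource set is invented for illustration.

```python
from typing import Iterable, Optional


def lookup(tag: str, available: Iterable[str]) -> Optional[str]:
    """Return the longest subtag-prefix of `tag` found in `available`.

    Hypothetical, simplified sketch: matching is case-insensitive, and
    subtags are removed one at a time from the right-hand end.
    """
    # Map lowercased tags back to their original spelling.
    by_lower = {t.lower(): t for t in available}
    subtags = tag.split("-")
    while subtags:
        key = "-".join(subtags).lower()
        if key in by_lower:
            return by_lower[key]
        subtags.pop()  # e.g. "iu-Cans-CA" -> "iu-Cans" -> "iu"
    return None


# Invented example: an app that only ships "fr", "iu" and "iu-Cans" content.
available = {"fr", "iu", "iu-Cans"}
print(lookup("fr-FR", available))       # fr
print(lookup("iu-Cans-CA", available))  # iu-Cans
print(lookup("iu-Latn-CA", available))  # iu
print(lookup("de-DE", available))       # None
```

Under this kind of fallback, handing an application "fr-FR" or "iu-Cans-CA" costs little: a consumer that only knows "fr" or "iu-Cans" still finds a match.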

Also, I think the reluctance to register a tag like iu-Cans-CA is
mistaken on other grounds: we are not obligated to determine that every
valid tag denotes something distinct from every other valid tag. That is
already impossible, since RFC 3066 defines many things as valid that do
not correspond to actual linguistic distinctions. I suspect that there's
no distinction between fr-CI and fr-GH, but both are valid tags, and are
probably in use somewhere. And note that, while I think there's no
distinction, someone else may decide, for whatever reason, that they
need to distinguish something in this way.

(Of course, there's also the issue that some people will include country
IDs whether useful or not, just because they think tags always have a
country.)

The important thing for us is not to establish precisely what every
distinction is (an endless task involving an ever-changing domain over
which different interpretations are possible), but rather to ensure that
the intended meaning of any tag is understood by all, and that it is
clear, at least to some minimal level, how to use it. That's the whole
point of defining a more restrictive syntax in RFC 3066bis: so that we can
take a tag we have never seen before and get some useful information out
of it. In a case like "iu-Cans-CA", the intended meaning is clear (even if
it wasn't stated explicitly in the registration form), and there are
sufficiently clear ways in which it can be used.
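To make the "useful information from an unseen tag" point concrete, here is a minimal sketch that classifies subtags by position, length and character class, assuming a simplified language[-script][-region] subset of the RFC 3066bis-style syntax (later published as RFC 4646): a 2-3 letter language, an optional 4-letter script, and an optional 2-letter or 3-digit region. The `parse_tag` function is hypothetical; variants, extensions, private-use subtags and grandfathered tags are deliberately out of scope, and such tags are simply rejected.

```python
import re
from typing import Dict, Optional

# Simplified pattern for language[-script][-region] only (an assumption,
# not the full grammar): variants, extensions, private use and
# grandfathered tags are NOT handled and will not match.
_TAG_RE = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)


def parse_tag(tag: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a simple tag into its components, or return None if it doesn't fit."""
    m = _TAG_RE.fullmatch(tag)
    if m is None:
        return None
    parts = m.groupdict()
    # Apply the customary case conventions: lowercase language,
    # titlecase script, uppercase region.
    parts["language"] = parts["language"].lower()
    if parts["script"]:
        parts["script"] = parts["script"].title()
    if parts["region"]:
        parts["region"] = parts["region"].upper()
    return parts


print(parse_tag("iu-Cans-CA"))
# {'language': 'iu', 'script': 'Cans', 'region': 'CA'}
print(parse_tag("fr-FR"))
# {'language': 'fr', 'script': None, 'region': 'FR'}
```

Because each subtag's role follows from its position and shape, a consumer that has never seen "iu-Cans-CA" can still identify its language, script and region subtags without consulting a registration form.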

Peter Constable