How many subtags is ideal? [RE: LANGUAGE TAG REGISTRATION FORM: iu-Cans]

JFC (Jefsey) Morfin jefsey at
Fri Feb 4 16:04:44 CET 2005

Dear John,
your point is well taken. We had an extensive debate on this on the 
occasion of the proposed "RFC 3066bis" Draft. The lang3tag obviously does 
not offer the flexibility needed to register every language variation. 
The first such variation, and one necessary to the very operation of the 
Internet, is support for the DNS IDN Tables. These should be the default 
for the 3-element tags, as they are the only generic (existing at IANA) 
registration on the matter made by:
- the national community trustee (defined by the RFC 1591 rules using the 
ISO 3166 codes)
- for the specified script
- in the concerned language

The debate, and the consensus building around informed language 
association, go along your lines. The technical proposition I made for 
operational convergence is to use a 5-element langtag [lang5tag] (obviously 
subject to debate; we will have several meetings on this, since the 
Feb SG2 meeting is too early for a constructed proposition).

The proposition is to use two additional elements:
- the style in which the language is to be used. The default would be IDN.
- the author(ity) of reference having registered the language description. 
By default this is the ccTLD manager of the country code, as already 
registered in the IANA base.
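
As a rough illustration only: the lang5tag syntax is still open to debate, 
so the class name, the field names and the way the defaults are filled in 
below are assumptions made for this sketch, not an agreed format. The five 
elements and their defaults could be modelled as:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lang5Tag:
    """Hypothetical model of a 5-element langtag; all names are illustrative."""
    language: str                    # e.g. "iu"
    script: str                      # e.g. "Cans"
    country: str                     # ISO 3166 code, e.g. "CA"
    style: str = "IDN"               # default style proposed above
    authority: Optional[str] = None  # None means "fall back to the default"

    def effective_authority(self) -> str:
        # Default proposed above: the ccTLD manager of the country code.
        if self.authority is not None:
            return self.authority
        return f"ccTLD manager for {self.country}"


tag = Lang5Tag("iu", "Cans", "CA")
print(tag.style, "/", tag.effective_authority())
```

A requester registering their own description would simply pass their name 
as the authority instead of relying on the default.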

These lang5tags probably give all the flexibility necessary to go down to 
full vernacular support (including individual/cultural variations). In this 
perspective the current requests should be understood as including their 
requester name as the authoritative element (the one who compiles the sources 
and documents the dictionary, etc. for web services interintelligibility for 


At 10:34 04/02/2005, John Clews wrote:
>Just as an additional general discussion point, should there be rules or
>at least guidance about when 3-element tags (language-script-country)
>should be _used_?
>Should they be used only exceptionally? In many cases, for most users (and
>developers?) 2-element tags (language-script) will be sufficient, and the
>third element will not add much differentiation, but the coexistence of
>2-element and 3-element tags may cause some interoperability problems.
>On the other hand, even the 3-element tag may not be enough. Indeed, I
>understand that there are different spelling conventions used in writing
>Inuktitut in syllabics in Eastern and Western parts of Canada.
>That could be solved by using 4-element codes, with the 4th element coming
>from subtags from ISO 3166-2 (or elsewhere).
>Or in the medium term (and better still?) that same situation may be
>solved once it is decided that it is possible to use codes from the
>forthcoming ISO 639-3, where the differentiation would almost certainly be
>achieved by using a different 3-letter language subtag as the first subtag
>in the cases of:
>East Canadian Inuktitut written in Syllabics, and
>West Canadian Inuktitut written in Syllabics.
>I don't have the relevant Ethnologue or ISO 639-3 information to hand, so
>the last bit is based on what I think is the case, but I'll check it out
>John Clews
>---------------------------- Original Message ----------------------------
>From:    "Peter Constable" <petercon at>
>Date:    Fri, February 4, 2005 6:34 am
>To:      ietf-languages at
> > From: John Cowan [mailto:jcowan at]
> > > Both iu and iu-CA are valid tags. A script qualification needs to be
> > > made for Cans vs Latn. So, iu-Cans is the Cans qualification for iu,
> > > iu-Cans-CA is the Cans qualification for iu-CA.
> >
> > In the current RFC 3066 regime, I quite understand Michael's reluctance
> > to register longer codes that aren't provably distinguishable from
> > shorter codes.  So unless iu is used with more than one script in more
> > than one country, I don't see the justification for iu-Cans-CA.  In the
> > hoped-for RFC 3066bis regime, things will of course be otherwise.
>I understand why there may be some reluctance. As a platform vendor,
>though, we face a significant dilemma: I cannot predict when a user will
>determine that iu-Cans-CA needs to be distinguished from (say)
>iu-Cans-US. And I face a maintenance problem if it becomes clear in the
>future that such a distinction is needed and we end up changing the tags
>that we return to apps -- I have no idea what apps that might break. I also
>face a significant analysis problem, which is the very one you and Tex
>were looking at and, I think, found to be somewhat intractable: how do we
>know when a country ID is useful or not? Add to that the potential to get
>users offended by geo-political issues: "you consider that
>community to exist independent of any nation, yet you tie us to country
>X". For many reasons, we really cannot take on making judgments about
>precisely what the correct tags for users' needs will be.
>Keep in mind that users' needs aren't consistently the same: if someone
>asks for an RFC3066 tag that would apply to (e.g.) the "French (France)"
>locale, we have no way of knowing whether, in the particular context they
>want to use the tag, the best tag would be "fr" or "fr-FR".
>Sometimes it will be one, but other times it will be the other.
>Now, we could choose never to provide users with tags of any shape or form
>-- they're on their own. We think that's not the best thing for us to do,
>however. For one thing, our infrastructure includes things that look
>similar to RFC 3066 tags but are not, and we don't want to lead people to
>using the wrong thing in RFC3066 contexts. Also, returning data of this
>sort is precisely the kind of service developers expect from a platform,
>and it is better for us to provide something useful even if not the
>perfect tag for the situation rather than nothing at all. Keep in mind
>what I mentioned above: what is the most suitable tag may vary and we have
>no way to predict it. The best we can do, then, is to provide the most
>fully-qualified tag that might be appropriate for the given locale, and if
>that locale is region-specific (as opposed to our "neutral" locales, which
>have no region), then it will always include a region component.
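
To illustrate the point about handing out the most fully-qualified tag: a 
consumer that only needs a coarser distinction can drop subtags from the 
right until it reaches a tag it knows. This is a minimal sketch, not a 
description of any platform's behaviour; the function name and the set of 
supported tags are made up for the example.

```python
def best_match(tag, supported):
    """Return the longest prefix of `tag` (in whole subtags) found in `supported`."""
    wanted = {s.lower() for s in supported}   # tags compare case-insensitively
    parts = tag.split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate.lower() in wanted:
            return candidate
        parts.pop()                           # drop the rightmost subtag, retry
    return None                               # nothing matched at any length


print(best_match("iu-Cans-CA", {"iu-Cans", "fr"}))  # iu-Cans
print(best_match("fr-FR", {"iu-Cans", "fr"}))       # fr
```

With this kind of fallback, receiving "fr-FR" where "fr" would have been 
enough costs little, whereas the reverse loses information.
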
>Also, I think the reluctance to register a tag like iu-Cans-CA is
>mistaken on other grounds: we are not obligated to determine that every
>valid tag denotes something distinct from every other valid tag. That is
>already impossible, since RFC 3066 defines many things as valid that would
>not correspond to actual linguistic distinctions. I suspect that there's
>no distinction between fr-CI and fr-GH, but both are valid tags, and
>probably in use somewhere. And note that, while I think there's no
>distinction, someone else may determine, for whatever reason, that they
>think they need to distinguish something in this way.
>(Of course, there's also the issue that some people will include country
>IDs whether useful or not just because they think tags always have a country
>The important thing for us is not to establish precisely what every
>distinction is (an endless task involving an ever-changing domain over
>which different interpretations are possible), but rather to ensure that
>the intended meaning of any tag is understood by all and for which it is
>clear, to some minimal level, how to utilize it. That's the whole point in
>defining a more restrictive syntax in RFC 3066: so that we can take a tag
>we have never seen before and get some useful information out of it. In a
>case like "iu-Cans-CA", it is clear what the intended meaning is (even if
>it wasn't stated explicitly in the registration form), and there are
>sufficiently clear ways in which it can be used.
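
A small sketch of what "getting useful information out of a tag we have 
never seen before" can look like. It assumes only the shapes used in this 
thread (a language subtag first, then an optional 4-letter script and an 
optional 2-letter country code); the function name and the returned keys 
are hypothetical, and anything not fitting those shapes is left 
uninterpreted.

```python
def describe(tag):
    """Split a tag into language / script / region by subtag shape alone."""
    subtags = tag.split("-")
    info = {"language": subtags[0].lower(), "script": None,
            "region": None, "other": []}
    for sub in subtags[1:]:
        if (len(sub) == 4 and sub.isalpha()
                and info["script"] is None and info["region"] is None):
            info["script"] = sub.title()      # 4 letters: script, e.g. Cans
        elif len(sub) == 2 and sub.isalpha() and info["region"] is None:
            info["region"] = sub.upper()      # 2 letters: country, e.g. CA
        else:
            info["other"].append(sub)         # unknown shape: keep as-is
    return info


print(describe("iu-Cans-CA"))
```

No registry lookup is needed to tell that "Cans" is a script and "CA" a 
country; the syntax alone carries that much.
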
>Peter Constable
>Ietf-languages mailing list
>Ietf-languages at
