How many subtags is ideal? [RE: LANGUAGE TAG REGISTRATION FORM: iu-Cans]

John Clews scripts20 at
Sat Feb 5 20:22:13 CET 2005

JFC, I think you misunderstood. Please don't take the question I raised as
conveying any support, explicit or implicit, for the "style" and "authority"
attributes that you have discussed.

Essentially, the simpler the tags that are used, the better; that's the
issue I was raising.

Ockham's razor is generally a good guide.

John Clews


> Dear John,
> Your point is well taken. We had an extensive debate on this in connection
> with the proposed "RFC 3066bis" draft. The lang3tag obviously does not
> support all the flexibility needed to register all language variations.
> The first of these, necessary to Internet operations themselves, is
> support for the DNS IDN tables, which should be the default for 3-element
> tags, as they are the only generic registration on the matter existing at
> IANA, made by:
> - the national community trustee (as defined by the RFC 1591 rules, using
> ISO 3166 codes)
> - for the specified script
> - in the concerned language
> The debate, and the consensus being built about informed language
> association, go along your lines. The technical proposal I made for
> operational convergence is to use a 5-element langtag [lang5tag]
> (obviously subject to debate; we will have several meetings on this, since
> the February SG2 meeting is too early for a fully constructed proposal).
> The proposal is to use two additional elements:
> - the style in which the language is to be used; the default would be IDN.
> - the author(ity) of reference that registered the language description;
> by default, the ccTLD manager of the country code, as already registered
> in the IANA database.
> These lang5tags probably give all the flexibility necessary to go down to
> full vernacular support (including individual/cultural variations). From
> this perspective, the current requests should be understood as including
> their requester's name as the authoritative element (the one who compiles
> the sources and documents the dictionary, etc., for web services
> interintelligibility, for example).
> jfc
> At 10:34 04/02/2005, John Clews wrote:
>>Just as an additional general discussion point, should there be rules or
>>at least guidance about when 3-element tags (language-script-country)
>>should be _used_?
>>Should they be used only exceptionally? In many cases, for most users (and
>>developers?), 2-element tags (language-script) will be sufficient, and the
>>third element will not add much differentiation, while the coexistence of
>>2-element and 3-element tags may cause some interoperability problems.
>>On the other hand, even the 3-element tag may not be enough. Indeed, I
>>understand that different spelling conventions are used in writing
>>Inuktitut in syllabics in the Eastern and Western parts of Canada.
>>That could be solved by using 4-element codes, with the 4th element coming
>>from subtags from ISO 3166-2 (or elsewhere).
>>Or, in the medium term (and better still?), that same situation may be
>>solved once it is decided that codes from the forthcoming ISO 639-3 may be
>>used, where the differentiation would almost certainly be achieved by using
>>a different 3-letter language subtag as the first subtag in the cases of:
>>East Canadian Inuktitut written in Syllabics, and
>>West Canadian Inuktitut written in Syllabics.
>>I don't have the relevant Ethnologue or ISO 639-3 information to hand, so
>>the last bit is based on what I think is the case, but I'll check it out.
>>John Clews
>>---------------------------- Original Message ----------------------------
>>From:    "Peter Constable" <petercon at>
>>Date:    Fri, February 4, 2005 6:34 am
>>To:      ietf-languages at
>> > From: John Cowan [mailto:jcowan at]
>> > > Both iu and iu-CA are valid tags. A script qualification needs to be
>> > > made for Cans vs Latn. So, iu-Cans is the Cans qualification for iu,
>> > > and iu-Cans-CA is the Cans qualification for iu-CA.
>> >
>> > In the current RFC 3066 regime, I quite understand Michael's reluctance
>> > to register longer codes that aren't provably distinguishable from
>> > shorter codes. So unless iu is used with more than one script in more
>> > than one country, I don't see the justification for iu-Cans-CA. In the
>> > hoped-for RFC 3066bis regime, things will of course be otherwise.
>>I understand why there may be some reluctance. As a platform vendor,
>>though, we face a significant dilemma: I cannot predict when a user will
>>determine that iu-Cans-CA needs to be distinguished from (say)
>>iu-Cans-US. And I face a maintenance problem if it becomes clear in the
>>future that such a distinction is needed and we end up changing the tags
>>that we return to apps; I have no idea what apps that might break. I also
>>face a significant analysis problem, which is the very one you and Tex
>>were looking at and, I think, found to be somewhat intractable: how do we
>>know when a country ID is useful or not? Add to that the potential for
>>users to be offended by geopolitical issues: "you consider that
>>community to exist independently of any nation, yet you tie us to country
>>X". For many reasons, we really cannot take on making judgments about
>>precisely what the correct tags for users' needs will be.
>>Keep in mind that users' needs aren't consistently the same: if someone
>>asks for an RFC3066 tag that would apply to (e.g.) the "French (France)"
>>locale, we have no way of knowing whether, in the particular context they
>>want to use the tag, the best tag would be "fr" or "fr-FR".
>>Sometimes it will be one, but other times it will be the other.
>>Now, we could choose never to provide users with tags of any shape or form:
>>they're on their own. We think that's not the best thing for us to do,
>>however. For one thing, our infrastructure includes things that look
>>similar to RFC 3066 tags but are not, and we don't want to lead people to
>>use the wrong thing in RFC 3066 contexts. Also, returning data of this
>>sort is precisely the kind of service developers expect from a platform,
>>and it is better for us to provide something useful, even if not the
>>perfect tag for the situation, than nothing at all. Keep in mind what I
>>mentioned above: the most suitable tag may vary, and we have no way to
>>predict it. The best we can do, then, is to provide the most
>>fully-qualified tag that might be appropriate for the given locale; if
>>that locale is region-specific (as opposed to our "neutral" locales, which
>>have no region), then the tag will always include a region component.
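The policy just described (always return the most fully-qualified tag, with a region subtag only for region-specific locales) can be sketched as below. The `Locale` record, its fields, and `most_qualified_tag` are hypothetical stand-ins for illustration; they are not the platform's actual data model or API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Locale:
    """Hypothetical locale record; region is None for a "neutral" locale."""
    language: str          # e.g. "iu"
    script: str | None     # e.g. "Cans", or None if not script-qualified
    region: str | None     # e.g. "CA", or None for a neutral locale


def most_qualified_tag(locale: Locale) -> str:
    """Build the fullest tag the locale supports: language[-script][-region]."""
    parts = [locale.language]
    if locale.script:
        parts.append(locale.script)
    # Region-specific locales always get a region subtag; neutral ones never do.
    if locale.region:
        parts.append(locale.region)
    return "-".join(parts)


print(most_qualified_tag(Locale("iu", "Cans", "CA")))  # iu-Cans-CA
print(most_qualified_tag(Locale("iu", "Cans", None)))  # iu-Cans
print(most_qualified_tag(Locale("fr", None, "FR")))    # fr-FR
```

The design choice is deliberately mechanical: the function makes no judgment about whether the region is linguistically meaningful, which is exactly the judgment the message argues a platform cannot make.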
>>Also, I think the reluctance to register a tag like iu-Cans-CA is
>>mistaken on other grounds: we are not obligated to determine that every
>>valid tag denotes something distinct from every other valid tag. That is
>>already impossible, since RFC 3066 defines as valid many tags that would
>>not correspond to actual linguistic distinctions. I suspect that there's
>>no distinction between fr-CI and fr-GH, but both are valid tags, and
>>probably in use somewhere. And note that, while I think there's no
>>distinction, someone else may determine, for whatever reason, that they
>>need to distinguish something in this way.
>>(Of course, there's also the issue that some people will include country
>>IDs whether useful or not, just because they think tags always have a
>>country ID.)
>>The important thing for us is not to establish precisely what every
>>distinction is (an endless task involving an ever-changing domain over
>>which different interpretations are possible), but rather to ensure that
>>the intended meaning of any tag is understood by all, and that it is
>>clear, to some minimal level, how to use it. That's the whole point in
>>defining a more restrictive syntax in RFC 3066bis: so that we can take a
>>tag we have never seen before and get some useful information out of it.
>>In a case like "iu-Cans-CA", the intended meaning is clear (even if it
>>wasn't stated explicitly in the registration form), and there are
>>sufficiently clear ways in which it can be used.
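To illustrate how a structured syntax lets software get information out of a tag it has never seen, here is a minimal sketch that labels subtags purely by shape and position, following the conventions proposed for RFC 3066bis: a 2-3 letter language subtag, a 4-letter script subtag, then a 2-letter or 3-digit region subtag. It is a simplification (no extlang, variant, extension, or private-use handling), not a conforming parser; `parse_tag` is a hypothetical name.

```python
from __future__ import annotations

import re

# Simplified subtag shapes (assumption: only language, script, and region
# are recognized; everything after them is left unparsed).
LANGUAGE = re.compile(r"[A-Za-z]{2,3}")
SCRIPT = re.compile(r"[A-Za-z]{4}")
REGION = re.compile(r"[A-Za-z]{2}|[0-9]{3}")


def parse_tag(tag: str) -> dict:
    """Split a tag and label each subtag by its shape and position."""
    subtags = tag.split("-")
    if not LANGUAGE.fullmatch(subtags[0]):
        raise ValueError(f"unrecognized primary language subtag in {tag!r}")
    result = {"language": subtags[0].lower(), "script": None,
              "region": None, "other": []}
    rest = subtags[1:]
    # A 4-letter subtag right after the language is read as a script.
    if rest and SCRIPT.fullmatch(rest[0]):
        result["script"] = rest.pop(0).title()
    # Then a 2-letter or 3-digit subtag is read as a region.
    if rest and REGION.fullmatch(rest[0]):
        result["region"] = rest.pop(0).upper()
    # Anything left over (variants etc.) is kept as-is.
    result["other"] = rest
    return result


print(parse_tag("iu-Cans-CA"))
# {'language': 'iu', 'script': 'Cans', 'region': 'CA', 'other': []}
print(parse_tag("fr-CI"))
# {'language': 'fr', 'script': None, 'region': 'CI', 'other': []}
```

Note that this works without any registry lookup: even an unregistered combination such as `iu-Cans-CA` yields a language, a script, and a region, which is the "useful information" the message refers to.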
>>Peter Constable
>>Ietf-languages mailing list
>>Ietf-languages at

