How many subtags is ideal? [RE: LANGUAGE TAG REGISTRATION
JFC (Jefsey) Morfin
jefsey at jefsey.com
Sat Feb 5 20:59:29 CET 2005
At 20:22 05/02/2005, John Clews wrote:
>JFC - I think you misunderstood. Please don't take the question I raised to
>convey any support for the "style" and "authority" attributes that you
>have discussed, explicit or implicit.
I certainly do not. I only note that real life is very different from
Tables life. You probably speak all the time as a top Oxonian linguist
(en-Latn-UK), with your kids, with your butcher, with us, with your tax
collector, etc. You never use a French, or an Arabic, or an American word.
The BBC issued a note recently on the defense of English vs. American: they
would be delighted. But I am not sure this is the same in Dushanbe. And I
can tell you for sure it is not so in Versailles.
>Essentially, the simpler the tags that are used, the better - that's the
>issue I was raising.
This is true. The reasoning of a 6-year-old child is certainly simpler
than the reasoning of a 60-year-old adult. Is that a good enough reason
to disregard your propositions?
>Ockham's razor is generally a good guide.
Yes. It includes a very simple rule: it will not cut your throat if it can
scale (RFC 1958). The lang3tag cannot scale. The lang5tag can. I am neither
racist nor dumb: you are welcome to show me that a lang4tag could
scale and satisfy Ockham's razor. The Language Mendeleiev Table you are
working on directly conflicts with the IDN Tables Table (never do the same
thing differently, RFC 1958 again). It makes a very poor and controversial
response to the need of the cultural ontology all the lang(4+n)tags belongs
> > Dear John,
> > John,
> > your point is well taken. We had an extensive debate on this on the
> > occasion of the "RFC 3066bis" proposed Draft. The lang3tag obviously does
> > not support all the flexibility needed to register all the language
> > variations. The first one, necessary to the very operation of the Internet,
> > is the support of the DNS IDN Tables, which should be the default for the
> > 3-element tags as they are the only generic (IANA existing) registration on
> > the matter by:
> > - the national community trustee (defined by the RFC 1591 rules using the
> > ISO 3166 codes)
> > - for the specified script
> > - in the concerned language
> > The debate and the building consensus about informed language association
> > goes along your lines. The technical proposition for operational
> > convergence I made is to use a 5-element langtag [lang5tag] (obviously
> > subject to debate, and we will have several meetings on this - since the
> > Feb SG2 meeting is too early for a construed proposition).
> > The proposition is to use two additional elements:
> > - the style in which the language is to be used. Default would be IDN
> > - the author(ity) of reference having registered the language description.
> > By default the ccTLD manager of the country code. As already registered in
> > the IANA base.
> > These lang5tags probably give the whole flexibility necessary to go down to
> > full vernacular support (including individual/cultural variations). In this
> > perspective the current requests should be understood as including their
> > requester name as authoritative element (the one who compiles the sources
> > and documents the dictionary, etc. for web services interintelligibility,
> > for example).
> > jfc
> > At 10:34 04/02/2005, John Clews wrote:
> >>Just as an additional general discussion point, should there be rules or
> >>at least guidance about when 3-element tags (language-script-country)
> >>should be _used_?
> >>Should they be used only exceptionally? In many cases, for most users (and
> >>developers?) 2-element tags (language-script) will be sufficient, and the
> >>third element will not add much differentiation, but the fact that both
> >>2-element and 3-element tags exist may cause some interoperability problems.
> >>On the other hand, even the 3-element tag may not be enough. Indeed, I
> >>understand that there are different spelling conventions used in writing
> >>Inuktitut in syllabics in Eastern and Western parts of Canada.
> >>That could be solved by using 4-element codes, with the 4th element coming
> >>from subtags from ISO 3166-2 (or elsewhere).
> >>Or in the medium term (and better still?) that same situation may be
> >>solved once it becomes decided that it is possible to use codes from the
> >>forthcoming ISO 639-3, where the differentiation would almost certainly be
> >>achieved by using a different 3-letter language subtag as the first subtag
> >>in the cases of:
> >>East Canadian Inuktitut written in Syllabics, and
> >>West Canadian Inuktitut written in Syllabics.
> >>I don't have the relevant Ethnologue or ISO 639-3 information to hand, so
> >>the last bit is based on what I think is the case, but I'll check it out
> >>John Clews
> >>---------------------------- Original Message ----------------------------
> >>Subject: RE: LANGUAGE TAG REGISTRATION FORM: iu-Cans
> >>From: "Peter Constable" <petercon at microsoft.com>
> >>Date: Fri, February 4, 2005 6:34 am
> >>To: ietf-languages at iana.org
> >> > From: John Cowan [mailto:jcowan at reutershealth.com]
> >> > > Both iu and iu-CA are valid tags. A script qualification needs to be
> >> > > made for Cans vs Latn. So, iu-Cans is the Cans qualification for iu,
> >> > > iu-Cans-CA is the Cans qualification for iu-CA.
> >> >
> >> > In the current RFC 3066 regime, I quite understand Michael's reluctance
> >> > to register longer codes that aren't provably distinguishable from
> >> > shorter codes. So unless iu is used with more than one script in more
> >> > than one country, I don't see the justification for iu-Cans-CA. In the
> >> > hoped-for RFC 3066bis regime, things will of course be otherwise.
> >>I understand why there may be some reluctance. As a platform vendor,
> >>though, we face a significant dilemma: I cannot predict when a user will
> >>determine that iu-Cans-CA needs to be distinguished from (say)
> >>iu-Cans-US. And I face a maintenance problem if it becomes clear in the
> >>future that such a distinction is needed and we end up changing the tags
> >>that we return to apps -- I have no idea what apps that might break. I also
> >>face a significant analysis problem, which is the very one you and Tex
> >>were looking at and, I think, found to be somewhat intractable: how do we
> >>know when a country ID is useful or not? Add to that the potential to get
> >>users offended by geo-political issues: "you consider that
> >>community to exist independent of any nation, yet you tie us to country
> >>X". For many reasons, we really cannot take on making judgments about
> >>precisely what the correct tags for users' needs will be.
> >>Keep in mind that users' needs aren't consistently the same: if someone
> >>asks for an RFC3066 tag that would apply to (e.g.) the "French (France)"
> >>locale, we have no way of knowing whether, in the particular context they
> >>want to use the tag, the best tag would be "fr" or "fr-FR".
> >>Sometimes it will be one, but other times it will be the other.
> >>Now, we could choose never to provide users with tags of any shape or form
> >>-- they're on their own. We think that's not the best thing for us to do,
> >>however. For one thing, our infrastructure includes things that look
> >>similar to RFC 3066 tags but are not, and we don't want to lead people to
> >>using the wrong thing in RFC3066 contexts. Also, returning data of this
> >>sort is precisely the kind of service developers expect from a platform,
> >>and it is better for us to provide something useful even if not the
> >>perfect tag for the situation rather than nothing at all. Keep in mind
> >>what I mentioned above: what is the most suitable tag may vary and we have
> >>no way to predict it. The best we can do, then, is to provide the most
> >>fully-qualified tag that might be appropriate for the given locale, and if
> >>that locale is region-specific (as opposed to our "neutral" locales, which
> >>have no region), then it will always include a region component.
> >>Also, I think the reluctance to register a tag like iu-Cans-CA is
> >>mistaken on other grounds: we are not obligated to determine that every
> >>valid tag denotes something distinct from every other valid tag. That is
> >>already impossible, since RFC 3066 defines many things as valid that would
> >>not correspond to actual linguistic distinctions. I suspect that there's
> >>no distinction between fr-CI and fr-GH, but both are valid tags, and
> >>probably in use somewhere. And note that, while I think there's no
> >>distinction, someone else may determine, for whatever reason, that they
> >>think they need to distinguish something in this way.
> >>(Of course, there's also the issue that some people will include country
> >>IDs whether useful or not just because they think tags always have a country
> >>The important thing for us is not to establish precisely what every
> >>distinction is (an endless task involving an ever-changing domain over
> >>which different interpretations are possible), but rather to ensure that
> >>the intended meaning of any tag is understood by all and that it is
> >>clear, to some minimal level, how to utilize it. That's the whole point in
> >>defining a more restrictive syntax in RFC 3066: so that we can take a tag
> >>we have never seen before and get some useful information out of it. In a
> >>case like "iu-Cans-CA", it is clear what the intended meaning is (even if
> >>it wasn't stated explicitly in the registration form), and there are
> >>sufficiently-clear ways in which it can be used.
> >>Peter Constable
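
Peter Constable's point, that a restrictive syntax lets software take a tag it has never seen before and still get useful information out of it, can be illustrated with a minimal Python sketch. It is not taken from the thread and is not the full RFC 3066 grammar: it assumes only the language-script-country shape discussed here (2-3 letter language, optional 4-letter script, optional 2-letter country), and the helper names parse_tag and fallback_chain are hypothetical. The fallback by truncation is one common way for a consumer to use a fully-qualified tag, shown here as an assumption rather than as anything the posters specified.

```python
import re

# Assumed subtag shapes (a simplification, NOT the full RFC 3066 grammar):
#   language: 2-3 letters, script: 4 letters, country/region: 2 letters.
TAG_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}))?$"
)


def parse_tag(tag):
    """Split a tag such as 'iu-Cans-CA' into named subtags (hypothetical helper)."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError("unsupported tag shape: %r" % tag)
    # Keep only the subtags that are actually present in the tag.
    return {name: value for name, value in match.groupdict().items() if value}


def fallback_chain(tag):
    """Return the tag followed by each shorter prefix, most specific first."""
    parts = tag.split("-")
    return ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]


print(parse_tag("iu-Cans-CA"))
print(fallback_chain("iu-Cans-CA"))
```

Under these assumptions, a consumer handed the fully-qualified tag can still fall back to iu-Cans or iu when it has nothing specific for iu-Cans-CA, which is one reason returning the longer tag need not hurt applications that only understand the shorter ones.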