How many subtags is ideal? [RE: LANGUAGE TAG REGISTRATION FORM: iu-Cans]

John Clews scripts20 at uk2.net
Tue Feb 8 14:06:22 CET 2005


I didn't understand any of JFC's points, therefore I do not plan to respond
any further on this.

Nor do I look for any further elucidation.

John Clews

> At 20:22 05/02/2005, John Clews wrote:
>>JFC - I think you misunderstood. Please don't take the question I raised to
>>convey any support for the "style" and "authority" attributes that you
>>have discussed, explicit or implicit.
>
> I certainly do not. I only note that real life is very different from
> Tables life. You probably speak all the time as a top Oxonian linguist
> (en-Latn-UK), with your kids, with your butcher, with us, with your tax
> collector, etc. you never used a French, or an Arabic, or an American
> word.
> The BBC issued a note recently on the defense of English vs. American: they
> would be delighted. But I am not sure this is the same in Dushanbe? And I
> can tell you for sure this is not in Versailles.
>
>>Essentially, the simpler the tags that are used, the better - that's the
>>issue I was raising.
>
> This is true. The reasoning of a 6-year-old child is certainly simpler
> than the reasoning of a 60-year-old adult; is that a good enough reason
> to disregard your propositions?
>
>>Ockham's razor is generally a good guide.
>
> Yes. It includes a very simple rule: it will not cut your throat if it can
> scale (RFC 1958). The lang3tag cannot scale. The lang5tag can. I am neither
> racist nor dumb: you are welcome to show me that a lang4tag could
> scale and match Ockham's razor. The Language Mendeleiev Table you are
> at directly conflicts with the IDN Tables Table (never do the same thing
> differently, RFC 1958 again). It makes a very poor and controversial
> response to the need of the cultural ontology all the lang(4+n)tags belong
> to.
> jfc
>
>>John Clews
>>
>>--
>>
>> > Dear John,
>> > John,
>> > your point is well taken. We had an extensive debate on this on the
>> > occasion of the "RFC 3066bis" proposed Draft. The lang3tag does not
>> > obviously support all the flexibility needed to register all the language
>> > variations. The first one, necessary to the very operation of the
>> > Internet, is the support of the DNS IDN Tables, which should be the
>> > default for the 3-element tags, as they are the only generic (IANA
>> > existing) registration on the matter by:
>> > - the national community trustee (defined by the RFC 1591 rules using the
>> > ISO 3166 codes)
>> > - for the specified script
>> > - in the concerned language
>> >
>> > The debate and the building consensus about informed language association
>> > goes along your lines. The technical proposition for operational
>> > convergence I made is to use a 5-element langtag [lang5tag] (obviously
>> > subject to debate, and we will have several meetings on this - since the
>> > Feb SG2 meeting is too early for a construed proposition).
>> >
>> > The proposition is to use two additional elements:
>> > - the style in which the language is to be used. Default would be IDN
>> > - the author(ity) of reference having registered the language description.
>> > By default the ccTLD manager of the country code. As already registered in
>> > the IANA base.
>> >
>> > These lang5tags probably give the whole flexibility necessary to go down
>> > to full vernacular support (including individual/cultural variations). In
>> > this perspective the current requests should be understood as including
>> > their requester name as authoritative element (the one who compiles the
>> > sources and documents the dictionary, etc. for web services
>> > interintelligibility for example).
>> >
>> > jfc
>> >
>> >
>> > At 10:34 04/02/2005, John Clews wrote:
>> >>Just as an additional general discussion point, should there be rules or
>> >>at least guidance about when 3-element tags (language-script-country)
>> >>should be _used_?
>> >>
>> >>Should they be used only exceptionally? In many cases, for most users (and
>> >>developers?) 2-element tags (language-script) will be sufficient, and the
>> >>third element will not add much differentiation, but the fact that both
>> >>2-element and 3-element tags exist may cause some interoperability
>> >>problems.
>> >>
>> >>On the other hand, even the 3-element tag may not be enough. Indeed, I
>> >>understand that there are different spelling conventions used in writing
>> >>Inuktitut in syllabics in Eastern and Western parts of Canada.
>> >>
>> >>That could be solved by using 4-element codes, with the 4th element coming
>> >>from subtags from ISO 3166-2 (or elsewhere).
>> >>
>> >>Or in the medium term (and better still?) that same situation may be
>> >>solved once it becomes decided that it is possible to use codes from the
>> >>forthcoming ISO 639-3, where the differentiation would almost certainly be
>> >>achieved by using a different 3-letter language subtag as the first subtag
>> >>in the cases of:
>> >>East Canadian Inuktitut written in Syllabics, and
>> >>West Canadian Inuktitut written in Syllabics.
>> >>
>> >>I don't have the relevant Ethnologue or ISO 639-3 information to hand, so
>> >>the last bit is based on what I think is the case, but I'll check it out
>> >>later.
>> >>
>> >>John Clews
>> >>
>> >>
>> >>
>> >>---------------------------- Original Message ----------------------------
>> >>Subject: RE: LANGUAGE TAG REGISTRATION FORM: iu-Cans
>> >>From:    "Peter Constable" <petercon at microsoft.com>
>> >>Date:    Fri, February 4, 2005 6:34 am
>> >>To:      ietf-languages at iana.org
>> >>--------------------------------------------------------------------------
>> >>
>> >> > From: John Cowan [mailto:jcowan at reutershealth.com]
>> >>
>> >> > > Both iu and iu-CA are valid tags. A script qualification needs to be
>> >> > > made for Cans vs Latn. So, iu-Cans is the Cans qualification for iu,
>> >> > > and iu-Cans-CA is the Cans qualification for iu-CA.
>> >> >
>> >> > In the current RFC 3066 regime, I quite understand Michael's reluctance
>> >> > to register longer codes that aren't provably distinguishable from
>> >> > shorter codes.  So unless iu is used with more than one script in more
>> >> > than one country, I don't see the justification for iu-Cans-CA.  In our
>> >> > hoped-for RFC 3066bis regime, things will of course be otherwise.
>> >>
>> >>I understand why there may be some reluctance. As a platform vendor,
>> >>though, we face a significant dilemma: I cannot predict when a user will
>> >>determine that iu-Cans-CA needs to be distinguished from (say)
>> >>iu-Cans-US. And I face a maintenance problem if it becomes clear in the
>> >>future that such a distinction is needed and end up changing the tags that
>> >>we return to apps -- I have no idea what apps that might break. I also
>> >>face a significant analysis problem, which is the very one you and Tex
>> >>were looking at and, I think, found to be somewhat intractable: how do we
>> >>know when a country ID is useful or not? Add to that the potential to get
>> >>users offended by geo-political issues: "you consider that community to
>> >>exist independent of any nation, yet you tie us to country X". For many
>> >>reasons, we really cannot take on making judgments about precisely what
>> >>the correct tags for users' needs will be.
>> >>
>> >>Keep in mind that users' needs aren't consistently the same: if someone
>> >>asks for an RFC3066 tag that would apply to (e.g.) the "French (France)"
>> >>locale, we have no way of knowing whether, in the particular context they
>> >>want to use the tag, the best tag would be "fr" or "fr-FR".
>> >>Sometimes it will be one, but other times it will be the other.
>> >>
>> >>Now, we could choose never to provide users with tags of any shape or form
>> >>-- they're on their own. We think that's not the best thing for us to do,
>> >>however. For one thing, our infrastructure includes things that look
>> >>similar to RFC 3066 tags but are not, and we don't want to lead people to
>> >>using the wrong thing in RFC3066 contexts. Also, returning data of this
>> >>sort is precisely the kind of service developers expect from a platform,
>> >>and it is better for us to provide something useful even if not the
>> >>perfect tag for the situation rather than nothing at all. Keep in mind
>> >>what I mentioned above: what is the most suitable tag may vary and we have
>> >>no way to predict it. The best we can do, then, is to provide the most
>> >>fully-qualified tag that might be appropriate for the given locale, and if
>> >>that locale is region-specific (as opposed to our "neutral" locales, which
>> >>have no region), then it will always include a region component.
>> >>
>> >>Also, I think the reluctance to register a tag like iu-Cans-CA is
>> >>mistaken on other grounds: we are not obligated to determine that every
>> >>valid tag denotes something distinct from every other valid tag. That is
>> >>already impossible, since RFC 3066 defines many things as valid that would
>> >>not correspond to actual linguistic distinctions. I suspect that there's
>> >>no distinction between fr-CI and fr-GH, but both are valid tags, and
>> >>probably in use somewhere. And note that, while I think there's no
>> >>distinction, someone else may determine, for whatever reason, that they
>> >>think they need to distinguish something in this way.
>> >>
>> >>(Of course, there's also the issue that some people will include country
>> >>IDs whether useful or not just because they think tags always have a
>> >>country element.)
>> >>
>> >>The important thing for us is not to establish precisely what every
>> >>distinction is (an endless task involving an ever-changing domain over
>> >>which different interpretations are possible), but rather to ensure that
>> >>the intended meaning of any tag is understood by all and that it is
>> >>clear, to some minimal level, how to utilize it. That's the whole point in
>> >>defining a more restrictive syntax in RFC 3066: so that we can take a tag
>> >>we have never seen before and get some useful information out of it. In a
>> >>case like "iu-Cans-CA", it is clear what the intended meaning is (even if
>> >>it wasn't stated explicitly in the registration form), and there are
>> >>sufficiently-clear ways in which it can be used.
>> >>
>> >>
>> >>
>> >>Peter Constable
>> >>_______________________________________________
>> >>Ietf-languages mailing list
>> >>Ietf-languages at alvestrand.no
>> >>http://www.alvestrand.no/mailman/listinfo/ietf-languages
>>
>>--
>


