What's the plan for ISO 639-3 and RFC 3066 ter?
Addison Phillips [wM]
aphillips at webmethods.com
Tue Aug 17 06:46:06 CEST 2004
Thanks for the notes.
> These will be practical only if ISO 639-5 contains a normative mapping from
> its language collection codes to the codes of languages in the collection.
> (We will already have this for the macro-languages in 639-3.)
It seems odd to have a bunch of language collection codes that don't bother to collect up the languages... but I don't know enough about their work to say whether that would really be the result. I guess I'm assuming that all macro-languages and language collections will be defined as sets and not just as vague labels.
> > These would make the following cases on a 'ter' processor:
> > a. if I request 'lmn', I get content labelled 'oc-lmn' and 'lmn-FR', but
> > not those labelled 'oc-gsc' or even 'oc'. ('lmn' becomes 'oc-lmn', 'lmn-FR'
> > becomes 'oc-lmn-FR')
> > b. if I request 'oc', I get content labelled 'oc', 'gsc', 'lmn', and
> > 'oc-lmn', etc. (gsc and lmn map to oc-gsc and oc-lmn, for example)
> Case (a) doesn't seem right; if lmn is a synonym for oc-lmn, it should
> be able to fall back to oc.
No: lmn is a synonym for oc-lmn, and according to the matching rules from RFC 3066 onwards you cannot trim subtags from the requested range, so the range 'lmn' doesn't match the tag 'oc'. The falling back is done by the tags on the content being selected: a range matches any tag that begins with it. This prevents 'oc' from being returned, but it also prevents 'gsc' from being returned. This may not be 100% clear in draft-langtags, but that is what it says. It is crystal clear in RFC 3066, and Mark and I didn't change it, other than to allow you to ignore (remove) private-use and extension subtags in the requested range before matching (they don't matter in the tag).
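A minimal Python sketch of the RFC 3066 rule, under stated assumptions: a range matches a tag if it equals the tag or is a prefix of it followed by '-', compared case-insensitively, and '*' matches everything. The `strip_extensions` helper is hypothetical; it assumes that any single-character subtag after the first one starts a private-use or extension sequence, and drops it and everything after it from the range. The tags are the illustrative extlang forms used in this thread.

```python
def strip_extensions(language_range: str) -> str:
    """Hypothetical helper: drop private-use/extension subtags from a range.

    Assumes a single-character subtag (singleton) after the first position
    starts such a sequence; the first position is skipped so grandfathered
    forms like 'i-navajo' survive.
    """
    subtags = language_range.split("-")
    for i, subtag in enumerate(subtags[1:], start=1):
        if len(subtag) == 1:
            return "-".join(subtags[:i])
    return language_range


def range_matches(language_range: str, tag: str) -> bool:
    """RFC 3066 matching: the range equals the tag, or is a prefix of the
    tag that ends just before a '-'. Case-insensitive; '*' matches all."""
    r = strip_extensions(language_range).lower()
    t = tag.lower()
    if r == "*":
        return True
    return t == r or t.startswith(r + "-")


tags = ["oc", "oc-gsc", "oc-lmn", "oc-lmn-FR"]
print([t for t in tags if range_matches("oc-lmn", t)])  # ['oc-lmn', 'oc-lmn-FR']
print([t for t in tags if range_matches("oc", t)])      # all four tags
```

Only the content side is allowed to be longer than the range, so a request for 'oc-lmn' never reaches back to 'oc'.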
Martin Dürst has pointed out that this is the reverse of locale fallback systems: with language tags you specify the *least* specific tag you'll accept, and the less specific you are, the more content you generally get back. Locale fallback systems want the *most* specific tag in the request and return the same amount of content, taken from the nearest match available. I think familiarity with locale fallback confuses people approaching language selection. If you request Limousin, you'll get ONLY Limousin.
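By contrast, a typical locale fallback lookup can be sketched as follows. This is a generic illustration, not any particular library's algorithm: truncate the most specific request one subtag at a time and return the first identifier that is available.

```python
from typing import List, Optional, Set


def locale_fallback_chain(locale_id: str) -> List[str]:
    """Most-specific-first chain: drop the last subtag at each step."""
    subtags = locale_id.split("-")
    return ["-".join(subtags[:n]) for n in range(len(subtags), 0, -1)]


def locale_lookup(request: str, available: Set[str]) -> Optional[str]:
    """Return the nearest available identifier, or None if nothing fits."""
    for candidate in locale_fallback_chain(request):
        if candidate in available:
            return candidate
    return None


print(locale_fallback_chain("oc-lmn-FR"))          # ['oc-lmn-FR', 'oc-lmn', 'oc']
print(locale_lookup("oc-lmn-FR", {"oc", "oc-gsc"}))  # oc
```

Here the request for 'oc-lmn-FR' falls back to 'oc', whereas a language-range filter for 'oc-lmn' over the same set would return nothing.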
The extlang subtags expose this in a surprising way, and no, it isn't intuitive. This must be the twelfth time I've explained it.
> > The latter of these would require small changes to draft-langtags to allow a
> > range in an alias.
> If oc-lmn and lmn are synonyms, then oc-lmn-whatever and lmn-whatever
> should be too.
The aliasing mechanism works on the subtag level, so it would be exactly so.
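Subtag-level aliasing with a "range as alias" can be sketched like this. The alias table is hypothetical, built from this thread's illustrative codes ('lmn' for Limousin, 'gsc' for Gascon), and is not a registry excerpt. Both the range and the tag are canonicalized before the RFC 3066 prefix comparison, which reproduces cases (a) and (b) from the quoted message.

```python
# Hypothetical alias table: an aliased primary subtag maps to a multi-subtag
# range ("range as alias"). Codes follow the thread's example, not a registry.
ALIASES = {
    "lmn": "oc-lmn",  # Limousin
    "gsc": "oc-gsc",  # Gascon
}


def canonicalize(tag: str) -> str:
    """Replace an aliased first subtag; keep the remaining subtags as-is."""
    first, sep, rest = tag.partition("-")
    replacement = ALIASES.get(first.lower())
    if replacement is None:
        return tag
    return replacement + sep + rest


def matches(language_range: str, tag: str) -> bool:
    """RFC 3066 prefix matching, applied after canonicalizing both sides."""
    r = canonicalize(language_range).lower()
    t = canonicalize(tag).lower()
    return t == r or t.startswith(r + "-")


def select(language_range: str, tags: list) -> list:
    """Return the tags (in original form) that the range matches."""
    return [t for t in tags if matches(language_range, t)]


CONTENT = ["oc", "oc-gsc", "gsc", "lmn", "oc-lmn", "lmn-FR"]

print(select("lmn", CONTENT))        # case (a): ['lmn', 'oc-lmn', 'lmn-FR']
print(select("oc", CONTENT))         # case (b): every tag in CONTENT
print(canonicalize("lmn-whatever"))  # oc-lmn-whatever
```

'oc' and 'oc-gsc' are still excluded in case (a): aliasing changes what a code stands for, not the direction in which matching falls back.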
> > The question is whether these are the right choices and whether to alter the
> > current draft slightly to deal with this (or some other) solution.
> Indeed. My main concern is that until 639-5 comes out, we won't know
> which of the 639-3 codes are going to be subsumed by one of them
> (although we can cobble together a fair approximation based on the
> Ethnologue data).
Right. So the problem is to engineer the most reasonable mechanism for getting to the right results, regardless of what tomfoolery ISO 639-x gets up to. Ideally we pick a mechanism that is most likely to work with the various ISO 639 parts, hoping against hope not to have to change it later. I suspect that 'range as alias' is the most likely mechanism. It is certainly more flexible than (a superset of) the other solutions that leap to mind. But it also doesn't solve all problems (witness the matching example).
Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force
Internationalization is an architecture.
It is not a feature.