draft-phillips-langtags-08, process, specifications, "stability", and extensions

Wed Jan 5 23:26:16 CET 2005

ned.freed at mrochek.com scripsit:

> Now, it may be the case that all _registered_ tags have avoided the use of
> non-country code two letter codes in the third and later position. But this is
> 100% irrelevant.

If you say so.

> The point is that conformant code implementing RFC 3066 is
> broken if it simply assumes any 2 letter code after the first subtag is a
> country code. Rather, the rule is simply that a country code, if present,
> always appears as a two letter second subtag.

Not quite.  The rule is that a 2-letter second subtag is a country code.
Country codes could have appeared elsewhere, and may still wind up doing so
before RFC 3066 is obsoleted.
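
To make the RFC 3066 rule concrete, here is a minimal sketch in Python. The
function name is hypothetical, and leaving "x-" private-use tags uninterpreted
is my assumption about how an implementation would treat them; the only rule
taken from the discussion above is that a 2-letter second subtag is a country
code.

```python
def rfc3066_country(tag):
    """Return the country code of an RFC 3066 tag, or None if it has none.

    Hypothetical helper: only the second subtag is examined.
    """
    subtags = tag.split("-")
    # Assumption: tags beginning with "x" are private use, so their
    # subtags are not interpreted here.
    if subtags[0].lower() == "x":
        return None
    if len(subtags) >= 2:
        second = subtags[1]
        # A 2-letter second subtag is read as a country code.
        if len(second) == 2 and second.isascii() and second.isalpha():
            return second.upper()
    return None
```

Under this rule a 2-letter subtag in the third or later position is simply
not interpreted, whatever it happens to look like.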

> The new draft changes this rule,
> so applications that pay attention to country codes in language tags have to
> change and the new algorithm for finding the country code is trickier.

But not much.  As an advantage, country codes can always be found in the new
draft, whereas in RFC 3066 they could in principle be anywhere.
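
For comparison, a rough sketch of the lookup as it has been described in this
thread for the new draft: take the first 2-letter subtag after the primary
language subtag, and stop at any singleton. The function name is hypothetical,
and this is not a full parse of the draft's grammar; it deliberately handles
only 2-letter codes.

```python
def draft_country(tag):
    """Return the country code under the rule discussed in this thread.

    Hypothetical helper: a sketch of the scan, not a validating parser.
    """
    subtags = tag.split("-")
    # A tag that starts with a singleton (such as "x" or "i") is not scanned.
    if len(subtags[0]) == 1:
        return None
    for sub in subtags[1:]:
        # A singleton starts an extension or private-use sequence;
        # nothing after it is treated as a country code.
        if len(sub) == 1:
            return None
        if len(sub) == 2 and sub.isascii() and sub.isalpha():
            return sub.upper()
    return None
```

The loop is a few lines longer than a check of the second subtag alone, but it
finds the country code wherever it sits ahead of the first singleton.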

> > (A private correspondent notes that the reference to "-x-" should
> > in fact be a reference to any singleton, though "-x-" and "i-" are
> > the only singletons currently usable.)
> 
> I have to say I find it quite interesting that one of the main proponents of
> the new draft, while arguing that the new draft doesn't make the matching
> problem a lot harder, ended up giving an erroneous rule for extracting country
> codes from a language tag. 

Like other people, I sometimes post when tired; I don't think this particularly
interesting.

> Sure, in the general case most if not all of these nasty corner cases you've
> created can be blithely assumed away because they only appear in specific
> problem domains. Actual applications that operate in those specific domains
> aren't so lucky, however. And the metric we're supposed to apply in the IETF is
> real world implementability.

I fail to see what this has to do with the merit of marking scripts in language
tags.  The preferred IETF charset, UTF-8, contains no information about script
whatever.

> As it happens I deal with messaging applications, and in this space text/plain
> with all sorts of nasty charset issues is the rule, not the exception.

Extended language tags will neither help nor harm you, then.

-- 
"We are lost, lost.  No name, no business, no Precious, nothing.  Only empty.
Only hungry: yes, we are hungry.  A few little fishes, nassty bony little
fishes, for a poor creature, and they say death.  So wise they are; so just,
so very just."  --Gollum        jcowan at reutershealth.com  www.ccil.org/~cowan