draft-phillips-langtags-08, process, specifications, "stability", and extensions

Wed Jan 5 16:33:04 CET 2005

> > > Finding country codes is straightforward: any non-initial subtag of
> > > two letters (not appearing to the right of "x-" or "-x-") is a country
> > > code.  This is true in RFC 1766, RFC 3066, and the current draft.

> > On the contrary, in RFC 3066 the rule is "any 2 letter value that
> > appears as the second subtag is a country code". The rule in the new
> > draft is either the formulation you give above or  "any 2 letter value
> > that appears as a subtag after the initial subtag and some number of
> > 3 and 4 letter subtags".

> I didn't state it as a rule, but as true.  Every non-initial 2-letter
> tag in RFC 3066 is a country code; the same is true in the draft.

Again, that is not what RFC 3066 says. From section 2.2:

 There are no rules apart from the syntactic ones for the third and subsequent
 subtags.

Sure sounds to me like a two letter third subtag (a) is allowed and (b)
isn't supposed to be treated as a country code.

Now, it may be the case that all _registered_ tags have avoided the use of
non-country code two letter codes in the third and later position. But this is
100% irrelevant. The point is that conformant code implementing RFC 3066 is
broken if it simply assumes any 2 letter code after the first subtag is a
country code. Rather, the rule is simply that a country code, if present,
always appears as a two letter second subtag. The new draft changes this rule,
so applications that pay attention to country codes in language tags have to
change, and the new algorithm for finding the country code is trickier.
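
A rough Python sketch of the difference, for illustration only. The function
names are made up for this message, the draft rule is coded from the
paraphrase quoted above rather than from the draft's ABNF, and skipping tags
whose primary subtag is "x" is a simplifying assumption:

```python
def _is_alpha2(sub):
    # Two ASCII letters, e.g. "US" or "tw".
    return len(sub) == 2 and sub.isascii() and sub.isalpha()


def country_rfc3066(tag):
    """RFC 3066 reading: a country code, if present, is the second subtag."""
    subtags = tag.split("-")
    if subtags[0].lower() == "x":  # assumption: private use, nothing to interpret
        return None
    if len(subtags) > 1 and _is_alpha2(subtags[1]):
        return subtags[1].upper()
    return None


def country_draft(tag):
    """Draft reading (as paraphrased above): the first two letter subtag
    after the primary subtag and any run of 3 and 4 letter subtags."""
    subtags = tag.split("-")
    if subtags[0].lower() == "x":
        return None
    for sub in subtags[1:]:
        if _is_alpha2(sub):
            return sub.upper()
        if len(sub) in (3, 4) and sub.isascii() and sub.isalpha():
            continue  # keep scanning past 3 and 4 letter subtags
        break  # anything else ends the search
    return None


def country_naive(tag):
    """The disputed shortcut: any non-initial two letter subtag that is not
    to the right of a singleton."""
    subtags = tag.split("-")
    if len(subtags[0]) == 1:
        return None
    for sub in subtags[1:]:
        if len(sub) == 1:
            break
        if _is_alpha2(sub):
            return sub.upper()
    return None
```

All three agree on "en-US". On the made-up but syntactically valid RFC 3066
tag "xx-yyyyy-zz", country_rfc3066 returns None while country_naive returns
"ZZ", which is exactly the breakage described above. On "zh-Hant-TW",
country_rfc3066 returns None while country_draft returns "TW", so code written
to the RFC 3066 rule has to change.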

> (A private correspondent notes that the reference to "-x-" should
> in fact be a reference to any singleton, though "-x-" and "i-" are
> the only singletons currently usable.)

I have to say I find it quite interesting that one of the main proponents of
the new draft, while arguing that the new draft doesn't make the matching
problem a lot harder, ended up giving an erroneous rule for extracting country
codes from a language tag. 

> > Just because something doesn't necessarily do something doesn't mean it
> > never does it.

> It does mean it can't be counted on in the general case.

Sure, in the general case most if not all of these nasty corner cases you've
created can be blithely assumed away because they only appear in specific
problem domains. Actual applications that operate in those specific domains
aren't so lucky, however. And the metric we're supposed to apply in the IETF is
real world implementability.

As it happens I deal with messaging applications, and in this space text/plain
with all sorts of nasty charset issues is the rule, not the exception.

> > Well, maybe I'm missing something obvious, but I see nothing in RFC
> > 3066 that qualifies as a description of a matching algorithm.

> Section 2.5 (2.4.1 in the draft) states the matching rule in a succinct
> fashion.  Everything in 2.4.2 is a non-normative elaboration of this.

??? Which in no way refutes my assertion that no matching algorithm
was given in RFC 3066!

				Ned