draft-phillips-langtags-08, process, specifications, "stability", and extensions

Mark Davis mark.davis at jtcsv.com
Thu Jan 6 02:55:42 CET 2005

> Rather, the rule is simply that a country code, if present,
> always appears as a two letter second subtag. The new draft changes this
> so applications that pay attention to country codes in language tags have
> to change and the new algorithm for finding the country code is trickier.

Your text above says (a) "if there is a country code in the tag, it is the
second subtag". That is not what the text of RFC 3066 actually says, which is:

> The following rules apply to the second subtag:
> All 2-letter subtags are interpreted as ISO 3166 alpha-2 country...

That is, it says (b) "if a second subtag has 2 letters, then it is an ISO
3166 code", which is not the same as (a). (It is almost, but not quite, the
converse.) The current RFC certainly does not forbid the use of country
codes in other positions in language tags. One could absolutely register
en-Latin-US, for example, meaning English as spoken in the US written in
Latin script.
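
To make the difference between (a) and (b) concrete, here is a minimal Python
sketch of rule (b) as quoted above. The function name region_rfc3066 is made
up for illustration, and special cases such as private-use tags beginning
with "x-" are deliberately ignored; treat it as a reading of the quoted
sentence, not as a complete RFC 3066 implementation.

```python
import re


def region_rfc3066(tag):
    """Rule (b): if the SECOND subtag has two letters, it is a country code.

    Hypothetical helper for illustration only. It says nothing about
    subtags in the third or later position, just as rule (b) does not.
    """
    subtags = tag.split("-")
    if len(subtags) >= 2 and re.fullmatch(r"[A-Za-z]{2}", subtags[1]):
        return subtags[1].upper()
    return None


print(region_rfc3066("en-US"))        # second subtag is two letters
print(region_rfc3066("en-Latin-US"))  # second subtag is "Latin", so rule (b) is silent
```

Under this reading, en-US yields US while en-Latin-US yields nothing: rule (b)
neither identifies nor forbids a country code in a later position, which is
exactly why it is not the same statement as (a).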

There has been a lot of noise on this issue, and too few concrete examples.
In the so-called 3066bis draft, we have striven very hard to ensure that:

(c) Every single tag that could be generated under RFC 3066bis is a tag that
could have been registered under RFC 3066.

Thus if someone wrote a parser that is future-compatible -- that could parse
all RFC 3066 language tags including those registered after the parser was
deployed -- then that parser can handle all 3066bis language tags. This is a
significant advance over RFC 3066, whose registered (not generated) language
tags are atomic, and cannot be effectively parsed at all. 3066bis adds more
structure so as to allow effective parsing of tags.

If you *can* come up with tags that would show that (c) is invalid, that
would be a concrete case for which we would have to make adjustments in the
draft.
A second issue that has come up is complexity. Admittedly, 3066bis is more
complex than RFC 3066. Part of that is due to adding additional structure,
and part due to necessary clarifications (such as the distinction between
well-formed and valid). But we did not add the additional structure at a
whim. RFC 3066, while a significant advance, is simply no longer powerful
enough to draw the distinctions in language that the industry now needs.
The companies and organizations in the Unicode Consortium, for
example, are supporting 3066bis for improved software internationalization.
For more information on the reasons behind the enhancements in 3066bis see

Moreover, all the talk about this being *too* complex is greatly overblown. All
3066bis language tags can be parsed, including all the grandfathered codes,
with a very short piece of code, or even with a regular expression (such as
in Perl). This is not rocket science.
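
As a rough illustration of that claim, here is a sketch in Python rather than
Perl. The pattern is an assumption: it follows the general shape
language[-script][-region][-variants][-extensions][-x-private] and leaves out
details such as extended language subtags, so it is not the draft's ABNF. The
names TAG_RE, GRANDFATHERED and parse_tag are invented for this example, and
the grandfathered set is only a small illustrative subset of the registry.

```python
import re

# Illustrative subset of tags registered under RFC 3066 (not the full list).
GRANDFATHERED = {"i-klingon", "i-default", "art-lojban", "zh-min-nan"}

# Simplified pattern (an assumption, not the draft's grammar).
TAG_RE = re.compile(r"""
    ^
    (?P<language>[a-z]{2,3})
    (?:-(?P<script>[a-z]{4}))?
    (?:-(?P<region>[a-z]{2}|[0-9]{3}))?
    (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)
    (?P<extensions>(?:-[a-wyz0-9](?:-[a-z0-9]{2,8})+)*)
    (?:-x(?P<private>(?:-[a-z0-9]{1,8})+))?
    $
""", re.IGNORECASE | re.VERBOSE)


def parse_tag(tag):
    """Split a tag into named parts, or return None if it is not well-formed."""
    # Grandfathered tags are atomic, so they are handled by table lookup.
    if tag.lower() in GRANDFATHERED:
        return {"grandfathered": tag}
    # A tag consisting only of private-use subtags.
    if re.fullmatch(r"x(?:-[a-z0-9]{1,8})+", tag, re.IGNORECASE):
        return {"private": tag}
    match = TAG_RE.match(tag)
    if match is None:
        return None
    # Drop parts that are absent and strip the leading hyphen from the rest.
    return {name: value.strip("-")
            for name, value in match.groupdict().items() if value}


print(parse_tag("en-Latn-US"))
print(parse_tag("de-1996"))
print(parse_tag("i-klingon"))
```

The point of the sketch is only that the subtag's length and position are
enough to tell a script from a region from a variant, with a table lookup
covering the grandfathered tags.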


----- Original Message ----- 
From: <ned.freed at mrochek.com>
To: "John Cowan" <jcowan at reutershealth.com>
Cc: <ned.freed at mrochek.com>; <ietf-languages at alvestrand.no>; <ietf at ietf.org>
Sent: Wednesday, January 05, 2005 07:33
Subject: Re: draft-phillips-langtags-08, process, specifications, "stability", and extensions

> > > > Finding country codes is straightforward: any non-initial subtag of
> > > > two letters (not appearing to the right of "x-" or "-x-") is a
> > > > country code.  This is true in RFC 1766, RFC 3066, and the current draft.
> > > On the contrary, in RFC 3066 the rule is "any 2 letter value that
> > > appears as the second subtag is a country code". The rule in the new
> > > draft is either the formulation you give above or  "any 2 letter value
> > > that appears as a subtag after the initial subtag and some number of
> > > 3 and 4 letter subtags".
> > I didn't state it as a rule, but as true.  Every non-initial 2-letter
> > tag in RFC 3066 is a country code; the same is true in the draft.
> Again, that is not what RFC 3066 says. From section 2.2:
>  There are no rules apart from the syntactic ones for the third and
>  subsequent subtags.
> Sure sounds to me like a third two letter subtag is (a) Allowed and (b)
> Isn't supposed to be treated as country code.
> Now, it may be the case that all _registered_ tags have avoided the use of
> non-country code two letter codes in the third and later position. But
> this is 100% irrelevant. The point is that conformant code implementing
> RFC 3066 is broken if it simply assumes any 2 letter code after the first
> subtag is a country code. Rather, the rule is simply that a country code,
> if present, always appears as a two letter second subtag. The new draft
> changes this so applications that pay attention to country codes in
> language tags have to change and the new algorithm for finding the country
> code is trickier.
> > (A private correspondent notes that the reference to "-x-" should
> > in fact be a reference to any singleton, though "-x-" and "i-" are
> > the only singletons currently usable.)
> I have to say I find it quite interesting that one of the main proponents
> of the new draft, while arguing that the new draft doesn't make the matching
> problem a lot harder, ended up giving an erroneous rule for extracting
> country codes from a language tag.
> > > Just because something doesn't necessarily do something doesn't mean
> > > it never does it.
> > It does mean it can't be counted on in the general case.
> Sure, in the general case most if not all of these nasty corner cases
> being created can be blithely assumed away because they only appear in
> specific problem domains. Actual applications that operate in those
> specific domains aren't so lucky, however. And the metric we're supposed to
> apply is real world implementability.
> As it happens I deal with messaging applications, and in this space
> dealing with all sorts of nasty charset issues is the rule, not the
> exception.
> > > Well, maybe I'm missing something obvious, but I see nothing in RFC
> > > 3066 that qualifies as a description of a matching algorithm.
> > Section 2.5 (2.4.1 in the draft) states the matching rule in a succinct
> > fashion.  Everything in 2.4.2 is a non-normative elaboration of this.
> ??? Which in no way refutes my assertion that no matching rule algorithm
> was given in RFC 3066!
> Ned