draft-phillips-langtags-08, process, specifications, "stability", and extensions

Thu Jan 6 15:44:38 CET 2005

> > Rather, the rule is simply that a country code, if present, always
> > appears as a two-letter second subtag. The new draft changes this rule,
> > so applications that pay attention to country codes in language tags
> > have to change, and the new algorithm for finding the country code is
> > trickier.

> Your text above says (a) "if there is a country code in the tag, it is the
> second subtag". That is not what the text of RFC 3066 actually says, which is:

> > The following rules apply to the second subtag:
> > All 2-letter subtags are interpreted as ISO 3166 alpha-2 country...

> That is, it says (b) "if a second subtag has 2 letters, then it is an ISO
> 3166 code", which is not the same as (a). (It is almost, but not quite, the
> converse.)

Fine, whatever.

> The current RFC certainly does not forbid the use of country
> codes in other positions in language tags. One could absolutely register
> en-Latin-US, for example, meaning English as spoken in the US written in
> Latin script.

Sure, but my point was, is, and always has been that any 3066-compliant
implementation won't see this as a country code (unless it is table driven,
which brings up its own set of issues).
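
To make the point concrete, here is a minimal sketch of the positional check
that a non-table-driven, 3066-era implementation might plausibly use. The
function name is hypothetical and the code is an illustration under that
assumption, not taken from any real implementation:

```python
def country_code_rfc3066(tag):
    """Return the country code using the 'two-letter second subtag' rule.

    Hypothetical helper: it looks only at the second subtag, as a simple
    positional implementation would, and returns None otherwise.
    """
    subtags = tag.split("-")
    if len(subtags) >= 2 and len(subtags[1]) == 2 and subtags[1].isalpha():
        return subtags[1].upper()
    return None


print(country_code_rfc3066("en-US"))        # second subtag has 2 letters
print(country_code_rfc3066("en-Latin-US"))  # "US" is in third position
```

For "en-Latin-US" this check finds no country code at all, because the
second subtag is "Latin" and the third subtag is never examined.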

> There has been a lot of noise on this issue, and too few concrete examples.

No, what there has been is a lot of discussion of a real problem with no
apparent recognition of it as such by the draft authors. Your pejorative
characterization of this as "noise" does not make it so.

> In the so-called 3066bis draft, we have striven very hard to ensure that:

> (c) Every single tag that could be generated under RFC 3066bis is a tag that
> could have been registered under RFC 3066.

True but irrelevant.

> Thus if someone wrote a parser that is future-compatible -- that could parse
> all RFC 3066 language tags including those registered after the parser was
> deployed -- then that parser can handle all 3066bis language tags. This is a
> significant advance over RFC 3066, whose registered (not generated) language
> tags are atomic, and cannot be effectively parsed at all. 3066bis adds more
> structure so as to allow effective parsing of tags.

> If you *can* come up with tags that would show that (c) is invalid, that
> would be a concrete case that we would have to make adjustments in the draft
> for.

(c) is frankly not an issue I care one whit about. (Perhaps I should, but I
don't.) I don't register tags. I write code that processes, and more to the
point matches, tags. That's why I have issues with this draft.

> Moreover, all the talk about this being *too* complex is far overblown.

Again, your pejorative dismissal of other people's concerns does not
mean your position is valid.

> All
> 3066bis language tags can be parsed, including all the grandfathered codes,
> with a very short piece of code, or even with a regular expression (such as
> in Perl).

Of course you can write a short piece of code to parse this stuff. It's what you
do with it after you parse it that's a problem.
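
As one illustration of a post-parsing step, here is a minimal sketch of
prefix-style language-range matching as described in RFC 3066 section 2.5
(a range matches a tag if it equals the tag, or equals a prefix of the tag
that is followed by "-"). The function name is hypothetical, and real
matching code may do more than this:

```python
def range_matches(language_range, tag):
    """Prefix matching of a language range against a language tag.

    Hypothetical helper. Comparison is case-insensitive, and "*" is
    treated as a wildcard that matches any tag.
    """
    r = language_range.lower()
    t = tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


print(range_matches("en-US", "en-US"))        # exact match
print(range_matches("en", "en-Latin-US"))     # "en" is a prefix
print(range_matches("en-US", "en-Latin-US"))  # "en-US" is not a prefix
```

Under this scheme a range of "en-US" does not match "en-Latin-US", even
though both identify US English, because the extra subtag sits between the
language and the country code.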

> This is not rocket science.

Parsing almost never is. But simply parsing these tags is not, and never has
been, the issue.

				Ned