draft-phillips-langtags-08, process, specifications, "stability", and extensions

Wed Jan 5 06:58:41 CET 2005

> ned.freed at mrochek.com scripsit:

> > I know of two other wrinkles in the RFC 1766 world:

> Are you aware that RFC 1766 has been obsolete for four years now?

Of course I am.

> > (2) SGN- requires special handling, in that SGN-FR and SGN-EN are in fact
> >    sufficiently different languages that a primary tag match should not be
> >    taken to be a generic match.

> The same is true of the various registered zh-* tags.

Yes, forgot to mention that one. It is actually different and more important, in
that the use cases aren't the same as those for sign languages.
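One way to encode the point about sgn- and zh- tags is a comparison key that falls back to the primary subtag only when that subtag names a single language. A minimal Python sketch follows; the helper names and the exception set {"sgn", "zh"} are hypothetical, taken from this exchange rather than from any specification.

```python
# Primary subtags for which sharing the primary subtag does NOT imply
# sharing a language. This set is an assumption drawn from this thread.
NO_GENERIC_PRIMARY = {"sgn", "zh"}

def generic_key(tag):
    """Key under which two tags count as the same generic language."""
    parts = tag.lower().split("-")
    if parts[0] in NO_GENERIC_PRIMARY:
        return tuple(parts)  # the whole tag is needed to identify the language
    return (parts[0],)       # elsewhere the primary subtag is enough

def same_generic_language(a, b):
    """True if tags a and b may be treated as the same generic language."""
    return generic_key(a) == generic_key(b)
```

With this key, en-US and en-GB compare equal while sgn-FR and sgn-US do not. It also separates zh-TW from zh-CN, which may be stricter than a given application wants.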

> > (a) Extension tags appear as the first subtags, and as such have to
> >    be taken into account when looking for country subtags.

> Finding country codes is straightforward: any non-initial subtag of two letters
> (not appearing to the right of "x-" or "-x-") is a country code.
> This is true in RFC 1766, RFC 3066, and the current draft.

On the contrary, in RFC 3066 the rule is "any two-letter value that appears as
the second subtag is a country code". The rule in the new draft is either the
formulation you give above or "any two-letter value that appears as a subtag
after the initial subtag and some number of three- and four-letter subtags".

These aren't the same.
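To make the difference concrete, here is a sketch of the three readings in Python. The function names are hypothetical; the draft rule is implemented as paraphrased above, and skipping tags whose primary subtag is a singleton (x-, i-) is an assumption of the sketch.

```python
def _subtags(tag):
    return tag.lower().split("-")

def country_rfc3066(tag):
    """RFC 3066 reading: a two-letter second subtag is a country code."""
    parts = _subtags(tag)
    if len(parts[0]) == 1:  # assumption: ignore x-/i- tags
        return []
    if len(parts) > 1 and len(parts[1]) == 2 and parts[1].isalpha():
        return [parts[1]]
    return []

def country_any_non_initial(tag):
    """Any non-initial two-letter subtag not to the right of an x singleton."""
    parts = _subtags(tag)
    if len(parts[0]) == 1:
        return []
    found = []
    for p in parts[1:]:
        if p == "x":  # private-use subtags follow
            break
        if len(p) == 2 and p.isalpha():
            found.append(p)
    return found

def country_after_short_subtags(tag):
    """Draft reading: the two-letter subtag reached after the primary
    subtag and zero or more three- or four-letter subtags."""
    parts = _subtags(tag)
    if len(parts[0]) == 1:
        return []
    for p in parts[1:]:
        if len(p) in (3, 4) and p.isalpha():
            continue
        if len(p) == 2 and p.isalpha():
            return [p]
        break
    return []
```

For en-Latn-US the RFC 3066 rule finds no country while the other two find US; for sgn-BE-fr the any-non-initial rule also picks up fr, which is not a country there.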

> > (b) Script tags change the complexion of the matching problem significantly,
> >    in that they can interact with external factors like charset information
> >    in odd ways.

> Can you clarify this?  Charset information neither specifies nor necessarily
> restricts (except in text/plain) the script used to write a document.

And what if you're dealing with text/plain, as many applications do?

Just because something doesn't necessarily do something doesn't mean it
never does it.
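When the charset does constrain the script, as in text/plain, a tag's script subtag can be impossible to honor. A rough Python check, assuming a hypothetical table of sample characters per script subtag (only Latn and Cyrl here) and using Python's codec machinery to test whether a charset can represent them:

```python
# Hypothetical sample text per script subtag; a real table would need
# far more coverage than one word per script.
SCRIPT_SAMPLES = {
    "latn": "Zdravo",
    "cyrl": "Здраво",
}

def script_fits_charset(tag, charset):
    """True/False if the tag's script subtag can/cannot be encoded in
    `charset`; None if the tag carries no script subtag we know about."""
    for sub in tag.lower().split("-")[1:]:
        if sub == "x":  # stop at private-use subtags
            break
        sample = SCRIPT_SAMPLES.get(sub)
        if sample is not None:
            try:
                sample.encode(charset)
                return True
            except UnicodeEncodeError:
                return False
    return None
```

For example, sr-Cyrl text labeled iso-8859-2 fails the check, while the same tag with iso-8859-5 passes, so a matcher that ignores the charset can select a variant the charset cannot carry.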

> > (c) UN country numbers have been added (IMO for no good reason), requiring
> >    handling similar to country codes.

> They provide for supranational language varieties and for stability in
> country codes which is inappropriate for ISO 3166 alphabetic codes (which
> are codes for country *names*).

I'm aware of what they provide (although I see no explanation of this
in the draft). I'm just not convinced that their addition is warranted.
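Handling the numeric codes like country codes mostly means widening the test for a region-position subtag. A Python sketch, assuming for illustration that the first subtag consisting of either two ASCII letters or three ASCII digits is the region, and that nothing after an x singleton counts (requires Python 3.7+ for `str.isascii`):

```python
def region_subtag(tag):
    """Return the first subtag that looks like an ISO 3166 alpha-2 code
    (two letters) or a UN numeric code (three digits), else None."""
    for sub in tag.split("-")[1:]:
        if sub.lower() == "x":  # private-use subtags follow
            break
        if len(sub) == 2 and sub.isascii() and sub.isalpha():
            return sub.upper()
        if len(sub) == 3 and sub.isascii() and sub.isdigit():
            return sub
    return None
```

With this, es-419 (419 being the UN code for Latin America and the Caribbean) yields 419, while en-Latn-US still yields US.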

> > The bottom line is that while I know how to write reasonable code to do RFC
> > 1766 matching (and have in fact done so for widely deployed software), I
> > haven't a clue how to handle this new draft competently in regards to
> > matching.

> The draft describes only the RFC 1766 (3066) algorithm, without excluding
> other algorithms to be defined later.

Well, maybe I'm missing something obvious, but I see nothing in RFC 3066 that
qualifies as a description of a matching algorithm. The new draft does include
such a description in section 2.4.2 - an improvement - but leaves any number of
details open. And we all know where the devil lives.
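For reference, the prefix rule that HTTP/1.1 (RFC 2616, section 14.4) states for Accept-Language ranges, a common reading of RFC 1766-style matching, fits in a few lines of Python. This is a sketch, not the draft's section 2.4.2 procedure; in particular "*" is simplified here to match every tag, whereas HTTP has it match only tags that no other range matched.

```python
def range_matches(language_range, tag):
    """A range matches a tag if it equals the tag, or equals a prefix of
    the tag that is immediately followed by '-'. Comparison ignores case."""
    r, t = language_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")

def filter_tags(language_range, tags):
    """Return the tags matched by a single range, preserving input order."""
    return [t for t in tags if range_matches(language_range, t)]
```

This rule on its own is what makes sgn matching unsafe: the range sgn matches both sgn-FR and sgn-US.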

Side note: I don't think item 4 really belongs in the list in section 2.4.2.
It is a warning to implementors, not part of the matching mechanism.

				Ned