draft-phillips-langtags-08, process, specifications, "stability", and extensions

John Cowan jcowan at reutershealth.com
Thu Jan 6 14:05:11 CET 2005


Bruce Lilly scripsit:

> > > Precisely; an RFC 1766/3066 parser, based on the 1766 and
> > > 3066 specifications, can expect four classes of language tags:
> > > 1. ISO 639 language code as the primary subtag, optionally
> > >    followed by an ISO 3166 country code as the second tag
> > > 2. i as the primary tag; complete tag registered
> > > 3. x as primary tag; private-use
> > > 4. some other IANA-registered complete tag
> > > 
> > > "sr-CS-Latn" fits category 1. "sr-Latn-CS" fits none.
> > 
> > You are mistaken; "sr-Latn-CS" fits your category 4.
> 
> I think not; it is not a registered tag.

Technically correct; however, it is a potentially registerable tag,
and as such an RFC 3066 parser that does not have access to the
IANA registry will accept it (in the language of the new draft,
it is well-formed but not valid).
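
To make the well-formed/valid distinction concrete, here is a minimal
sketch in Python. The regular expression follows the RFC 3066 syntax
(a primary subtag of 1-8 letters, then subtags of 1-8 letters or digits).
The three lookup tables are tiny hypothetical stand-ins for ISO 639,
ISO 3166, and the IANA registry, and the validity rule follows the reading
argued in this message (a tag with a third subtag is valid only if the
complete tag is registered); treat it as an illustration, not as a
reference implementation.

```python
import re

# RFC 3066 syntax: primary subtag of 1-8 letters, then zero or more
# hyphen-separated subtags of 1-8 letters or digits.
_WELL_FORMED = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")

# Hypothetical, deliberately tiny samples; real tables are far larger.
LANGUAGES = {"en", "sr"}    # stand-in for ISO 639 codes
COUNTRIES = {"us", "cs"}    # stand-in for ISO 3166 codes
REGISTERED = {"sr-latn"}    # stand-in for IANA-registered complete tags


def is_well_formed(tag):
    """Syntax check only; needs no access to any registry."""
    return _WELL_FORMED.fullmatch(tag) is not None


def is_valid(tag):
    """Well-formed AND either registered as a whole, private-use,
    or a language code optionally followed by a country code."""
    if not is_well_formed(tag):
        return False
    lowered = tag.lower()
    if lowered in REGISTERED:
        return True
    parts = lowered.split("-")
    if parts[0] == "x":
        # Assumption: private use needs at least one subtag after "x".
        return len(parts) >= 2
    if parts[0] not in LANGUAGES:
        return False
    # Language alone, or language plus country; nothing further.
    return len(parts) == 1 or (len(parts) == 2 and parts[1] in COUNTRIES)


if __name__ == "__main__":
    for tag in ("sr-Latn", "sr-Latn-CS", "en-US", "en-US-Latn", "en-Latn-US"):
        print(tag, is_well_formed(tag), is_valid(tag))
```

Under these assumptions "sr-Latn-CS", "en-US-Latn", and "en-Latn-US" all
pass is_well_formed() and all fail is_valid(), while "sr-Latn" and "en-US"
pass both.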

> There is a possibility
> that it could fit through the "no rules apart from the syntactic
> ones for the third and subsequent tags" given the registration of
> "sr-Latn" (you are correct about that; I missed it).  In that
> respect, the choice of examples is poor; consider "en-US-Latn"
> (category 1) vs. "en-Latn-US" (no category).

In fact, neither of these is currently valid, but both are registerable.
Category 1 tags cannot themselves contain third subtags, though they can
match tags which contain third subtags; this is a fundamental error in
your reading of RFC 3066 which infects the rest of your argument.

> Right. I.e. they should be able to deal with superfluous stuff
> on the right.  But not script tags that suddenly appear between
> language code and country code.

A validating RFC 3066 parser should *not* accept "superfluous stuff
on the right".  You are confusing validation with range matching.

> Again, poor choice of example. Consider "en-Latn-US" vs. "en-US-Latn".
> If one wants (presumably text) in US English in Latin script, the
> latter string is a valid RFC 3066 language tag which matches the
> known semantics of "en-US", even if the RFC 3066 parser has no way
> of interpreting the 3rd (and any subsequent) subtag(s).  

Recte: it is a well-formed but invalid tag which matches etc.

> The former
> is not a *valid* (neither registered in its entirety, nor beginning
> with language code and country code) language-tag, nor could it be
> matched by an RFC 3066 parser to anything greater than plain "en",

Correct.

> and that's presuming that such a parser would even attempt to match
> a known invalid tag to the set of valid tags.  

If it processes en-US-Latn, then it is handling well-formed but invalid
tags, and should process en-Latn-US as well (and match it against "en").
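
Range matching is a separate operation from validation. A minimal sketch
of the RFC 3066 prefix-matching rule follows: a range matches a tag if
the two are equal, or if the range is a prefix of the tag and the next
character in the tag is "-". The function names and the list of known
ranges are hypothetical, chosen only for illustration.

```python
def range_matches(language_range, tag):
    """RFC 3066-style prefix match, compared case-insensitively."""
    if language_range == "*":
        return True
    r = language_range.lower()
    t = tag.lower()
    return t == r or t.startswith(r + "-")


def best_match(tag, known_ranges):
    """Return the longest known range that matches the tag, or None."""
    matches = [r for r in known_ranges if range_matches(r, tag)]
    return max(matches, key=len, default=None)


if __name__ == "__main__":
    known = ["en", "en-US"]  # hypothetical set a parser already understands
    print(best_match("en-US-Latn", known))
    print(best_match("en-Latn-US", known))
```

With the known ranges above, "en-US-Latn" matches "en-US", whereas
"en-Latn-US" matches only "en". Neither call asks whether the tag is valid.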

> No, the RFC 3454 considerations for what is valid are based on
> protocol considerations, not on a Quixotic quest for "stability"
> of nations.  

The draft does not attempt to stabilize countries, only the codes
applied to them.  ISO 3166, as has been amply demonstrated, does not
and cannot do so, since it codes for the names of countries, not the
countries themselves.

-- 
They do not preach                              John Cowan
  that their God will rouse them                jcowan at reutershealth.com
    A little before the nuts work loose.        http://www.ccil.org/~cowan
They do not teach                               http://www.reutershealth.com
  that His Pity allows them                         --Rudyard Kipling,
    to drop their job when they damn-well choose.   "The Sons of Martha"
