Review of draft-phillips-langtags-03

Thu Jun 24 10:00:52 CEST 2004

Summary:

This document has significant issues, identified in this memo, which should 
be addressed before approval.

NOTE: I am recusing myself from IESG processing on this document. My 
opinion should be considered based on the reasonableness of my arguments 
only.

Introduction
------------
Let me make one thing perfectly clear: I do not like this proposal. To me, 
it smacks of overengineering: an overambitious framework that permits an 
extremely large number of variations, all of which recipients must be able 
to handle, in order to solve what is really a problem with the guidelines 
for registration and the responsiveness of the registration authority.

I think subtag registration is an approach that is harder to manage and 
less useful to the wider community than a whole-tag registration scheme, 
and I think that generative grammars that permit "all well-formed tags" are 
less useful than a scheme that permits only tags that someone has bothered 
to argue are useful.

But I have been convinced by the debate on the ietf-languages list that a) 
I'm in a minority on this, and b) there are reasonable arguments for 
switching to a generative scheme.

So I'm not going to argue that we should either ask this proposal to 
abandon its generative scheme or switch from subtag to whole-tag 
registration. We have had that debate, and I have not convinced others.

The document also dramatically changes the purpose of language tags; RFC 
3066 deliberately identified language ONLY; the current proposal says:

   These identifiers can also be used to indicate additional attributes
   of content that are closely related to the language. In particular,
   it is often necessary to indicate specific information about the
   dialect, writing system, or orthography used in a document or
   resource, as these attributes may be important for the user to obtain
   information in a form that they can understand, or important in
   selecting appropriate processing resources for the given content.

This is a dramatic shift in focus, and is the basis for many of the 
changes. I do not like this shift, but see the arguments for it. And the 
average opinion of the ietf-languages list seems to be in favour of such a 
shift.

But - these big issues aside - I have many other problems with this 
document. Below are some of them.

NOTE: I heartily APPROVE and APPLAUD the designation of a single pattern 
for tags that include script and country variations of a language, when 
such are found necessary. I also find the -x- mechanism for adding 
non-global information into a tag a reasonable mechanism.
These are not my worries.

Use of non-registered codepoints
--------------------------------
RFC 3066 says (section 2.2):

> All 2-letter subtags are interpreted according to assignments
> found in ISO standard 639, "Code for the representation of names
> of languages" [ISO 639], or assignments subsequently made by the
> ISO 639 part 1 maintenance agency or governing standardization
> bodies.

And similar text for the 3-letter subtags.

This means that "private use" tags have NO interpretation. That was 
intentional.

This proposal says (section 2.2, 3rd bullet of 3rd bulleted list):

> o  ISO639-2 reserves for private use codes the range 'qaa'
> through 'qtz'. These codes should be used for non-registered
> language subtags.

Similar for script codes.

Interchanging private-use subtags is not something that should be 
absolutely outlawed, in my opinion. But if it is to be mentioned at all, it 
needs to have a LARGE caveat that such use MUST be "only between consenting 
adults".

Private-use subtags are simply useless for information exchange without 
prior arrangement.
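
To make the point concrete, here is a minimal sketch in Python of how a 
recipient could recognise a language subtag from the ISO 639-2 private-use 
range quoted above. The function name is hypothetical and comes from neither 
document. All the check can establish is that the subtag has no globally 
defined meaning; it cannot recover what the sender meant by it.

```python
def is_private_use_language(subtag: str) -> bool:
    """Return True if subtag lies in the range 'qaa' through 'qtz'."""
    s = subtag.lower()
    # Must be exactly three ASCII letters; then a plain lexicographic
    # comparison covers the whole reserved range.
    if len(s) != 3 or not (s.isascii() and s.isalpha()):
        return False
    return "qaa" <= s <= "qtz"
```

A recipient that gets True from such a check knows only that it needs a 
prior agreement with the sender in order to interpret the tag.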

Excessive extensibility
-----------------------
Not only does this proposal make "legal" a huge number of hitherto 
undreamed-of tags, it provides several means of extensibility.

Namely:

- Extended language tags: 3-letter tags following the first subtag get a 
long section on how they might be used if ISO ever creates something that 
fits into this space.

- Extension single-letter tags: Section 3.3 specifies rules for these 
subtags that are so specific and so onerous that any proposal to use them 
is likely to be fair game for a procedural denial-of-service attack. The 
requirement that the "specification..... must be available over the 
Internet and at no cost" alone possibly invalidates this very document. 
Their ordering is also specified, including speculation about the ordering 
requirements they may impose.

- Allowing/encouraging private-use "q" tags, private-use "x" tags and the 
"x-" singleton. Allowing ALL of these seems excessive.

- Allowing registration of *any* language tag longer than 3 characters as 
the first subtag of a tag. This opens the door for IANA to become the 
"fallback when you are rejected by ISO 639 RA" far wider than the I- tag 
ever did.

In my opinion, the 3-letter speculation is simply not reasonable to 
include. It should be restricted to a simple statement that "codes that 
consist of a language subtag followed by a 3-letter subtag are not defined 
by this memo, and are reserved for future extension". Period.

Similarly, it should say "do not generate tags containing singletons - 
these are reserved for future use" - and define the particular case of "x-".
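
A generator that wanted to follow such a rule could check its output with 
something like the sketch below. It is Python, the function name is 
hypothetical, and it rests on two assumptions that are mine rather than the 
draft's: the first subtag is not inspected, and everything after an "x" 
singleton is private and therefore not inspected either.

```python
from typing import List

def reserved_singletons(tag: str) -> List[str]:
    """List single-character subtags, other than 'x', after the first subtag."""
    found = []
    for subtag in tag.lower().split("-")[1:]:
        if subtag == "x":
            # Assumption: the rest of the tag is private use; stop here.
            break
        if len(subtag) == 1:
            found.append(subtag)
    return found
```

A tag for which this returns an empty list contains no singleton other than 
the defined "x-" case, so it stays inside the space this memo would define.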

The document is 35 pages long. RFC 3066 was 13 pages. Cutting out this sort 
of overspecification would help reduce the growth.

Syntactical issues
------------------
The ABNF for tags is simply broken.
The fact that it actually passed at least one ABNF verifier came as a shock 
to me. There is no way this document should be approved with the ABNF the 
way it is.

The way that single-letter subtags are used for "escape into other tag 
coding systems" is made more baroque than necessary by the excessive 
rule-making about not being in the first subtag, alphabetical ordering of 
tag sequences introduced by a single-letter subtag, and so on. For an 
escape mechanism, it is overspecified; for a coding system for non-language 
information inside language tags, it is underspecified.

Allowing single-character subtags in the first tag position would allow 
grandfathering of the "i-" subtags without special tag magic.

And trying to encode the fact that "x" has a defined meaning in the ABNF 
looks gross. It is better to define the special meaning of "x" in text only.

The grammar given is incompatible with RFC 3066: RFC 3066 allows subtags to 
be up to 8 characters long; the proposal lengthens this to 15 characters 
without any justification for the change. Subtags in extensions can reach 
31 characters; the reason for having two different length limits is not 
obvious.
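
As an illustration of the incompatibility, the sketch below checks a tag 
against the RFC 3066 syntax (a primary subtag of 1 to 8 letters, followed 
by any number of subtags of 1 to 8 letters or digits). It is Python, and 
the names are hypothetical. It stands in for an existing RFC 3066 parser 
and checks syntax only, not registration.

```python
import re

# RFC 3066: Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT)
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")

def is_rfc3066_wellformed(tag: str) -> bool:
    """Return True if the whole tag matches the RFC 3066 syntax."""
    return RFC3066_TAG.fullmatch(tag) is not None

def longest_subtag(tag: str) -> int:
    """Length of the longest hyphen-separated subtag in the tag."""
    return max(len(subtag) for subtag in tag.split("-"))
```

A tag such as "en-abcdefghi" has a 9-character subtag: it is within the 
15-character limit of the proposal, but a parser written along these lines 
would reject it.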

Deficient IANA instructions
---------------------------
The IANA considerations are deficient.

The language registration form has only been converted halfway from the 
"language" to the "subtag" notion - it still talks about "Native name of 
language (transcribed into ASCII)" and "Reference to published description 
of the language (book or article)".

It is not at all clear what these fields mean when one subtag can be used 
with multiple prefixes - which was the point of registering subtags in the 
first place, wasn't it?

The conversion rules specified depend heavily on the langtag (now subtag) 
reviewer. This may be a feature, but requires the language tag reviewer to 
commit to doing some work.

The registration of the necessary UN codes in IANA should not be optional. 
The interesting codes should be given in this memo.

Grammar issues
--------------

Section 2.3 bullet 6: The NOTE is not specific to bullet 6. If specific to 
any bullet, it should be specific to bullet 4.