Review of draft-phillips-langtags-03
Harald Tveit Alvestrand
harald at alvestrand.no
Thu Jun 24 10:00:52 CEST 2004
This document has significant issues, identified in this memo, which should
be addressed before approval.
NOTE: I am recusing myself from IESG processing on this document. My
opinion should be considered based on the reasonableness of my arguments.
Let me make one thing perfectly clear: I do not like this proposal. To me,
it smacks of overengineering and overambitious frameworks that allow an
extremely large number of variations that recipients must be able to handle,
all in order to deal with what is really a problem with the registration
guidelines and the responsiveness of the registration authority.
I think subtag registration is an approach that is harder to manage and
less useful to the wider community than a whole-tag registration scheme, and
I think that generative grammars that permit "all well-formed tags" are less
useful than a scheme that permits only tags that someone has bothered to
argue are useful.
But I have been convinced by the debate on the ietf-languages list that a)
I'm in a minority on this, and b) there are reasonable arguments for
switching to a generative scheme.
So I'm not going to argue that we should either ask this proposal to
abandon its generative scheme or switch from subtag to whole-tag
registration. We have had that debate, and I have not convinced others.
The document also dramatically changes the purpose of language tags; RFC
3066 deliberately identified language ONLY; the current proposal says:
These identifiers can also be used to indicate additional attributes
of content that are closely related to the language. In particular,
it is often necessary to indicate specific information about the
dialect, writing system, or orthography used in a document or
resource, as these attributes may be important for the user to obtain
information in a form that they can understand, or important in
selecting appropriate processing resources for the given content.
This is a dramatic shift in focus, and is the basis for many of the
changes. I do not like this shift, but I see the arguments for it. And the
average opinion of the ietf-languages list seems to be in favour of such a
shift.
But - these big issues aside - I have many other problems with this
document. Below are some of them.
NOTE: I heartily APPROVE and APPLAUD the designation of a single pattern
for tags that include script and country variations of a language, when
such are found necessary. I also find the -x- mechanism for adding
non-global information into a tag a reasonable mechanism.
These are not my worries.
Use of non-registered codepoints
RFC 3066 says (section 2.2):
> All 2-letter subtags are interpreted according to assignments
> found in ISO standard 639, "Code for the representation of names
> of languages" [ISO 639], or assignments subsequently made by the
> ISO 639 part 1 maintenance agency or governing standardization bureau.
And similar text for the 3-letter subtags.
This means that "private use" tags have NO interpretation. That was
This proposal says (section 2.2, 3rd bullet of 3rd bulleted list):
> o ISO639-2 reserves for private use codes the range 'qaa'
> through 'qtz'. These codes should be used for non-registered
> language subtags.
Similarly for script codes.
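As a minimal sketch of what the quoted rule amounts to in practice, assuming subtags are compared case-insensitively and only alphabetic 3-letter language subtags are in question, the private-use test is a simple range check. The function name `is_private_use_language` is hypothetical and comes from neither document.

```python
def is_private_use_language(subtag: str) -> bool:
    """Return True if a 3-letter language subtag falls in the ISO 639-2
    private-use range 'qaa' through 'qtz' quoted from the proposal."""
    s = subtag.lower()  # assumption: subtags are compared case-insensitively
    if len(s) != 3 or not s.isascii() or not s.isalpha():
        return False
    # Plain string comparison is safe here because s is exactly 3 ASCII letters.
    return "qaa" <= s <= "qtz"


if __name__ == "__main__":
    for candidate in ["qaa", "qtz", "QAB", "qua", "eng", "q"]:
        print(candidate, is_private_use_language(candidate))
```

The check tells a recipient only that a subtag is private-use; what it actually denotes still has to be agreed out of band.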
Interchanging private-use subtags is not something that should be
absolutely outlawed, in my opinion. But if it is to be mentioned at all, it
needs to have a LARGE caveat that such use MUST be "only between consenting
Private-use subtags are simply useless for information exchange without
Not only does this proposal make "legal" a huge number of hitherto
undreamed-of tags, it provides several means of extensibility.
- Extended language tags: 3-letter tags following the first subtag get a
long section on how they might be used if ISO ever creates something that
fits into this space.
- Extension single-letter tags: Section 3.3 specifies rules for subtags
that are specific enough and onerous enough that it's likely that any
proposal for use of subtags would be fair game for a procedural
denial-of-service attack. Just the "specification..... must be available
over the Internet and at no cost" possibly invalidates this very document.
And their ordering is specified, including speculation about the ordering
requirements they may impose.
- Allowing/encouraging private-use "q" tags, private-use "x" tags and the
"x-" singleton. Allowing ALL of these seems excessive.
- Allowing registration of *any* language tag longer than 3 characters as
the first subtag of a tag. This opens the door, wider than the i- tag ever
did, for IANA to become the "fallback when you are rejected by the ISO 639
RA".
In my opinion, the 3-letter speculation is simply not reasonable to
include. It should be restricted to a simple statement that "codes that
consist of a language subtag followed by a 3-letter subtag are not defined
by this memo, and are reserved for future extension". Period.
Similarly, it should say "do not generate tags containing singletons -
these are reserved for future use" - and define the particular case of "x-".
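One possible reading of the simpler rule suggested above can be sketched as follows, assuming tags are split on hyphens and matched case-insensitively: a tag containing an "x" singleton carries a private-use part that is not examined further, while any other singleton, or a 3-letter subtag directly after the language subtag, is treated as reserved rather than interpreted. The helper `classify` and its return labels are hypothetical, and grandfathered "i-" tags are deliberately not handled.

```python
def classify(tag: str) -> str:
    """Classify a tag under one reading of the simplified rules in this review
    (a hypothetical helper, not from either document):
      'private-use' -> the tag contains an 'x' singleton; later subtags are opaque
      'reserved'    -> another singleton, or a 3-letter second subtag
      'ok'          -> neither of the above
    """
    subtags = tag.lower().split("-")
    for position, sub in enumerate(subtags):
        if sub == "x":
            # Everything from the 'x' singleton onward is private use and is
            # not examined further.
            return "private-use"
        if len(sub) == 1:
            # Any other singleton is reserved for future use, not interpreted.
            return "reserved"
        if position == 1 and len(sub) == 3 and sub.isalpha():
            # Language subtag followed by a 3-letter subtag: reserved extension.
            return "reserved"
    return "ok"


if __name__ == "__main__":
    for t in ["en-US", "zh-yue", "en-a-bbb", "x-klingon", "en-x-foo"]:
        print(t, classify(t))
```

The point of the sketch is its size: the whole rule fits in a few lines of text, with no ordering or registration requirements for extensions.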
The document is 35 pages long. RFC 3066 was 13 pages. Cutting out this sort
of overspecification would help reduce the growth.
The ABNF for tags is simply broken.
The fact that it actually passed at least one ABNF verifier came as a shock
to me. There is no way this document should be approved with the ABNF the
way it is.
The way that single-letter subtags are used for "escape into other tag
coding systems" is made more baroque than necessary by the excessive
rule-making about not being in the first subtag, alphabetical ordering of
tag sequences introduced by a single-letter subtag, and so on. For an escape
mechanism, it is overspecified; for a coding system for non-language
information inside language tags, it is underspecified.
Allowing single-character subtags in the first subtag position would allow
grandfathering of the "i-" subtags without special tag magic.
And trying to encode the fact that "x" has a defined meaning in the ABNF
looks gross. It is better to define the special meaning of "x" in text only.
The grammar given is incompatible with RFC 3066: RFC 3066 allows subtags to
be up to 8 characters long; the proposal lengthens this to 15 characters
without any justification for the change. Subtags in extensions can reach 31
characters; the reason for having two different length limits is not obvious.
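To make the incompatibility concrete, here is a minimal sketch of the RFC 3066 grammar (Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT)) as a regular expression, with the maximum subtag length as a parameter so the 8-character limit can be compared with the proposal's 15. The function name is hypothetical, and the sketch varies only the length limit; it does not model the rest of the proposal's grammar or its separate 31-character limit for extension subtags.

```python
import re


def is_well_formed(tag: str, max_len: int = 8) -> bool:
    """Check a tag against the RFC 3066 grammar, with the subtag length
    limit as a parameter (8 in RFC 3066):
        Language-Tag   = Primary-subtag *( "-" Subtag )
        Primary-subtag = 1*8ALPHA
        Subtag         = 1*8(ALPHA / DIGIT)
    """
    pattern = rf"[A-Za-z]{{1,{max_len}}}(-[A-Za-z0-9]{{1,{max_len}}})*"
    return re.fullmatch(pattern, tag) is not None


if __name__ == "__main__":
    long_tag = "en-abcdefghijkl"  # second subtag is 12 characters
    print(is_well_formed(long_tag))              # RFC 3066 limit of 8
    print(is_well_formed(long_tag, max_len=15))  # proposal's limit of 15
```

A tag such as `en-abcdefghijkl` passes with `max_len=15` but fails under the RFC 3066 limit, so an implementation written to RFC 3066 would reject tags the proposal declares well-formed.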
Deficient IANA instructions
The IANA considerations are deficient.
The language registration form has only been converted halfway from the
"language" to the "subtag" notion - it still talks about " Native name of
language (transcribed into ASCII)" and "Reference to published description
of the language (book or article)".
These fields are clearly ill-defined when one subtag can be used with
multiple prefixes, which was the point of registering subtags in the first
place.
The conversion rules specified depend heavily on the langtag (now subtag)
reviewer. This may be a feature, but requires the language tag reviewer to
commit to doing some work.
The registration of the necessary UN codes in IANA should not be optional.
The interesting codes should be given in this memo.
Section 2.3 bullet 6: The NOTE is not specific to bullet 6. If specific to
any bullet, it should be specific to bullet 4.