New Last Call: 'Tags for Identifying Languages' to BCP

Peter Constable petercon at microsoft.com
Sat Dec 11 02:03:05 CET 2004


Resuming my comments:


> -----Original Message-----
> From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-
> bounces at alvestrand.no] On Behalf Of Bruce Lilly

[snip]

> Specifically, the draft allows, and RFC 3066 disallows:
>    subtags more than 8 octets in length
>    hyphens which do not separate subtags
>    zero-length subtags
>    primary tags which are not purely alphabetic
> Curiously, all of those are permitted by the draft ABNF
> production "grandfathered"...

The "grandfathered" production in the current draft is 

grandfathered   = ALPHA *(alphanum / "-")

which does permit the sequences claimed by Bruce (except for
not-purely-alphabetic primary sub-tags), syntactically; but the set of
tags available for use is constrained by more than the ABNF syntax
alone: the acceptable productions for each sub-tag must either be taken
from one of the source standards or be registered. This is no different
from RFC 3066, so it is no more of a problem in this specification than
it was in RFC 3066.

It might be that the wording in 2.2 could be tightened up to eliminate
any possible question regarding the source for "grandfathered"
productions. Maybe it's not as obvious to someone coming to this cold as
it is for us who have been discussing it for the past year.

Alternately, there's no reason why the "grandfathered" production
shouldn't be composed exactly to match what was used in RFC 3066:

grandfathered = 1*8ALPHA *("-" 1*8alphanum)
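
To make the difference between the two productions concrete, here is a
minimal Python sketch that translates each one into a regular expression
and tries a few sample strings. The regexes are my own rendering of the
ABNF, assuming ALPHA means A-Z / a-z and alphanum means ALPHA / DIGIT;
the names DRAFT, RFC3066 and accepted_by are hypothetical, and the
sample strings are illustrative only, not registered tags.

```python
import re

# Draft production: grandfathered = ALPHA *(alphanum / "-")
# Assumption: ALPHA is A-Z / a-z and alphanum is ALPHA / DIGIT.
DRAFT = re.compile(r"[A-Za-z][A-Za-z0-9-]*")

# RFC 3066 shape: 1*8ALPHA *("-" 1*8alphanum)
RFC3066 = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def accepted_by(tag):
    """Report which of the two productions accepts the whole string."""
    return {
        "draft": DRAFT.fullmatch(tag) is not None,
        "rfc3066": RFC3066.fullmatch(tag) is not None,
    }


# Illustrative strings only: a well-formed tag, a subtag longer than
# 8 octets, a hyphen that does not separate subtags, and a trailing
# hyphen (i.e. a zero-length subtag).
for sample in ["en-US", "abcdefghij", "en--US", "en-"]:
    print(sample, accepted_by(sample))
```

Syntax is only half of the story, of course: a string that passes either
check is still not a usable tag unless it comes from the source
standards or the registry.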

So, perhaps there is room for technical improvement, but there are not
any serious problems IMO -- certainly nothing as serious as the tone of
Bruce's message conveyed.


> I see no reason for the ABNF to permit such content as is
> forbidden by RFC 3066; the actual ABNF for what RFC 3066
> permits is contained within 3066, and could have been directly
> incorporated rather than producing a "grandfathered"
> production which opens up several cans of worms.

This vastly overstates the problem. There is no can of worms unless it
exists in tags currently available under RFC 3066.

 
> One defect related to tag length in RFC 3066 is not remedied
> by the draft; indeed the problem is greatly exacerbated...

> Unfortunately, a language-tag's length is unlimited by
> the ABNF in RFC 3066 (due to an unlimited number of subtags)
> and in the draft...

> In particular, tags other than private-use tags with more than
> two subtags require registration under RFC 3066 rules, and it
> is a trivial matter to determine the longest registered tag.
> The draft, however, encourages use of more subtags as well as
> removal of the subtag length upper bound; moreover, it permits
> infinite numbers of subtags without requiring registration of
> the resulting complete tag.

Bruce states incorrectly that there is no upper bound on the length of
sub-tags. His other concern, on the overall length of complete tags, is
valid, however: in terms of the ABNF syntax for both RFC 3066 and RFC
3066bis, infinite-length productions are possible, but RFC 3066 would
require registration of complete non-private-use tags while RFC 3066bis
does not.

There are three open doors for infinite-length productions in the ABNF
of the current draft:

- unlimited extlang sub-tags
- unlimited variant sub-tags
- the number of possible extensions is limited to 25, but the length of
extensions is unlimited

We could impose some upper limits on these things; e.g.

Language-Tag = ... *8("-" extlang) ... *8("-" variant) ...
               1*25("-" extension)
...
extension = singleton 1*8("-" 2*8alphanum)

If we also imposed limits on the length of private-use tags and defined
the grandfathered production in a way that made clear there was an upper
limit for those, then we could end up eliminating an issue that had
existed in RFC 3066.
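
As a rough check on what such limits would buy us, the following Python
sketch works out the longest extension portion that the suggested
productions would allow. It covers only the extension part, since the
other pieces of the grammar are elided above. The constant and function
names are hypothetical, and I am assuming a singleton is exactly one
character.

```python
# Limits read off the suggested productions:
#   1*25("-" extension)
#   extension = singleton 1*8("-" 2*8alphanum)
MAX_EXTENSIONS = 25
MAX_SUBTAGS_PER_EXTENSION = 8
MAX_SUBTAG_LEN = 8
SINGLETON_LEN = 1  # assumption: a singleton is a single character
HYPHEN_LEN = 1


def max_extension_len():
    """Longest single extension: singleton plus 8 hyphenated subtags."""
    return SINGLETON_LEN + MAX_SUBTAGS_PER_EXTENSION * (HYPHEN_LEN + MAX_SUBTAG_LEN)


def max_extensions_portion_len():
    """Longest run of extensions, each preceded by its own hyphen."""
    return MAX_EXTENSIONS * (HYPHEN_LEN + max_extension_len())


print(max_extension_len())           # 73
print(max_extensions_portion_len())  # 1850
```

So even with these limits the extension portion alone could run to 1850
octets, which is bounded but hardly small; tighter numbers could be
chosen if implementers want a more comfortable buffer size.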

So, I think Bruce has identified a valid issue here. I personally would
not have characterized it as greatly exacerbating, though, as the issue
was present in RFC 3066: private-use tags did not need to be registered
in RFC 3066, so there was no way an implementation could be written with
certain knowledge that tags beyond some given length would not be
encountered.


> > The new registry provides a complete,
> > easily parseable file which provides the precise contents of valid
> > tags for any point in time.
> 
> That is the first time I have ever heard ISO 8601 date
> format described as "easily parseable".  Perhaps the draft
> authors meant to say that a specific subset of the tortuously
> complex ISO 8601 date format is used, but that is not what
> the draft states...

It seems very clear that the authors intended only a specific subset:
YYYY-MM-DD. This is a minor technical issue that the authors can very
easily remedy.
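
To show how little parsing the YYYY-MM-DD subset actually requires, here
is a minimal Python sketch. The function name parse_registry_date is
hypothetical, and the strictness (exactly four, two and two ASCII
digits, nothing else) is my assumption about what the authors intend,
not something the draft states.

```python
import re
from datetime import date

# Only the YYYY-MM-DD form is accepted; every other ISO 8601 variant
# (basic format, week dates, ordinal dates, times) is rejected.
DATE_RE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def parse_registry_date(text):
    """Parse a YYYY-MM-DD string into a date, or raise ValueError."""
    match = DATE_RE.fullmatch(text)
    if match is None:
        raise ValueError("not in YYYY-MM-DD form: %r" % text)
    year, month, day = (int(part) for part in match.groups())
    # date() itself raises ValueError for impossible dates such as
    # month 13 or February 30.
    return date(year, month, day)


print(parse_registry_date("2004-12-11"))
```

Anything outside that one form raises ValueError, so an implementation
never has to deal with the rest of ISO 8601.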


> I am absolutely shocked that a draft dealing with language
> lacks an "Internationalization considerations" section as
> recommended by RFC 2277 (a.k.a. BCP 18).

No more or less shocking than for RFC 3066, regarding which I'm not
aware of any complaints.

I don't quite understand what the critique is here: what is there to
internationalize about language tags? They are symbolic identifiers that
have no culture-specific content. The only possible consideration is the
charset, which for this spec involves ALPHA, DIGIT and "-" only. It's
true that ALPHA and DIGIT are not defined and that it would be better to
do so; it couldn't hurt to have a section for i18n considerations
(wouldn't need to be long). These are very minor concerns, and hardly
"shocking".


 
> Perhaps even more disturbing is the content of the "IANA
> Considerations" section; the draft predicts that certain things
> will happen ("IANA will"[...]), but doesn't actually direct
> (e.g. "IANA shall") IANA to do anything.  The placement of that
> section does not correspond to current RFC-Editor guidelines
> (it should appear after Security Considerations); also on that
> point, Appendices should precede References.

There is a process issue here, but I have assumed that the authors have
dealt with IANA on that. Otherwise, these are editorial issues -- "even
more disturbing" seems to me to be somewhat overstated.


> Many of the references are obsolete (e.g. RFCs 1327,
> 1521)... and at least one reference ([19])
> gives a bracketed URI rather than the correctly formatted
> RFC reference.  Although reference is made to the "Accept-
> Language" header field, RFC 3282 (the defining RFC for that
> field) is not listed among the references... 

> The formatting of the draft is atrocious

All editorial.


> there is no differentiation between normative and
> informative references, 

A valid concern.

 
> I am extremely surprised that the draft has been published
> at least nine times in such a state of poor formatting and
> poor attention to editorial content (e.g. obsolete and
> missing references), and that it progressed as far as IESG
> last call in such a state, with no Internationalization
> considerations section, etc.

In fairness to the authors, page-oriented plain text is not exactly
conducive to authoring and revising a long document, and a lot of energy
was spent focusing on details that have far more consequence than
formatting. And, as mentioned above, the lack of an i18n-concerns
section is hardly without precedent, and not particularly significant in
the case of this spec. This really feels like nit-picking, IMO. I'm left
wondering if Bruce has been looking for nits to pick because he is...


> ... particularly concerned about the implementation
> ramifications of the proposed changes, especially (as
> noted in detail above):
> 1. the apparent contradiction between the stated
>     objectives w.r.t. accessibility of relevant ISO data and
>     standards and the reality of the proposal's
>     implications (ISO 8601 date format parsing).

As mentioned above, this really is a non-issue.


> 2. the clear contradiction between the claims about
>     ABNF compatibility with RFC 3066 and the factual
>     incompatibility of certain provisions in the grammar.

The main concern was with the "grandfathered" production, but I've shown
that that is a non-issue. The maximal length issue exists just as much
in RFC 3066 due to private-use tags. It is a technical concern that
might be worth reviewing in RFC 3066bis, but it is not insurmountable,
and it is not a new problem.



Peter Constable
Microsoft Corporation

