New Last Call: 'Tags for Identifying Languages' to BCP

Peter Constable petercon at
Tue Dec 14 00:34:12 CET 2004

Bruce Lilly has posted comments on the IETF list in response to the
last-call announcement for a proposed revision to RFC 3066. His comments
were generally negative, raising a number of concerns. I and others
involved in preparation of the revision have discussed Bruce's concerns
with him, but that discussion did not appear on the IETF list because
those of us other than Bruce were not subscribed to this list. I wish to
briefly summarize the outcome of that discussion for the benefit of
people here.


Some of Bruce's comments were purely editorial (e.g. formatting of
draft); I will not review those.


Bruce's substantive concerns were:


-         Accessibility of source ISO standards was referred to in the
announcement as a major reason for the proposed revision, but
accessibility has not been a problem in his experience.


-         RFC 3066 directed users to source ISO standards; the proposed
revision would establish a registry that includes all ISO identifiers
considered valid for use in language tags, but the documentation for
those identifiers in this registry does not include both English and
French language / country names. 


-         The proposed revision makes reference to ISO 8601 time/date
format being used in the registry, which is a complex and
not-readily-available specification.


-         The ABNF used in the proposed draft permits many strings that
do not conform with RFC 3066.


-         The proposed revision imposes no bounds on the length of tags
(same as RFC 3066), and does not require registration of complete tags
(different from RFC 3066).


-         The lack of an "Internationalization considerations" section
as recommended by RFC 2277 (a.k.a. BCP 18).


As a result of Bruce's comments, those of us contributing to the
development of this revision have suggested certain revisions to which
the authors have indicated openness. As I will explain, these revisions
would provide clarification on various matters, but would not constitute
technical changes in the draft.


1. Re accessibility: it was pointed out that the draft itself does not
identify accessibility of source ISO standards as one of the primary
reasons for the revision. There are some minor accessibility concerns
having to do with uncertainty of the on-going availability of the
relevant ISO code tables, and to change histories for each of the
relevant ISO standards. The proposed changes to the language-tag
registry address these concerns, though there were bigger reasons for
the proposed registry changes, particularly having to do with stability.



2. Re the lack of French descriptions in the registry: it was pointed
out that the registry defined by RFC 3066 did not include French
descriptions, and that the revised registry is not intended to replace
the source ISO standards or make them irrelevant. The meaning of IDs
would still be established from the ISO standards from which they were
drawn, and the proposed revision would continue to make reference to
them. As a result of Bruce's comments, it was suggested that wording be
revised in the draft to make this relationship clearer.



3. Re ISO 8601 time/date format: The registry uses dates expressed in
the format "YYYY-MM-DD". It was agreed that it would be better to
identify the format precisely rather than make the generic reference to
ISO 8601.
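To illustrate how little of ISO 8601 is actually needed here, the following is a minimal sketch in Python of a check for the "YYYY-MM-DD" form. The function name parse_registry_date is hypothetical and is not taken from the draft; the sketch only assumes that a registry date is four digits, a hyphen, two digits, a hyphen, and two digits, forming a real calendar date.

```python
import re
from datetime import date

# Assumption: a registry date is exactly ASCII "YYYY-MM-DD" and nothing else.
_DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_registry_date(text):
    """Return a datetime.date for a 'YYYY-MM-DD' string, or None otherwise.

    Hypothetical helper for illustration; not part of any specification.
    """
    if _DATE_PATTERN.fullmatch(text) is None:
        return None
    year, month, day = (int(part) for part in text.split("-"))
    try:
        # date() rejects impossible values such as month 13 or 30 February.
        return date(year, month, day)
    except ValueError:
        return None
```

A consumer of the registry written this way never needs the other ISO 8601 forms (times, week dates, ordinal dates, and so on), which is the practical argument for naming the format explicitly.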



4. Re the less restrictive ABNF: the one place that had less restrictive
syntax was a production rule that was subject to additional strict
constraints, namely that only certain pre-existing tags registered under
RFC 3066 could fall under that production. A change to the ABNF has been
suggested that would make the ABNF at that point consistent with the
ABNF for RFC 3066. This does not constitute a change having any
technical consequence as there is no resulting change in the set of
valid tags.



5. Re upper bounds on length of tags: It was pointed out that
private-use tags for both RFC 3066 and the proposed revision have no
bounds on their length. The greater concern was for non-private-use
tags. For these, it was pointed out that RFC 3066 also imposes no bounds
on length. Admittedly, though, there is a difference because RFC 3066
requires registration of complete tags, so one can determine at any time
what is the longest valid tag that may be encountered, whereas the
proposed revision requires registration of sub-tags which can then be
combined productively, and one cannot predict with certainty what
combinations may be used. (This, IMO, is the most significant of the
concerns Bruce raised.)
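The difference can be sketched in Python under some loud assumptions: the lists below are small, made-up samples rather than the contents of any real registry, and the only combination rule modeled is language, optionally followed by a script, optionally followed by a region. The names longest_registered_tag and combine are hypothetical.

```python
from itertools import product

# Assumption: a tiny sample standing in for a registry of complete tags.
complete_tags = ["en", "en-US", "zh-Hant", "sl-rozaj"]


def longest_registered_tag(tags):
    """With complete-tag registration, the longest tag is a simple lookup."""
    return max(len(tag) for tag in tags)


# Assumption: tiny samples standing in for a registry of sub-tags.
languages = ["en", "zh", "sl"]
scripts = ["Latn", "Hant"]
regions = ["US", "TW", "IT"]


def combine(languages, scripts, regions):
    """Yield every language[-script][-region] combination of the sub-tags."""
    for lang, script, region in product(
        languages, [None] + scripts, [None] + regions
    ):
        yield "-".join(part for part in (lang, script, region) if part)


combos = list(combine(languages, scripts, regions))
print(longest_registered_tag(complete_tags))  # 8, from "sl-rozaj"
print(len(combos))                            # 36 possible tags from 8 sub-tags
print(max(len(tag) for tag in combos))        # 10, e.g. "en-Latn-US"
```

Even in this bounded toy case the set of usable tags is no longer listed anywhere; it has to be derived. Once elements that may repeat are added, the derived set has no longest member at all, which is the concern discussed next.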


While the proposed revision allows productive combinations of registered
sub-tags, there are some limits on how combinations can be made, as
specified by the ABNF. The ABNF does, however, allow an unlimited number
of occurrences of certain elements, three in particular.


One of these ('extlang') is defined by the ABNF in anticipation of
possible future extension of the language tag specification to
incorporate mechanisms expected in a new part to ISO 639 that is in
preparation, but is not made available for use at this time.


Another ('variant') requires sub-tags to be registered, and requires
that the registration indicate prefix sub-tags that they are recommended
to be used with. While it may still be technically valid to use a
registered variant in some way other than the recommendation, that
will be unlikely (just as certain combinations valid under RFC 3066,
such as ja-DE are unlikely). Thus, implementers will have a reasonable
chance of anticipating what combinations will be used. 
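As a rough sketch of how an implementer might use recommended prefixes, the Python below checks whether each variant in a tag follows one of its recommended prefixes. The table VARIANT_PREFIXES and the function follows_recommendation are hypothetical; the two entries are modeled on tags such as de-1901 and sl-rozaj and are assumptions for illustration, not a copy of actual registration records.

```python
# Assumption: variant sub-tag -> recommended prefixes, in lower case.
VARIANT_PREFIXES = {
    "1901": ["de"],
    "rozaj": ["sl"],
}


def follows_recommendation(tag):
    """Return True if every known variant in tag follows a recommended prefix.

    Sub-tags that are not in VARIANT_PREFIXES are not checked at all.
    """
    subtags = tag.lower().split("-")
    for index, subtag in enumerate(subtags):
        prefixes = VARIANT_PREFIXES.get(subtag)
        if prefixes is None:
            continue
        # Everything that precedes the variant in the tag.
        before = "-".join(subtags[:index])
        if not any(
            before == prefix or before.startswith(prefix + "-")
            for prefix in prefixes
        ):
            return False
    return True
```

Under these assumptions, "de-1901" and "de-DE-1901" follow the recommendation while "ja-1901" does not, even though the latter might still be syntactically valid. An application could use such a check to estimate which combinations, and hence which lengths, it is likely to meet in practice.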


The third of these ('extension') is defined as a mechanism for extending
language tags for use in future protocols. There is an upper limit of 25
extensions, though this RFC does not define limits on the length of each
extension. There are no extensions defined at this time, and any
extension would require specification in the form of a separate RFC. At
such time as one or more extension RFCs are defined, those
specifications would provide some indication of what limits they do or
don't impose on the length of extensions. In the case of any protocol
that supports this proposed revision to RFC 3066 but does not support
extensions, any extensions that may be included in a language tag are


Apart from extensions, all of the mechanisms introduced in the proposed
revision were in response to the direction users and implementers were
already going with registered tags under RFC 3066. Thus, while the
proposed revision gives greater provision for lengthy tags, this is not
completely unrestrained, and the practical likelihood of encountering
tags of any given length would be no greater under the proposed revision
than it was under RFC 3066.


Even so, various changes were suggested to highlight issues related to
length, specifically with a view to the possibility that some
applications of RFC 3066 (or this proposed revision) would impose fixed
limits on the length of tags. These suggestions included notes in that
regard in key points within the RFC, but also in sub-tag registrations
and in RFCs defining extensions. (For instance, a variant registration
would include not only a recommendation on appropriate prefixes, but
also specific comments on maximal length of tags using the given
variant.) There were no suggestions to impose limits on the length of
tags in the RFC itself (just as RFC 3066 does not impose limits).
Basically, limits on length were seen to be a concern belonging to
particular applications of the language-tag spec and not the spec
itself, but significant additions would be made to the RFC so that
these concerns are highlighted.



6. Re an i18n-considerations section: It was pointed out that language
tags are symbolic identifiers with no culture-specific content; the only
i18n consideration related to the identifiers themselves is charset,
and charset issues are covered in the section on syntax. Bruce was also
concerned about i18n considerations in the registry (see issue #2, above
- lack of French-language descriptions), but it was pointed out that the
content of the registry is not intended as localization data, that there
are well-established precedents for code sets that are not documented in
terms of multilingual content, and therefore that it was not really
necessary to discuss i18n concerns in relation to the registry (no more
than it is necessary to have a section to discuss i18n issues in
relation to the IANA charset registry in RFC 2978).



In conclusion, I think that some of Bruce's concerns were valid, and
suggestions for changes have been presented to the authors accordingly.
I believe all of these changes can be considered to be for clarification
purposes, rather than technical changes. (No changes affecting the set
of valid tags have been made.)






Peter Constable



