draft-phillips-langtags-08, process, specifications, "stability", and extensions (Was Language Identifier List Comments, updated)

Wed Dec 29 16:31:25 CET 2004

> RE: Language Identifier List Comments, updated
>  Date: 2004-12-28 18:22
>  From: "Addison Phillips [wM]" <aphillips at webmethods.com>
> To: "JFC (Jefsey) Morfin" <jefsey at jefsey.com>, "John Cowan" <jcowan at reutershealth.com>
>  CC: ietf-languages at alvestrand.no

> The draft isn't a process draft. Take your process problems to the IETF or IESG (or W3C or appropriate standards body).

The draft defines a registration procedure; if it did not do so,
it would probably not be a candidate for BCP (vs. some other type
of RFC).  Aside from the process/procedure that the draft seeks to
establish, there are process/procedure issues having to do with
the origin of the draft, statements about "extensions", and IETF
procedures and mission as specified in RFCs 2026, 2418, and 3935.
And, in accordance with the New Last Call and the procedures
detailed in RFC 2026, the issues are being taken to the IETF/IESG,
however much some participants in the discussion may dislike those
procedures.

> This draft defines language tags.

Yes. And a registry format technical specification.  And a matching
algorithm technical specification. In addition to the registration
process.

> Other drafts, RFCs, specs, etc. define processes and applications that use them. The appropriate use of language tags is the concern of those specifications.

Per RFC 2026, an application with particular requirements for the
use of Technical Specifications (TSs) should provide an
Applicability Statement (AS) that sets a requirement level for
each TS involved...

> If there is some text that this draft should carry to help guide implementations, please suggest it so that we can all consider it.   

It would help immensely if the three technical specifications (tag
format, registry format, matching algorithm) were split into
separate documents.  That would allow each to be referenced as an
independent TS and to be extended or revised individually as
needed, and would keep them apart from the registration procedure,
which may itself need to be referenced or revised separately.

> No, the revision clearly expands the scope of language distinctions that can be represented with a language tag--quite significantly in some cases.

Indeed, and without registration of the tags or the review process
that goes with the existing RFC 3066 registration procedure.  As
Harald Alvestrand pointed out some time ago, that (inappropriately)
shifts implementation effort from the tag generator (no registration
required) to the recipient (what the heck does this mysterious tag
actually *mean*?).

> But its grammar is much more restrictive, in part to ensure full backwards compatibility with tiny little applications like, oh, say XML.

It may have been intended to be more restrictive, but it needs
work to achieve that goal (as previously discussed in detail).

XM who?  What about core Internet protocols such as MIME and the
Internet Message Format (STD 11)?  I believe XML is a W3C product,
not an IETF product.

> It also restricts future development of compatible language tags in an effort to ensure that implementations of draft-langtags are stable over time and extended in a controlled manner.  

I still believe the proposed method of handling "CS" is
destabilizing: "sr-CS" is in documented use, whereas Czechoslovakia
ceased to exist before country codes were first used in language
tags (RFC 1766).  I have yet to see a detailed, concrete proposal
for a general principle, as part of the registration procedure,
that would ensure stability of the current meaning of "CS". [N.B.
making a special-case exception for "CS" doesn't address the issue.]

> We greatly expanded what can be represented in four major ways:
> 
> 1. Added script subtags for writing system variations.
> 2. Mixed generative and private use subtags for private minor distinctions in tags.
> 3. Extensions for really specialized distinctions.
> 4. UN M49 region codes, including supra-national regions to represent geographical distinctions not covered by ISO 3166 or by instability in same.

It's not clear that some of those items (e.g. script) belong in a
*language* tag at all, rather than in an orthogonal mechanism (for
that matter, in retrospect, embedding country codes was probably a
bad idea).

The whole "stability" brouhaha seems to be a tempest in a teapot.
Surely the issue could be addressed in a professional manner by
reaching an agreement with ISO/UN regarding the issue, as has been
done for the case of 2-letter vs. 3-letter codes and stability of
existing 3-letter codes.

> This is dealt with in Section 2.4.2 "Matching". This section clearly details the fallback mechanism (which is compatible with the one in RFC 3066), as well as some considerations for additional matching that can be done by specialized processors that implement a different mechanism. The matching algorithm is the standard one, but is not mandatory. In fact, I have a paper with Jeremy Carroll on a different matching algorithm that an OWL implementation might use. Read this section of the draft carefully.

I note that Frank Ellermann has raised some issues, but as yet I
haven't seen any response.  The existence of multiple mechanisms,
coupled with issues regarding the one proposed in the draft, is
a strong indication that the matching algorithm should be split
into a separate document (possibly as one of multiple Experimental
RFCs, or as a Standards Track or Informational RFC).

> If one specifies "en-FR", then one should not expect to receive anything less specific than "en-FR".

Are you referring to use in Accept-Language fields or in Content-
Language fields (or equivalent accept/send dichotomy)?

> In software resources generally one specifies the *most specific* (granular) tag that one will accept and may receive less specific content (which may include the default content).

Indeed; hence the question above. [I also note in passing that
IETF deals with the Internet in particular, not with "software
resources generally".]
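
To make the accept/send distinction concrete: the resource-lookup
behaviour described in the quoted text amounts to truncating subtags
from the right until something is found, then falling back to a
default.  A minimal sketch follows; the function names are mine, not
taken from RFC 3066 or the draft, and case-insensitive comparison is
an assumption of the sketch.

```python
from typing import Dict, List, Optional


def fallback_chain(tag: str) -> List[str]:
    """Return the tag followed by progressively shorter prefixes.

    Hypothetical helper: "en-FR" yields ["en-FR", "en"].
    """
    parts = tag.split("-")
    return ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]


def lookup(requested: str,
           available: Dict[str, str],
           default: Optional[str] = None) -> Optional[str]:
    """Return content for the most specific available tag, else default.

    Assumption: tags compare case-insensitively.
    """
    lowered = {key.lower(): value for key, value in available.items()}
    for candidate in fallback_chain(requested.lower()):
        if candidate in lowered:
            return lowered[candidate]
    return default
```

Here the requester names the most specific tag it wants and may get
something less specific, which is the opposite of the range-matching
rule discussed next.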

> In language tag matching one specifies the *least specific* tag that one will accept and won't receive anything less specific (although you might receive something more specific). 

I'm not sure; if one indicates acceptance of Franglais (en-FR),
receiving plain en is probably acceptable.  By contrast,
en-FR-<Brittany> (whatever mechanism is used to indicate the variant
of English spoken in Brittany, where Breton is a Celtic language
rather than one derived from Latin, like French, or of Germanic
root, like English) might well be incomprehensible to an
English-speaking Frenchman from Alsace. [Let's not confuse the
specific example with the general principle which it illustrates.]
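
For reference, the range-matching rule in RFC 3066 section 2.5 (a
language-range matches a tag if it equals the tag, or equals a prefix
of the tag that is immediately followed by "-"; the special range "*"
matches any tag) can be sketched as below.  The function name is
mine, and "en-FR-brittany" is a made-up tag standing in for the
<Brittany> placeholder above.

```python
def range_matches(language_range: str, tag: str) -> bool:
    """Prefix match in the style of RFC 3066 section 2.5.

    Illustrative helper, not part of any specification.
    Tags are compared case-insensitively.
    """
    rng = language_range.lower()
    candidate = tag.lower()
    if rng == "*":
        # The special range "*" matches any tag.
        return True
    # Exact match, or a prefix that ends on a subtag boundary.
    return candidate == rng or candidate.startswith(rng + "-")


# A range of "en-FR" admits the more specific (hypothetical) tag but
# not plain "en", which is the behaviour questioned above.
print(range_matches("en-FR", "en-FR-brittany"))
print(range_matches("en-FR", "en"))
```

Under this rule a recipient who would be content with plain en has
to list "en" as a range explicitly.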

> The language tag syntax from RFC 3066 itself cannot be changed. draft-langtags carefully adds restrictions to the ABNF and grammar of the tags to ensure that this is so.

Again, the implementation falls short of the promise.
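
As a reference point for any compatibility claim, the RFC 3066
grammar (a primary subtag of 1 to 8 letters, followed by any number
of "-"-separated subtags of 1 to 8 letters or digits) is small
enough to check mechanically.  A sketch, with names of my own
choosing; it tests syntax only and says nothing about registration
or meaning.

```python
import re

# RFC 3066 syntax: Primary-subtag = 1*8ALPHA,
# Subtag = 1*8(ALPHA / DIGIT), joined by "-".
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_rfc3066_tag(tag: str) -> bool:
    """Return True if the whole string fits the RFC 3066 grammar.

    Illustrative helper: a syntactically valid tag may still be
    unregistered or meaningless.
    """
    return RFC3066_TAG.fullmatch(tag) is not None
```

Any tag generated by a grammar that claims to be a restriction of
RFC 3066 would be expected to pass this check.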

> Changing the sources for existing subtags or the interpretation of any particular existing language tag is not permitted if we are to maintain backwards compatibility.

Agreed that changing the source would create a backwards
compatibility problem.  That is exactly the issue with "CS": RFC
3066 currently defines it by reference to the ISO lists, whereas
the proposal changes the source to a separate IANA registry which
handles "CS" specially (i.e. differently from many other
ISO-derived codes).

> To be perfectly blunt: we've worked over a year on this project. If you have specific comments on this draft, with suggestions for improvements, please send those to the list so that they can be viewed by the community and so that Mark and I can address them. Your suggestions for additional changes to the syntax of language tags we find to be incompatible (to the extent that we understand them) with RFC 3066 and our own work on draft-langtags. You will note that draft-langtags can accommodate your requirements using the mechanisms spelled out above and in the draft... so I fail to see what we should change. If you can express that, we'll consider it. Otherwise you are free to do as we did and write your own draft. Internet-Drafts are a volunteer effort and do not write themselves. Neither is there a Star Chamber of people who create them in the dead of night. If you see a need, fill it. I would suggest: wait for draft-langtags to be an RFC and write an extension that does what you want.

See RFC 2418, specifically section 2.3 and its comment about consensus
on a bad design.  See also the RFC 2026 process requirements and
RFC 2418 procedures; a group which has no charter or equivalent
document, no written record of meetings, etc. might very well be
described as "a Star Chamber of people".

One doesn't write "extensions" to BCP RFCs (that's one of the problems
with the agglomeration of specifications in the current document); a
BCP is replaced wholesale (although in theory it might be possible to
have two related BCPs coexist per the details in RFC 2026 section 6.3;
but that is unlikely, and in any event the current draft does not
contain the sort of statement required to coexist with RFC 3066).