draft-phillips-langtags-08, process, specifications, "stability", and extensions (Was Language Identifier List Comments, updated)

Addison Phillips [wM] aphillips at webmethods.com
Wed Dec 29 23:45:32 CET 2004


Comments below. I must admit that I'm losing the ability to respond to this thread, since it contains direct statements that no response will satisfy the correspondent. 

Addison

Addison P. Phillips
Director, Globalization Architecture
http://www.webMethods.com

Chair, W3C Internationalization Working Group
http://www.w3.org/International

Internationalization is an architecture. 
It is not a feature.

> -----Original Message-----
> From: ietf-languages-bounces at alvestrand.no
> [mailto:ietf-languages-bounces at alvestrand.no]On Behalf Of Bruce Lilly
> Sent: December 29, 2004 7:31
> To: ietf-languages at alvestrand.no
> Cc: ietf at ietf.org
> Subject: Re: draft-phillips-langtags-08, process, specifications,
> "stability", and extensions (Was Language Identifier List Comments,
> updated)
> 
> 
> > RE: Language Identifier List Comments, updated
> >  Date: 2004-12-28 18:22
> >  From: "Addison Phillips [wM]" <aphillips at webmethods.com>
> >  To: "JFC (Jefsey) Morfin" <jefsey at jefsey.com>,
> >      "John Cowan" <jcowan at reutershealth.com>
> >  CC: ietf-languages at alvestrand.no
> 
> > The draft isn't a process draft. Take your process problems to the
> > IETF or IESG (or W3C or appropriate standards body).
> 
> The draft defines a registration procedure; if it did not do so,
> it would probably not be a candidate for BCP (vs. some other type
> of RFC).  Aside from the process/procedure that the draft seeks to
> establish, there are process/procedure issues having to do with
> the origin of the draft, statements about "extensions", and IETF
> procedures and mission as specified in RFCs 2026, 2418, and 3935.
> And, in accordance with the New Last Call and the procedures
> detailed in RFC 2026, the issues are being taken to the IETF/IESG,
> however much some participants in the discussion may dislike those
> procedures.

The origin of the draft is an individual submission, governed by the various RFCs cited. What's the problem with that? Are individual submissions somehow inappropriate now?

This draft does not modify the process of the IETF or govern what other I-Ds may choose to do except in reference to language tags. As I pointed out in the context removed from the above statement: this I-D would be inappropriate if it attempted to govern what language tags were used for.

Do what you feel is warranted, Bruce. You don't appear to be trying to achieve consensus, which is the touchstone of the IETF process as I understand it. If you feel issues should be taken to the IESG, then do so. 
> 
> > This draft defines language tags.
> 
> Yes. And a registry format technical specification.  And a matching
> algorithm technical specification. In addition to the registration
> process.

.... just like RFC 3066 did.
> 
> > Other drafts, RFCs, specs, etc. define processes and applications
> > that use them. The appropriate use of language tags is the concern
> > of those specifications.
> 
> Per RFC 2026, an application having specific requirements for use
> of Technical Specifications (TS) should provide an Applicability
> Statement (AS) specifying specific requirement levels for each
> TS involved...

The draft provides specific requirements for language tags themselves, which are strings compatible with the RFC 3066 strings already used by the other specifications. The applicability and requirements for this iteration of language tags are the same as they were under RFC 3066. The language tags created do not break existing specifications. The requirements in this document were calibrated to allow all existing RFC 3066 references to remain in force without prejudice. In fact, we did NOT change things that might otherwise have been changed, in order to ensure deep compatibility.

Ultimately, the existence of the RFC 3066 language tag registry trumps all of your arguments about this: all of the tags defined in the generative mechanism of RFC 3066bis could have been registered under 3066 (with loss of functionality for the users of those tags, to be sure). The argument that every complete tag used anywhere must be registered is trumped by the existence of the generative mechanism in RFC 3066. Registered variant subtags still must have a recommended range to which they apply. Very little has changed, except that using subtags is a bit more logical.

> 
> > If there is some text that this draft should carry to help guide
> > implementations, please suggest it so that we can all consider it.
> 
> It would help immensely if the 3 technical specifications (tag
> format, registry format, matching algorithm) were separated as
> separate documents to facilitate reference as independent TSs,
> and to facilitate any individual extensions/revisions, etc.
> that may be necessary in the future, and to keep those separate
> from the registration procedure which itself may need to be
> separately referenced and/or revised.

Well, there at last is a suggestion. We think splitting the draft up would not be a benefit: the three items are closely linked and have historically lived in one document, and there is no indication that any of them will be revised separately in the future. While I'm sure a split is possible, I think it would be wiser to keep these items together.
> 
> > No, the revision clearly expands the scope of language distinctions
> > that can be represented with a language tag--quite significantly in
> > some cases.
> 
> Indeed, and without registration of the tags and the review process
> associated with that (existing RFC 3066) registration procedure. As
> Harald Alvestrand pointed out some time ago, that (inappropriately)
> shifts implementation effort from the tag generator (no registration
> required) to the recipient (what the heck does this mysterious tag
> actually *mean*).

Nonsense. There is the same review process (strengthened somewhat, actually, from experience) for subtags. Harald's point, I think, is not valid because only the registered (and rarely implemented) tags were subject to scrutiny. The new draft actually provides a framework in which any subtag's type can be discerned from its position and size, even if the subtag itself is unrecognized: this is actually *better* than you could obtain with the existing registry.
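
To illustrate the position-and-size point, here is a rough Python sketch of how a processor might guess each subtag's type without consulting the registry. The length rules used here (4 letters for a script, 2 letters or 3 digits for a region, a single character introducing an extension, "x" introducing private use) are a simplified reading and not the draft's ABNF; extended language subtags are ignored, and the name classify_subtags is made up for this example.

```python
def classify_subtags(tag: str) -> list[tuple[str, str]]:
    """Guess the type of each subtag from its position and length only.

    Simplified assumption, not the draft's grammar: the first subtag is the
    language; script, region, and variant follow in that order.
    """
    subtags = tag.split("-")
    labels = [(subtags[0], "language")]
    stage = 0    # 0 = only language seen, 1 = script seen, 2 = region or variant seen
    mode = None  # set to "extension" or "private-use" once a singleton appears
    for subtag in subtags[1:]:
        if mode == "private-use":
            # Everything after "x" is private use.
            labels.append((subtag, mode))
        elif len(subtag) == 1:
            # A single character starts an extension, or private use if it is "x".
            mode = "private-use" if subtag.lower() == "x" else "extension"
            labels.append((subtag, mode))
        elif mode == "extension":
            labels.append((subtag, mode))
        elif stage < 1 and len(subtag) == 4 and subtag.isalpha():
            labels.append((subtag, "script"))
            stage = 1
        elif stage < 2 and (
            (len(subtag) == 2 and subtag.isalpha())
            or (len(subtag) == 3 and subtag.isdigit())
        ):
            labels.append((subtag, "region"))
            stage = 2
        else:
            labels.append((subtag, "variant"))
            stage = 2
    return labels
```

With these rules, classify_subtags("zh-Hant-TW") labels "Hant" as a script and "TW" as a region even if the caller has never seen either subtag before.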

The generator *is* required to register non-private use subtags for use, so that statement mystifies me. You can't just use any subtag you feel like (except as private use). The recipient can access the registry to determine the meaning of any subtag (you couldn't do that before).
> 
> > But its grammar is much more restrictive, in part to ensure full
> > backwards compatibility with tiny little applications like, oh,
> > say XML.
> 
> It may have been intended to have been more restrictive, but it
> needs work to achieve that goal (as previously discussed in
> detail).

Dealt with in the pending draft-09 as previously discussed.
> 
> XM who?  What about core Internet protocols such as MIME and the
> Internet Message format (STD 11)? 

I could have cited those. The example was not intended as an exhaustive list, eh? Are you suggesting that XML isn't an important technology?

>  I believe XML is a w3 consortium
> product, not an IETF product.

So what? We don't like the W3C or something? 
> 
> > It also restricts future development of compatible language tags
> > in an effort to ensure that implementations of draft-langtags are
> > stable over time and extended in a controlled manner.
> 
> I still believe there is a problem with the proposed method of
> handling "CS", which is destabilizing (given previously documented
> use of "sr-CS" vs. the demise of Czechoslovakia prior to use of
> country codes in language tags (RFC 1766)).  I have yet to see a
> detailed concrete proposal for a general procedure that would
> ensure stability of the current meaning of "CS" embodied in a
> general principle as part of the registration procedure. [N.B.
> making a special-case exception for "CS" doesn't address the issue.]

Well, you can't have it both ways. Either CS means Czechoslovakia or it means Serbia and Montenegro.

You can see an early version of draft-09 that attempts to address it here:

http://www.inter-locale.com/ID/draft-phillips-langtags-09.html

Your comments on that would be appreciated.
> 
> > We greatly expanded what can be represented in four major ways:
> > 
> > 1. Added script subtags for writing system variations.
> > 2. Mixed generative and private use subtags for private minor
> >    distinctions in tags.
> > 3. Extensions for really specialized distinctions.
> > 4. UN M49 region codes, including supra-national regions to
> >    represent geographical distinctions not covered by ISO 3166 or
> >    by instability in same.
> 
> It's not entirely clear if some of those items (e.g. script) should
> be expressed by an orthogonal mechanism rather than embedded in a
> *language* tag (for that matter, in retrospect, country codes was
> probably a bad idea).

There would be no RFC 1766 or 3066 if ISO 639 language codes actually captured all of the nuances of language (doh!). There is a clear need for script codes for distinguishing certain kinds of Chinese written material, as well as certain languages in which there are active script transitions or in which the language is commonly written in more than one script. Individuals not connected with this effort have attempted to register similar language tags recently. For many users, it is important to identify the writing system in those cases.
> 
> The whole "stability" brouhaha seems to be a tempest in a teapot.
> Surely the issue could be addressed in a professional manner by
> reaching an agreement with ISO/UN regarding the issue, as has been
> done for the case of 2-letter vs. 3-letter codes and stability of
> existing 3-letter codes.

It is only *one* of the things addressed by the draft. But it is and remains important. Doug Ewell suggested to me that even if no RA or MA ever reuses a code again, it is still ISO 3166/MA's job to keep the codes in sync with the current state of the world.  Whenever countries split up, join together, or change names, ISO 3166/MA will be there to change the code list.  The instability is not all the MA's fault, but we still need to protect against it because of legacy data. The lonely CS example should not become the state of affairs going forward.
> 
> > This is dealt with in Section 2.4.2 "Matching". This section clearly
> > details the fallback mechanism (which is compatible with the one in
> > RFC 3066), as well as some considerations for additional matching
> > that can be done by specialized processors that implement a
> > different mechanism. The matching algorithm is the standard one,
> > but is not mandatory. In fact, I have a paper with Jeremy Carroll
> > on a different matching algorithm that an OWL implementation might
> > use. Read this section of the draft carefully.
> 
> I note that Frank Ellerman has raised some issues, but as yet I
> haven't seen any response.  The existence of multiple mechanisms,
> coupled with issues regarding the one proposed in the draft, is
> a strong indication that the matching algorithm should be split
> into a separate document (possibly as one of multiple Experimental
> RFCs, or as a Standards Track or Informational RFC).

Matching hasn't actually changed. Frank has raised some good issues: I believe I responded to his message.

The existence of multiple mechanisms isn't really an issue. The draft specifies ONE mechanism, just like RFC 3066, and notes that more specialized processing is possible. In effect, this is no different from what RFC 3066 did. We purposely did not specify experimental matching algorithms.
> 
> > If one specifies "en-FR", then one should not expect to receive
> > anything less specific than "en-FR".
> 
> Are you referring to use in Accept-Language fields or in Content-
> Language fields (or equivalent accept/send dichotomy)?

Yes and no. Accept/Content is one example of matching. Another might be a query on a document (as with XQuery on an XML document, for example). The remove-from-right matching rules in RFC 3066 (and the draft) have long had this particular design.
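
As a rough sketch of that design (my own illustration in Python, not text from either document): a requested range matches a tag if stripping subtags from the right of the tag eventually yields the range. The function name range_matches is hypothetical, comparison is assumed to be case-insensitive, and the "*" wildcard is not handled.

```python
def range_matches(language_range: str, tag: str) -> bool:
    """Return True if `tag` is equal to, or more specific than, `language_range`."""
    wanted = language_range.lower().split("-")
    candidate = tag.lower().split("-")
    # Remove subtags from the right of the tag until it is no longer than the range.
    while len(candidate) > len(wanted):
        candidate.pop()
    # A tag shorter than the range can never match: it is less specific.
    return candidate == wanted
```

Under these assumptions, range_matches("en-FR", "en-FR-x-foo") is true, while range_matches("en-FR", "en") is false: asking for "en-FR" never returns plain "en".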
> 
> > In software resources generally one specifies the *most specific*
> > (granular) tag that one will accept and may receive less specific
> > content (which may include the default content).
> 
> Indeed; hence the question above. [I also note in passing that
> IETF deals with the Internet in particular, not with "software
> resources generally".]

So? Are you not aware of things like message catalogs, resource bundles, and the like? I give an example to illustrate a small point.
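
For contrast, resource-bundle style lookup can be sketched as follows (an illustrative Python sketch, not any particular library's API; lookup_resource, the bundles dictionary, and its lowercase keys are all assumptions made for this example). Here the request is the most specific tag wanted, and the lookup falls back to less specific content and finally to a default.

```python
def lookup_resource(requested: str, bundles: dict[str, str], default: str) -> str:
    """Find the best available resource for `requested`, falling back by truncation."""
    subtags = requested.lower().split("-")
    while subtags:
        key = "-".join(subtags)
        if key in bundles:
            return bundles[key]
        # Drop the rightmost subtag and try the less specific tag.
        subtags.pop()
    # Nothing matched at any level of specificity.
    return default
```

With bundles = {"en": "Hello", "fr": "Bonjour"}, a request for "en-FR" falls back to the "en" resource, and a request for "de-AT" receives the default.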
> 
> > In language tag matching one specifies the *least specific* tag
> > that one will accept and won't receive anything less specific
> > (although you might receive something more specific).
> 
> I'm not sure; if one indicates acceptance of Franglais (en-FR),
> receiving plain en is probably acceptable.  Receipt of en-FR-<Brittany>
> for whatever mechanism is used to indicate the variant of English
> spoken in the region of Brittany (where Breton is a Gaelic language,
> rather than one derived from Latin, like French, or of Germanic root,
> like English) in the country of France, might well be incomprehensible
> to an English-speaking Frenchman from Alsace. [Let's not confuse the
> specific example with the general principle which it illustrates.]

That's the small point I'm illustrating. The draft is very clear about the falsehood of assuming that a more specific tag is mutually intelligible with a less specific one. Your example of Breton is a bad choice of tags, though. Breton has its own ISO 639 code ("bre"). Let's make it better:

I doubt that en-US-boont is fully intelligible to anyone from more than a few miles outside Boonville without a dictionary.
> 
> > The language tag syntax from RFC 3066 itself cannot be changed.
> > draft-langtags carefully adds restrictions to the ABNF and grammar
> > of the tags to ensure that this is so.
> 
> Again, the implementation falls short of the promise.

I grow impatient.
> 
> > Changing the sources for existing subtags or the interpretation of
> > any particular existing language tag is not permitted if we are to
> > maintain backwards compatibility.
> 
> Agreed that there would be a backwards compatibility problem with
> changing the source.  Which is why there is an issue with "CS" being
> defined in the ISO lists by reference as is currently the case with
> RFC 3066, vs. the proposal to change the source to a separate IANA
> registry which handles "CS" specially (i.e. differently from many
> other ISO-derived codes).

Yawn. We have modified draft-09 in an attempt to deal with this issue, but either way we need to deal with 'CS'.
>   
> > To be perfectly blunt: we've worked over a year on this project.
> > If you have specific comments on this draft, with suggestions for
> > improvements, please send those to the list so that they can be
> > viewed by the community and so that Mark and I can address them.
> > Your suggestions for additional changes to the syntax of language
> > tags we find to be incompatible (to the extent that we understand
> > them) with RFC 3066 and our own work on draft-langtags. You will
> > note that draft-langtags can accommodate your requirements using
> > the mechanisms spelled out above and in the draft... so I fail to
> > see what we should change. If you can express that, we'll consider
> > it. Otherwise you are free to do as we did and write your own
> > draft. Internet-Drafts are a volunteer effort and do not write
> > themselves. Neither is there a Star Chamber of people who create
> > them in the dead of night. If you see a need, fill it. I would
> > suggest: wait for draft-langtags to be an RFC and write an
> > extension that does what you want.
> 
> See RFC 2418; specifically section 2.3 and the comment about consensus
> about a wrong design.  See also the RFC 2026 process requirements and
> RFC 2418 procedures; a group which has no charter or equivalent
> document, no written record of meetings, etc. might very well be
> described as "a Star Chamber of people".

There is a list archive. You can see the discussion and the drafts (I maintain all of them online). Discouraging people from participating in the IETF process is, I think, odious.
> 
> One doesn't write "extensions" to BCP RFCs (that's one of the problems
> with the agglomeration of specifications in the current document); a
> BCP is replaced wholesale (although in theory it might be possible to
> have two related BCPs coexist per the details in RFC 2026 section 6.3;
> but that is unlikely, and in any event the current draft does not
> contain the sort of statement required to coexist with RFC 3066).

Read the draft. The word extension is defined there with a specific meaning. I use that meaning above.

The current draft REPLACES RFC 3066. In it there is text that allows for separate RFCs that provide specific extensions (so that the need to revise this document in the future is reduced, contributing to, well, stability of language tags).



