draft-phillips-langtags-08, process, specifications, "stability", and extensions

Thu Dec 30 07:14:36 CET 2004

>  Date: 2004-12-29 17:45
>  From: "Addison Phillips [wM]" <aphillips at webmethods.com>
>  To: ietf-languages at alvestrand.no, ietf at ietf.org
>  Reply to: aphillips at webmethods.com
>  
> Comments below. I must admit that I'm losing the ability to respond to this thread, since it contains direct statements that no response will satisfy the correspondent. 

I'm fairly certain that it does not. It does state that to
date no satisfactory procedural method for handling changes
in meanings of codes has been presented which does not
itself change the meaning of tags which are currently in use.

> The origin of the draft is an individual submission, governed by the various RFCs cited. What's the problem with that? Are individual submissions somehow inappropriate now?

Individual submissions are fine for Informational and
Experimental RFCs, i.e. RFCs which do not purport to be
or to become standards.  Individual submissions can be
part of the Standards Track with AD-level support. It is
possible for an individual submission to become a BCP with
the same caveat; however, because BCPs go into effect as
standards without the phased roll-in and implementation
experience that characterize the Standards Track, "BCPs
require particular care" (RFC 2026).  They should possess
the characteristics that result from phased roll-in of
Standards Track RFCs: design choices resolved and multiple
independent interoperable implementations. They should
be well understood and have no known technical omissions.

> This draft does not modify the process of the IETF [...]

IETF process for use of external standards is to reference
those standards as they exist, not to attempt to modify
those standards by declaring bits and pieces invalid in
the absence of transfer of change control from the
originating body.

> Do what you feel is warranted, Bruce. You don't appear to be trying to achieve consensus, which is the touchstone of the IETF process as I understand it. If you feel issues should be taken to the IESG, then do so. 

You have yourself noted that the draft is an individual
submission, not the result of an IETF process. "Consensus"
doesn't apply to an individual effort.  If you want to
adhere to IETF process, by all means ask the IESG to set
up a working group, with a charter, a Chair, etc.; I
fully support that.

> > > This draft defines language tags.
> > 
> > Yes. And a registry format technical specification.  And a matching
> > algorithm technical specification. In addition to the registration
> > process.
> 
> .... just like RFC 3066 did.

RFC 3066 didn't dictate the registry format. The matching
algorithm was much simpler -- indeed the complexity of the
method in the draft under discussion is primarily due to
the addition of orthogonal data as subtags.  Note also that
in the transition from 1766 to 3066, the specification of
the Content-Language field was broken out into a separate
document (RFC 3282).

> > > Other drafts, RFCs, specs, etc. define processes and 
> > applications that use them. The appropriate use of language tags 
> > is the concern of those specifications.
> > 
> > Per RFC 2026, an application having specific requirements for use
> > of Technical Specifications (TS) should provide an Applicability
> > Statement (AS) specifying specific requirement levels for each
> > TS involved...
> 
> The draft provides specific requirements for language tags themselves, which are strings compatible with the RFC 3066 strings already used by the other specifications. The applicability and requirements for this iteration of language tags is the same as it was under RFC 3066. The language tags created do not break existing specifications. The requirements in this document were calibrated to allow all existing RFC 3066 references to remain in force without prejudice. In fact, we did NOT change things that might have otherwise been changed in order to ensure deep compatibility.

The point is that an application, such as IDNA, could specify
use of tags at a certain requirement level, matching at a different
requirement level (or using a different algorithm), and is probably
unconcerned with registration procedure and registry format. An
applicability statement for use of language tags for IDNA could
therefore reference the TSs for the tag format and matching
algorithm(s) and need not mention the registration procedure or
registry format.
In short, I am clarifying your earlier statement about uses of
technical specifications (viz. that an AS is the mechanism by which
appropriate use of TS is documented).

> Ultimately, the existence of the RFC 3066 language tag registry trumps all of your arguments about this: all of the tags defined in the generative mechanism of RFC 3066bis could have been registered under 3066 (with loss of functionality for the users of those tags, to be sure). The argument that every complete tag used anywhere is trumped by the existence of the generative mechanism in RFC 3066. Registered variant subtags still must have a recommended range to which they apply. Very little has changed, except that using subtags is a bit more logical.

I've reread that several times and can't make sense of it. Could you
please rephrase?

> > > If there is some text that this draft should carry to help 
> > guide implementations, please suggest it so that we can all 
> > consider it.   
> > 
> > It would help immensely if the 3 technical specifications (tag
> > format, registry format, matching algorithm) were separated as
> > separate documents to facilitate reference as independent TSs,
> > and to facilitate any individual extensions/revisions, etc.
> > that may be necessary in the future, and to keep those separate
> > from the registration procedure which itself may need to be
> > separately referenced and/or revised.
> 
> Well there at last is a suggestion. We think splitting the draft up would not be a benefit because the three items are closely linked and have historically been in one document. There is no indication that any of these items will be separately revised in the future. While I'm sure it is possible, I think it would be wiser to keep these items together, since they have historically been together.

So why not then also throw in the closely linked specification of
the Content-Language field, which has historically been in the same
document (RFC 1766)?  I see no substance in your response; it does
not address the issue of how an implementation of an application
could be facilitated (by making an AS easier to produce by providing
separate documents so that requirement levels can be independently
and clearly specified for the different TSs).

> > > No, the revision clearly expands the scope of language 
> > distinctions that can be represented with a language tag--quite 
> > significantly in some cases.
> > 
> > Indeed, and without registration of the tags and the review process
> > associated with that (existing RFC 3066) registration procedure. As
> > Harald Alvestrand pointed out some time ago, that (inappropriately)
> > shifts implementation effort from the tag generator (no registration
> > required) to the recipient (what the heck does this mysterious tag
> > actually *mean*).
> 
> Nonsense. There is the same review process (strengthened somewhat, actually, from experience) for subtags.

RFC 3066 has no review process for subtags. They are what the ISO
lists say they are. It does have a review process for IANA
registered tags as part of that registration procedure, which
(except for private use tags) must be followed before use of a
tag not based on ISO language as a primary tag, and optional
ISO country as a secondary tag.

> Harald's point, I think, is not valid because only the registered (and rarely implemented) tags were subject to scrutiny.

Not so; the ISO language and country codes are certainly subject
to scrutiny (but not to second-guessing and cherry-picking). Under
RFC 3066, a tag may be generated from the standard ISO tag, or it
may be an IANA registered tag (leaving aside private use tags for
the moment).  A parser can easily determine what such a tag is; if
the primary subtag has 2 or 3 letters, it is an ISO language code.
If the second subtag has 2 letters, it is an ISO 3166 country code.
Anything else is either private use (primary subtag is x) or is
registered as a complete IANA tag, or is an error. [de-AT-1901,
incidentally, (as an example) does not meet the RFC 3066 requirement
of 3 to 8 characters in the second subtag for registration with
IANA...].  Under the proposed draft, anybody may legally generate
a tag such as
  sr-Latn-CS-gaulish-boont-guoyu-i-enochian
or
  sr-Latn-CS-gaulish-boont-guoyu-i-enochian-x-foo
with *no* specific registration requirements (i.e. all components
are either registered or require no registration). In the latter
case, a parser can only determine that it contains a private-use
subtag after wading through the other subtags.  In either case,
it is difficult (to say the least) for the recipient or his
software to determine what the generator of that tag intended to
convey.  Returning to the private use issue; in RFC 3066, as in
every other case that I know of where x is used as an indicator
of private use for some name, it is used as a prefix of the name,
never buried deep inside the name (as provided for by the draft
proposal).
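
To make the RFC 3066 side of that concrete, here is a minimal sketch
of the length-based rule of thumb described above. The function name
and the wording of the returned labels are mine, not taken from
RFC 3066; it looks only at the first two subtags and does not consult
the ISO lists or the IANA registry.

```python
def describe_rfc3066_tag(tag):
    """Return (primary_kind, second_kind) using only subtag lengths.

    Hypothetical helper: it applies the rule of thumb from the text,
    not a full RFC 3066 validation.
    """
    subtags = tag.split("-")
    primary = subtags[0].lower()

    def is_letters(s, lengths):
        # ASCII letters only, with one of the permitted lengths.
        return s.isascii() and s.isalpha() and len(s) in lengths

    if primary == "x":
        # Private use is visible from the very first subtag.
        return ("private use", None)
    if is_letters(primary, (2, 3)):
        primary_kind = "ISO 639 language code"
    else:
        primary_kind = "IANA-registered tag or error"

    second_kind = None
    if len(subtags) > 1:
        if is_letters(subtags[1], (2,)):
            second_kind = "ISO 3166 country code"
        else:
            # Only a registry lookup of the complete tag can tell.
            second_kind = "IANA-registered tag or error"
    return (primary_kind, second_kind)
```

Note that a tag such as sr-Latn-CS lands in the "registered tag or
error" branch for its second subtag, which is the point at issue.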

> The new draft actually provides a framework in which any subtag's type can be discerned from its position and size, even if the subtag itself is unrecognized: this is actually *better* than you could obtain with the existing registry.

Not quite; in the examples above one cannot determine what "enochian"
is from its size and position alone -- one needs to know that it
follows a single character subtag and that the single character is
not an x.
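
A rough sketch of the extra context that is needed, assuming only
what is said above (a single-character subtag governs what follows
it, and x marks private use). The helper names are hypothetical and
nothing here is taken from the draft's ABNF.

```python
def private_use_start(tag):
    """Index of the first 'x' subtag, or None if there is none.

    Unlike the RFC 3066 prefix check, every subtag may have to be
    examined before the answer is known.
    """
    for index, subtag in enumerate(tag.lower().split("-")):
        if subtag == "x":
            return index
    return None


def governing_singleton(tag, index):
    """Nearest single-character subtag before position `index`, or None."""
    subtags = tag.lower().split("-")
    for subtag in reversed(subtags[:index]):
        if len(subtag) == 1:
            return subtag
    return None


TAG = "sr-Latn-CS-gaulish-boont-guoyu-i-enochian-x-foo"
SUBTAGS = TAG.split("-")
# Size and position alone do not separate "gaulish" from "enochian";
# the preceding singleton (if any) has to be found first.
ENOCHIAN_CONTEXT = governing_singleton(TAG, SUBTAGS.index("enochian"))
GAULISH_CONTEXT = governing_singleton(TAG, SUBTAGS.index("gaulish"))
```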

> The generator *is* required to register non-private use subtags for use, so that statement mystifies me. You can't just use any subtag you feel like (except as private use). The recipient can access the registry to determine the meaning of any subtag (you couldn't do that before).

Surely you're not claiming that each individual generator must
separately register "sr", "Latn", "CS" etc. in order to use
them!?!  A recipient using software that interprets RFC 3066
tags isn't going to be able to do anything useful with any
hypothetical tag which contains a script subtag that would be
produced under the draft rules (if the script subtag were to appear
*after* the region subtag, one could at least match "sr-CS-Latn"[...]
to "sr-CS", which an RFC 3066 parser could handle). Again returning
to private-use, an RFC 3066 parser can (only) determine that a
private-use tag is in use if it has x as the primary tag. There
are provisions in the draft syntax that break backwards compatibility.
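
The matching problem can be sketched with the RFC 3066 prefix rule
(a language-range matches a tag if it equals the tag, or equals a
prefix of it that is followed by "-"). The function name is mine and
the "*" wildcard is left out for brevity.

```python
def range_matches(language_range, tag):
    """RFC 3066-style prefix match, case-insensitive (sketch only)."""
    r = language_range.lower()
    t = tag.lower()
    # Match on equality, or on a prefix that ends at a subtag boundary.
    return t == r or t.startswith(r + "-")


# A range of "sr-CS" still finds a tag with the script placed last ...
SCRIPT_LAST = range_matches("sr-CS", "sr-CS-Latn")
# ... but not one with the script between language and region.
SCRIPT_MIDDLE = range_matches("sr-CS", "sr-Latn-CS")
```

With the script subtag in the middle, only the bare "sr" range still
matches, so any existing range that names the country stops matching.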

> > What about core Internet protocols such as MIME and the
> > Internet Message format (STD 11)? 
> 
> I could have cited those. The example was not intended as an exhaustive list, eh? Are you suggesting that XML isn't an important technology?
[...]
> So what? We don't like the W3C or something? 

XML isn't an IETF protocol or format. Whether or not it is
"important", for any meaning of that word, is irrelevant. The
point is that given the IETF's limited resources, it
concentrates on Internet technology (see RFC 3935) and it needs
to take (core) Internet protocols into account in IETF
specifications such as RFCs (BCP or otherwise).

> Well you can't have it both ways. Either CS means Czechoslovakia or it means Serbia and Montenegro.

Certainly in language tags "CS" is in use to mean Srbija i
Crna Gora-Srpski.  I haven't seen any documented cases where
it is used (in language tags) to mean Czechoslovakia (but I
haven't started any archaeological digs to try to uncover any).
If there has been no such use, then the brouhaha over the change
is much ado about nothing.  If there has been such use, then
it's clear that interpretation is going to have to be linked to
time of generation of the tag if the semantics are to be
preserved.

> You can see an early version of draft-09 that attempts to address it here:
> 
> http://www.inter-locale.com/ID/draft-phillips-langtags-09.html
> 
> Your comments on that would be appreciated.

For the moment, we're discussing draft-phillips-langtags-08,
on which IESG action is pending (in a week).  There are many
things that the IESG might do when it makes its decision; in
prudence, I'll wait to see what they decide.  IMO, discussing
multiple revisions of a draft through multiple IESG New Last
Calls isn't the most efficient or effective way to make
progress.

> > > We greatly expanded what can be represented in four major ways:
> > > 
> > > 1. Added script subtags for writing system variations.
> > > 2. Mixed generative and private use subtags for private minor 
> > distinctions in tags.
> > > 3. Extensions for really specialized distinctions.
> > > 4. UN M49 region codes, including supra-national regions to 
> > represent geographical distinctions not covered by ISO 3166 or by 
> > instability in same.
> > 
> > It's not entirely clear if some of those items (e.g. script) should
> > be expressed by an orthogonal mechanism rather than embedded in a
> > *language* tag (for that matter, in retrospect, country codes were
> > probably a bad idea).
> 
> There would be no RFC 1766 or 3066 if ISO 639 language codes actually captured all of the nuances of language (doh!).

Well, there was a need for separate registered tags and for
specification of private use tags, so I don't think that's quite
right. It sounds like 639-3 might provide substantially greater
coverage.

> There is a clear need for script codes for distinguishing certain kinds of Chinese written material, as well as certain languages in which there are active script transitions or in which the language is commonly written in more than one script. Individuals not connected with this effort have attempted to register similar language tags recently. It is important to identify the writing system in those cases to many users. 

But none of that applies to an audio file of spoken material,
where script would be superfluous and, as noted above, would
lead to loss of backwards compatibility.  Surely some types
of script are indicated by the charset; in situations where that
is not the case, a separate mechanism could be used for that
orthogonal parameter without breaking compatibility with
existing parsers of language tags.

> > The whole "stability" brouhaha seems to be a tempest in a teapot.
> > Surely the issue could be addressed in a professional manner by
> > reaching an agreement with ISO/UN regarding the issue, as has been
> > done for the case of 2-letter vs. 3-letter codes and stability of
> > existing 3-letter codes.
> 
> It is only *one* of the things addressed by the draft. But it is and remains important. Doug Ewell suggested to me that even if no RA or MA ever reuses a code again, it is still ISO 3166/MA's job to keep the codes in sync with the current state of the world.  Whenever countries split up, join together, or change names, ISO 3166/MA will be there to change the code list.  The instability is not all the MA's fault, but we still need to protect against it because of legacy data. The lonely CS example should not become the state of affairs going forwards.

Does the ISO not set ground rules for the 3166/MA?  Could it not
specify that codes are not to be reused?

> Matching hasn't actually changed.

I beg to differ. Introduction of a script subtag between language
and country code changes matters considerably, in a manner which
breaks backwards compatibility.

> The existence of multiple mechanisms isn't really an issue. The draft specifies ONE mechanism, just like RFC 3066, and notes that more specialized processing is possible.

It's an issue that calls for a separate specification to facilitate
reference (by an AS) to the mechanism or mechanisms which are
applicable, at their respective requirement levels, without
confusion about what specification is being referenced.

> > > If one specifies "en-FR", then one should not expect to receive 
> > anything less specific than "en-FR".
> > 
> > Are you referring to use in Accept-Language fields or in Content-
> > Language fields (or equivalent accept/send dichotomy)?
> 
> Yes and no. Accept/Content is one example of matching. Another might be a query on a document (as with XQuery on an XML document, for example). The remove-from-right matching rules in RFC 3066 (and the draft) have long had this particular design.
> > 
> > > In software resources generally one specifies the *most 
> > specific* (granular) tag that one will accept and may receive 
> > less specific content (which may include the default content).
> > 
> > Indeed; hence the question above. [I also note in passing that
> > IETF deals with the Internet in particular, not with "software
> > resources generally".]
> 
> So?

Do you not see the contradiction between "one should not expect to
receive anything less specific" vs. "may receive less specific
content"?

> Are you not aware of things like message catalogs, resource bundles, and the like?

I'm aware of many things. But as noted, the IETF has limited
resources, and concentrates on Internet issues; it does not have
delusions of being able to solve all of the world's problems.

> > > In language tag matching one specifies the *least specific* tag 
> > that one will accept and won't receive anything less specific 
> > (although you might receive something more specific). 
> > 
> > I'm not sure; if one indicates acceptance of Franglais (en-FR),
> > receiving plain en is probably acceptable.  Receipt of en-FR-<Brittany>
> > for whatever mechanism is used to indicate the variant of English
> > spoken in the region of Brittany (where Breton is a Celtic language,
> > rather than one derived from Latin, like French, or of Germanic root,
> > like English) in the country of France, might well be incomprehensible
> > to an English-speaking Frenchman from Alsace. [Let's not confuse the
> > specific example with the general principle which it illustrates.]
> 
> That's the small point I'm illustrating.

But in response to JFC, you specifically said that "one should not
expect to receive anything less specific". It seems to me that
receipt of less specific (i.e. more general) is OK.
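
For illustration, the "less specific is acceptable" reading amounts
to a remove-from-right fallback, which might be sketched as follows.
The function name is hypothetical; this is the resource-lookup style
of use, not the prefix matching of a language-range against a tag.

```python
def fallback_chain(tag):
    """List the tag and each progressively less specific truncation."""
    subtags = tag.split("-")
    # Drop one subtag from the right at each step, most specific first.
    return ["-".join(subtags[:n]) for n in range(len(subtags), 0, -1)]


FRANGLAIS_CHAIN = fallback_chain("en-FR")
```

Here a request for en-FR is satisfied by en-FR and then by plain en,
i.e. by something less specific than what was asked for.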

> Your example of Breton is a bad choice of tags, though. Breton has its own ISO 639 code ("bre").

But the tag refers to a dialect of English spoken (as a second
language) by a Breton, not to the Breton language per se (and
in a cursory look, I didn't see a UN M49 region code for
Brittany).

> I doubt that en-US-boont is fully intelligible to anyone from more than a few miles outside Boonville without a dictionary.

Fine, but that isn't representative of the situation that JFC
posed.  The representative question would be "does a resident
of Boonville, who speaks en-US-boont, understand en-US?".

> > > Changing the sources for existing subtags or the interpretation 
> > of any particular existing language tag is not permitted if we 
> > are to maintain backwards compatibility.
> > 
> > Agreed that there would be a backwards compatibility problem with
> > changing the source.  Which is why there is an issue with "CS" being
> > defined in the ISO lists by reference as is currently the case with
> > RFC 3066, vs. the proposal to change the source to a separate IANA
> > registry which handles "CS" specially (i.e. differently from many
> > other ISO-derived codes).
> 
> Yawn.

Please see RFC 2026 sections 7.1, 7.1.1, 7.1.3, and 10.1.
Note that RFC 3066 strictly complies with those sections, while
the draft under discussion, by cherry-picking from ISO lists
for which change control has not been transferred to the IESG,
does not.

> > > To be perfectly blunt: we've worked over a year on this 
> > project. If you have specific comments on this draft, with 
> > suggestions for improvements, please send those to the list so 
> > that they can be viewed by the community and so that Mark and I 
> > can address them. Your suggestions for additional changes to the 
> > syntax of language tags we find to be incompatible (to the extent 
> > that we understand them) with RFC 3066 and our own work on 
> > draft-langtags. You will note that draft-langtags can accommodate 
> > your requirements using the mechanisms spelled out above and in 
> > the draft... so I fail to see what we should change. If you can 
> > express that, we'll consider it. Otherwise you are free to do as 
> > we did and write your own draft. Internet-Drafts are a volunteer 
> > effort and do not write themselves. Neither is there a Star 
> > Chamber of people who create them in the dead of night. If you 
> > see a need, fill it. I would suggest: wait for draft-langtags to 
> > be an RFC and write an extension that does what you want.
> > 
> > See RFC 2418; specifically section 2.3 and the comment about consensus
> > about a wrong design.  See also the RFC 2026 process requirements and
> > RFC 2418 procedures; a group which has no charter or equivalent
> > document, no written record of meetings, etc. might very well be
> > described as "a Star Chamber of people".
> 
> There is a list archive. You can see the discussion and the drafts (I maintain all of them online).

That addresses only one of the issues. It does not address the issue
of a charter, of conflict resolution procedures, minutes of face-to-
face meetings, etc. (and the list was established for a purpose other
than work on an RFC).

> Discouraging people from participating in the IETF process is, I think, odious. 

Agreed.  But the activity on the ietf-languages list regarding the
draft under discussion isn't an IETF process -- there is no WG or
Chair, no charter, etc.  Like the fictional Topsy, it jes' growed up.

> The current draft REPLACES RFC 3066.

Drafts don't replace RFCs.