Ietf-languages Digest, Vol 24, Issue 5

Peter Constable petercon at microsoft.com
Sat Dec 11 16:48:10 CET 2004


> -----Original Message-----
> From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-
> bounces at alvestrand.no] On Behalf Of Bruce Lilly


> > I agree with Bruce, that accessibility of ISO 639 and ISO 3166 has not
> > been the issue. Unfortunately, his comments do not indicate what the
> > real issues were.
> 
> My comments are in response to the "New Last Call" made on
> the ietf-announce list.  They are in response to the text which
> accompanied that new last call and the text of
> draft-phillips-langtags-08.txt dated November 2002.  The
> specific claim that accessibility has been a problem was made in
> the text accompanying the new last call

I don't know where the statement accompanying the announcement came from, but given the impression you took from it, I don't think it conveys the rationale for the proposed spec as well as it could. If you read section 6 of the draft, it clearly indicates that the goals of the revision are to address issues of compatibility, stability, validity and extensibility. Nowhere does it even mention accessibility.

You singled out that one point to comment on as though it were the main factor. Accessibility was not the only reason listed in the announcement, and it was not the first reason listed. And, as I've pointed out, it was not a reason given in the draft itself.


> > RFC 3066 made reference to ISO 639-1, ISO 639-2 and ISO 3166-1; the
> > proposed replacement adds ISO 15924. I would count that as four ISO
> > standards. Up-to-date code tables for all four are readily available.
> 
> For the purpose of implementation of validation of language-tags,
> the ISO 639 list includes both the 2- and 3-character codes in a
> single document.  The claim (again from text accompanying the
> new last call) states that there is some difference in the draft
> proposal from 3066 in that 3066 (the text alleges) requires
> "lists of codes from five separate external standards" -- in fact
> two lists suffice for 3066 implementations.

Again, I don't know who wrote the text of the announcement, but again it raises an accessibility issue, and it gets both the general intent and a specific detail wrong: RFC 3066 did not reference five source standards; it referenced only three (which you perceive as two).


> > I think this is a serious misrepresentation of the intent of the
> > proposal: the draft nowhere suggests, let alone declares, that the
> > source ISO standards are irrelevant.
> 
> A poor choice of words on my part. The text and draft suggests
> that only the proposed new registry should be consulted, and
> the draft clearly specifies that the description of all subtags is
> to be provided in English (only).

It is certainly the case that only the registry should be consulted to determine which sub-tags are valid and what each one denotes; that was the intent.

 
> > Rather, the intent of the
> > comprehensive registry is to ensure stability...

> It's not clear to me that the proposal will provide protection
> against the whims of politicians.  If the definition of "CS" as
> a country code changes again under the proposed scheme,
> how is one to determine specifically what some archived
> language-tag referred to at some point in time?

By looking in the sub-tag registry. If ISO changed the meaning of "US" to something other than what it is now, its meaning for purposes of use in an IETF language tag would not change, because it would remain stable in the sub-tag registry. You would be fairly well protected against the whim of politicians.
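
As a rough illustration of that lookup, here is a minimal Python sketch. The registry contents, the `registry_meaning` helper and the simulated change to the source standard are all hypothetical; nothing here is copied from the draft or from ISO 3166 itself.

```python
# Hypothetical snapshot of a sub-tag registry. The entries and their
# descriptions are illustrative only, not taken from the draft.
REGISTRY = {
    ("region", "US"): "United States",
    ("region", "CS"): "Serbia and Montenegro",
}

# Hypothetical later state of the source standard, in which a code has
# been given a different meaning.
SOURCE_STANDARD_LATER = {
    "US": "Some other territory",
}


def registry_meaning(kind, subtag, registry=REGISTRY):
    """Return the registry's recorded description, or None if unregistered.

    Only the registry is consulted, so a later change in the source
    standard does not alter the answer.
    """
    return registry.get((kind, subtag.upper()))


if __name__ == "__main__":
    # The archived tag keeps the meaning recorded in the registry,
    # regardless of what the source standard says afterwards.
    print(registry_meaning("region", "us"))
    print(SOURCE_STANDARD_LATER["US"])
```

The point of the sketch is only that the lookup never touches the source standard, so the meaning of an archived tag does not move when the standard does.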



> > and as Bruce quite clearly pointed out, those
> > source standards are readily accessible. So the suggestion that
> > implementers will no longer have access to French-language names from
> > the source ISO standards simply is vacuous.
> 
> But if the proposed new registry's description of "CS" says
> "foo" and the ISO standard code list says "bar", what's
> an implementor supposed to present to a user as *the*
> description associated with "CS"?

The *meaning* of the sub-tag is determined by the sub-tag registry. If you want human-readable descriptors, you already have to look beyond the ISO standards for anything more than English and French; it would not be new that you have to look beyond the registry itself to decide what human-readable descriptors you should provide in a product.



> > As for concerns of Anglo-centricity, I'm sure that the authors had no
> > anti-French motive, and would be open to suggestions as to how that
> > could be addressed.
> 
> One possibility would be two description fields.  But the
> registry would need a charset closer to ISO-8859-1 than
> to ANSI X3.4 as currently specified.  Or an encoding
> scheme.

Personally, I don't see the value in something like that. Given the intent to have a registry that can be machine-readable, changing its charset from ANSI X3.4 in order to gain descriptors in just one more language is not worth it IMO. 

Speaking at least for Microsoft, we're interested in having descriptors in far more than two languages, and we certainly would not blindly base the descriptors we present to our customers solely on what a registry provides, no matter what its charset.



> > > The ABNF in the draft permits all of the following tags which
> > > are not legal per the RFC 3066 ABNF:
> > >    supercalifragilisticexpialidoceus
> > >    y-----
> > >    x1234567890abc
> > >    a123-xyz
> >
> > In fact, none of these is permitted by the ABNF of the draft.
> 
> ABNF from the draft...

> That means that the "grandfathered"
> production (which is an alternative in the Language-Tag
> production) will match any of the following text tags (comments
> to the right separated by a semicolon):
>    x  ; ALPHA followed by zero repetitions
>    xa ; ALPHA followed by one ALPHA (see alphanum)
>    x- ; ALPHA followed by one HYPHEN
>    supercalifragilisticexpialidoceus ; ALPHA followed by many ALPHAs
>        (see alphanum) (example previously given)
>    x1234567890abc ; ALPHA followed by 13 alphanums
>        (as previously given)
>    a123-xyz ; ALPHA followed by three DIGITs (see alphanum)
>        followed by one HYPHEN followed by three ALPHAs
>        (example previously given)
>    y----- ; ALPHA followed by five HYPHENs (example previously
>        given)
> 
> I say the ABNF from draft -08 (quoted above) allows those;
> you say no.

My mistake; I was thinking beyond the ABNF alone to other constraints imposed by the proposed spec.

As you know, the 'grandfathered' production is loose in the ABNF given in the draft, but is very tightly constrained elsewhere in the draft: it is limited to only items registered under RFC 1766 or RFC 3066 up to the date of acceptance of this proposed spec. (In fact, only a subset of those, all explicitly identified in the sub-tag registry.) On the date of acceptance, you will be able to know precisely what the valid tags that fit under the 'grandfathered' production are and will forever be, and it is 100% guaranteed that none of them will have any of the forms that seem to concern you.
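
To make the distinction concrete, here is a minimal Python sketch of the two checks. The regular expression is an assumption reconstructed from the examples in the quoted message (one ALPHA followed by any number of alphanumerics or hyphens), not the draft's ABNF verbatim, and the `REGISTERED` set is only an illustrative subset of tags registered under RFC 1766 or RFC 3066, not the full list from the registry.

```python
import re

# Assumed loose shape of the 'grandfathered' production, reconstructed
# from the quoted examples: ALPHA followed by any alphanumerics or hyphens.
LOOSE_GRANDFATHERED = re.compile(r"[A-Za-z][A-Za-z0-9-]*\Z")

# Illustrative subset of previously registered tags (stored in lower case).
REGISTERED = frozenset({
    "i-default",
    "i-klingon",
    "art-lojban",
    "en-gb-oed",
    "zh-min-nan",
})


def matches_loose_abnf(tag):
    """True if the tag fits the loose syntactic shape alone."""
    return LOOSE_GRANDFATHERED.match(tag) is not None


def is_valid_grandfathered(tag):
    """True only if the tag fits the shape AND is on the closed list."""
    return matches_loose_abnf(tag) and tag.lower() in REGISTERED


if __name__ == "__main__":
    for tag in ("y-----", "a123-xyz", "x1234567890abc", "i-klingon"):
        print(tag, matches_loose_abnf(tag), is_valid_grandfathered(tag))
```

Under these assumptions, the four tags at issue pass the syntactic check but fail validation, because membership in the closed list is what decides.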



Peter Constable
Microsoft Corporation
