Ietf-languages Digest, Vol 24, Issue 5

Sat Dec 11 17:59:32 CET 2004

Gentlemen,
I see several points discussed here which are not all of the same order and
seem to be confusing the issue.

1. The discussion creeps from Harald's RFC 3066 to the Multilingual Internet.
It seems strange to discuss byte-oriented details without first having a
Multilingual framework stating the scope of the discussion and its
implications (which are certainly major) for the whole Internet
architecture. I submit that IAB guidance is necessary first. Before
going any further, a true WG-Multilingualism should be created and opened to
everyone (a private IETF-Languages list should be an interim step
towards such a WG).

2. I see "RFC 3066bis" quoted as a document. The RFC Editor does not seem to
know of that RFC. Where can I find it?

3. There are at least four different levels:

- what Multilingualism is vs. vernacularism (there are 6000 human languages,
but a standard should be able to support non-scripted, computer-generated
and past languages, which may lead to millions of references).

- vernacular granularity, which has nothing to do with geography and
countries; the way this fits into the general digital convergence (is the
IANA the proper registry?); and the time relation to other standards, which
calls for a kind of "time hierarchy": a consistent cross-standard rule on
how to support permanence.

- the tag's semantics itself. This seems manageable, judging from the various
exchanges, but calls for a single, comprehensive and maintained document
for parsers (programs and people having to sort out language issues), which
is different from the RFC culture.

- the multilingual (not uni/bilingual) tag description, which is necessary
for the cultures/databases accompanying the different languages to identify
the same language.

IMHO the sub-tag granularity must match real-life and research
granularity. This does not mean that each tag must be registered, but that
each sovereign, authoritative or historical language/culture-oriented
source must be able to register its own sub-tag and to resolve possible
conflicts with the others mutually (the trade-off is between a standard
which would conflict with reality and a flexible standard resilient to
conflicts in reality). Just as the IANA is not in the business of defining
countries (Jon Postel, RFC 1591), it should not be in the business of
defining languages.

I also submit that IANA is no longer the proper place to support such a
Register. Experience has shown that IANA (now a function of ICANN) is subject
to controversies in this or in parallel real-life areas: ccTLD delegation,
ccTLD entries in the root file, the accepted MINC reaction to the Polish
unconcerted introduction of Arabic, Russian and Hebrew tables, ICANN's
strategy for internationalized rather than multilingual TLDs, etc. I also
submit that UNESCO, MPEG or other standard/cultural organizations involved
in the daily reality (universities, editors, posts, governments,
copyrights, WIPO, etc.) are more concerned and may make their own
standard prevail after an unnecessary and harassing dispute. It seems that
any semantic able to support open sub-tags, wherever they originate from, is
useful. Going any further would push in favor of a less and less
[unilingual or internationalized] network-centric market against a
market evolution toward user-centric [multilingual/multicultural] networked
relations [P2P, VoIP, NAT, coreboxes, OPES, etc.].

One of the possibilities of the current IETF administrative reform could be
to give the IETF a structure that could be funded to permit it to
participate as such in harmonization negotiations. Due to the importance
of the matter, it will most probably be addressed at the WSIS level, since
it is one of its priorities. I therefore submit that we forget byte-oriented
details for the time being, keep this proposal as a draft and use it to
open a dialog with the WGIG over who should do what with whom. We can
certainly wait one more year to have a globally accepted approach, which
will save everyone a huge amount of time and money.

This dialog is not easy, as we have no direct IETF/IAB/IESG representative
there (except Avri, who is multilingual but represents Civil
Society). But there are enough WGIG members interested in Multilingualism
and technically competent to understand, comment and progress on this file.
These posts and the following URLs should give them a comprehensive
understanding of what is at stake.

Draft author's comments: why the draft.
http://www1.ietf.org/mail-archive/web/ietf-announce/current/msg00755.html

Text of the Draft:
http://www.ietf.org/internet-drafts/draft-phillips-langtags-08.txt

Text of RFC 3066:
http://ietf.org/rfc/rfc3066.txt?number=3066

If there are other links to present, I am interested in collecting them and 
in publishing them on various Multilingualism oriented sites.
Thank you.
Jefsey Morfin

On 01:54 11/12/2004, Bruce Lilly said:

> > RE: New Last Call: 'Tags for Identifying Languages' to BCP
> >  Date: 2004-12-10 16:37
> >  From: "Peter Constable" <petercon at microsoft.com>
> >  To: ietf-languages at alvestrand.no
> >
> > Bruce Lilly's message makes several inaccurate statements against the
> > proposed draft, and misrepresents some of the changes being made. My
> > main concern is that I don't know where it was circulated. I might be
> > wrong, but I get the impression it was written with a different audience
> > in mind and then copied here.
> >
> >
> >
> > > -----Original Message-----
> >
> > > > There are problems with the RFC 3066 definition of generative tags,
> > > > however. The ISO 639 and ISO 3166 standards are not freely available
> > > > and evolve over time.
> > >
> > > Accessibility has not been a problem for this implementor...
> >
> > I agree with Bruce, that accessibility of ISO 639 and ISO 3166 has not
> > been the issue. Unfortunately, his comments do not indicate what the
> > real issues were.
>
>My comments are in response to the "New Last Call" made on
>the ietf-announce list.  They are in response to the text which
>accompanied that new last call and the text of
>draft-phillips-langtags-08.txt dated November 2002.  The
>specific claim that accessibility has been a problem was made in
>the text accompanying the new last call (q.v.).  For those not
>subscribed to the ietf-announce list, the text of the new last
>call can be seen at
>http://www1.ietf.org/mail-archive/web/ietf-announce/current/msg00755.html
>
>
> > > > The largest change in the specification is that it modifies the
> > > > structure of the language tag registry. Instead of having to obtain
> > > > lists of codes from five separate external standards...
> >
> > > Contrary to the implicit claim, the ISO documents mentioned
> > > above comprise two standards (available in two languages each),
> > > not "five separate external standards".
> >
> > RFC 3066 made reference to ISO 639-1, ISO 639-2 and ISO 3166-1; the
> > proposed replacement adds ISO 15924. I would count that as four ISO
> > standards. Up-to-date code tables for all four are readily available.
>
>For the purpose of implementing validation of language-tags,
>the ISO 639 list includes both the 2- and 3-character codes in a
>single document.  The claim (again from text accompanying the
>new last call) states that the draft proposal differs from 3066
>in that 3066 (the text alleges) requires
>"lists of codes from five separate external standards" -- in fact
>two lists suffice for 3066 implementations.
>
> > I think this is a serious misrepresentation of the intent of the
> > proposal: the draft nowhere suggests, let alone declares, that the
> > source ISO standards are irrelevant.
>
>A poor choice of words on my part. The text and draft suggest
>that only the proposed new registry should be consulted, and
>the draft clearly specifies that the description of all subtags is
>to be provided in English (only).
>
> > Rather, the intent of the
> > comprehensive registry is to ensure stability in IETF implementations by
> > protecting them from unpredictable changes in ISO standards, such as the
> > re-definition of "CS" as a country identifier not long ago. The
> > denotation of identifiers listed in the registry is based on their
> > definition in the ISO standards, not on an informative descriptor
> > provided in the registry;
>
>It's not clear to me that the proposal will provide protection
>against the whims of politicians.  If the definition of "CS" as
>a country code changes again under the proposed scheme,
>how is one to determine specifically what some archived
>language-tag referred to at some point in time?  I'm not
>particularly concerned about that problem, as I am resigned
>to instability associated with anything specified by politicians
>(and that includes the UN region codes).
>
> > and as Bruce quite clearly pointed out, those
> > source standards are readily accessible. So the suggestion that
> > implementers will no longer have access to French-language names from
> > the source ISO standards simply is vacuous.
>
>But if the proposed new registry's description of "CS" says
>"foo" and the ISO standard code list says "bar", what's
>an implementor supposed to present to a user as *the*
>description associated with "CS"?
>
> > As for concerns of Anglo-centricity, I'm sure that the authors had no
> > anti-French motive, and would be open to suggestions as to how that
> > could be addressed.
>
>One possibility would be two description fields.  But the
>registry would need a charset closer to ISO-8859-1 than
>to ANSI X3.4 as currently specified.  Or an encoding
>scheme.
>
> > Surely, though, this is not a technical argument
> > against the proposal.
>
>Not purely technical, though it presents problems for
>existing implementors who provide bilingual support.
>Eliminating bilingual descriptions for the language,
>country (and UN region) codes leaves implementors
>in a quandary.
>
> > > The ABNF in the draft permits all of the following tags which
> > > are not legal per the RFC 3066 ABNF:
> > >    supercalifragilisticexpialidoceus
> > >    y-----
> > >    x1234567890abc
> > >    a123-xyz
> >
> > In fact, none of these is permitted by the ABNF of the draft.
>
>ABNF from the draft:
>
>    Language-Tag = (lang
>                    *("-" extlang)
>                    ["-" script]
>                    ["-" region]
>                    *("-" variant)
>                    *("-" extension)
>                    ["-" privateuse])
>                    / privateuse         ; private-use tag
>                    / grandfathered      ; grandfathered registrations
>
>    lang            = 2*3ALPHA           ; shortest ISO 639 code
>                    / registered-lang
>    extlang         = 3ALPHA             ; reserved for future use
>    script          = 4ALPHA             ; ISO 15924 code
>    region          = 2ALPHA             ; ISO 3166 code
>                    / 3DIGIT             ; UN country number
>    variant         = ALPHA (4*7alphanum) ; registered variants
>                    / DIGIT (3*7alphanum)
>    extension       = singleton 1*("-" (2*8alphanum)) ; extension subtag(s)
>    privateuse      = "x" 1*("-" (1*8alphanum))       ; private use subtag(s)
>    singleton       = ALPHA             ; single letters
>                                        ; (except x, which has
>                                        ; special meaning)
>    registered-lang = 4*8ALPHA           ; registered language subtag
>    grandfathered   = ALPHA *(alphanum / "-")  ; grandfathered registration
>    alphanum        = (ALPHA / DIGIT)    ; letters and numbers
>
>Note that the RFC 2234 definition of an asterisk in front of
>a production (with no adjacent numbers, as is the case in
>the "grandfathered" production) means zero or more
>repetitions (without upper bound) of the production to the
>right of the asterisk. That means that the "grandfathered"
>production (which is an alternative in the Language-Tag
>production) will match any of the following text tags (comments
>to the right separated by a semicolon):
>    x  ; ALPHA followed by zero repetitions
>    xa ; ALPHA followed by one ALPHA (see alphanum)
>    x- ; ALPHA followed by one HYPHEN
>    supercalifragilisticexpialidoceus ; ALPHA followed by many ALPHAs
>        (see alphanum) (example previously given)
>    x1234567890abc ; ALPHA followed by 13 alphanums
>        (as previously given)
>    a123-xyz ; ALPHA followed by three DIGITs (see alphanum)
>        followed by one HYPHEN followed by three ALPHAs
>        (example previously given)
>    y----- ; ALPHA followed by five HYPHENs (example previously
>        given)
>
>I say the ABNF from draft -08 (quoted above) allows those;
>you say no.  Either you're looking at different ABNF or one
>or more of us doesn't understand ABNF.  If you wish to
>convince me that I don't understand it, you'll have to do
>better than simply claiming that I'm wrong with no supporting
>reasoning.
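
The reading of the "grandfathered" production argued above can be checked mechanically. What follows is a minimal Python sketch, assuming the ABNF quoted above is the complete text from draft -08 and that ALPHA and DIGIT carry their RFC 2234 core-rule meanings. The names GRANDFATHERED, EXAMPLES and matches_grandfathered are illustrative and appear in neither document.

```python
import re

# Assumption: this regex transcribes only the "grandfathered" alternative,
#     grandfathered = ALPHA *(alphanum / "-")
# with ALPHA = A-Z / a-z and DIGIT = 0-9 (RFC 2234 core rules).
GRANDFATHERED = re.compile(r"[A-Za-z][A-Za-z0-9-]*")

# The seven tags listed in the message above.
EXAMPLES = [
    "x",
    "xa",
    "x-",
    "supercalifragilisticexpialidoceus",
    "x1234567890abc",
    "a123-xyz",
    "y-----",
]


def matches_grandfathered(tag: str) -> bool:
    """Return True if the whole tag is derivable from the production."""
    return GRANDFATHERED.fullmatch(tag) is not None


if __name__ == "__main__":
    for tag in EXAMPLES:
        print(f"{tag!r}: {matches_grandfathered(tag)}")
```

Under those assumptions every listed tag is accepted, because the unbounded repetition places no limit on length and treats "-" as just another permitted character. A tag is rejected by this alternative only if it is empty, does not start with a letter, or contains some other character.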
>
> > > Specifically, the draft allows, and RFC 3066 disallows:
> > >    subtags more than 8 octets in length
> >
> > This is incorrect. It was true of an earlier draft, but that was
> > changed.
>
>The "new last call" was for version -08; I downloaded it
>from the URI in the new last call and copied the ABNF
>above from that.  My analysis is above.  I await your
>rebuttal or retraction.
>
> > >    hyphens which do not separate subtags
> > >    zero-length subtags
> >
> > These near-equivalent statements are incorrect. No hyphen may be
> > permitted without a non-initial sub-tag, and no sub-tag can be an empty
> > string.
>
>See the "y-----" example above, based on the published
>ABNF. Again, I await your rebuttal or retraction.
>
> > >    primary tags which are not purely alphabetic
> >
> > This is incorrect. A primary sub-tag must be 2*3ALPHA or 4*8ALPHA, or
> > "i" or "x".
>
>See the "a123-xyz" example above (in RFC 3066 parlance,
>the "a123" part is the primary tag, which clearly contains
>DIGITs).  One more time, I await your rebuttal or
>retraction.
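
For comparison, the three properties in dispute (subtags longer than 8 octets, zero-length subtags, and a primary tag that is not purely alphabetic) can be tested against the RFC 3066 grammar. The sketch below is hedged: it assumes the RFC 3066 section 2.1 syntax is Primary-subtag = 1*8ALPHA followed by zero or more "-" Subtag with Subtag = 1*8(ALPHA / DIGIT), and the helper names is_rfc3066 and violations are hypothetical.

```python
import re

# Assumed transcription of RFC 3066 section 2.1:
#     Language-Tag   = Primary-subtag *( "-" Subtag )
#     Primary-subtag = 1*8ALPHA
#     Subtag         = 1*8(ALPHA / DIGIT)
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_rfc3066(tag: str) -> bool:
    """Return True if the whole tag fits the assumed RFC 3066 syntax."""
    return RFC3066_TAG.fullmatch(tag) is not None


def violations(tag: str) -> list[str]:
    """Name which of the three disputed properties a tag exhibits."""
    subtags = tag.split("-")
    found = []
    if any(len(s) > 8 for s in subtags):
        found.append("subtag longer than 8 octets")
    if any(s == "" for s in subtags):
        found.append("zero-length subtag")
    # ASCII letters only; an empty primary is reported as zero-length instead.
    if re.fullmatch(r"[A-Za-z]*", subtags[0]) is None:
        found.append("primary subtag not purely alphabetic")
    return found


if __name__ == "__main__":
    for tag in ["supercalifragilisticexpialidoceus", "y-----",
                "x1234567890abc", "a123-xyz", "en-US"]:
        print(f"{tag!r}: rfc3066={is_rfc3066(tag)} {violations(tag)}")
```

Splitting on "-" treats every hyphen as a subtag separator, so consecutive or trailing hyphens show up as empty subtags, which is the sense in which "y-----" contains zero-length subtags.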