Language Identifier List Comments, updated

Mon Dec 27 17:40:57 CET 2004

At 09:20 27/12/2004, Doug Ewell wrote:
>JFC (Jefsey) Morfin <jefsey at jefsey dot com> wrote:
> > I have given some thought to all this and reviewed the documents that
> > the W3C is also preparing. I am afraid we are putting too many
> > unrelated things into the same debate, due to a confusion between the
> > three layers (internationalization, multilingualization and
> > vernacularization), which have not yet been identified and documented,
> > while some are attempting to discuss what belongs to authoritative
> > linguistic sources.
>
>Unfortunately, Jefsey is talking past me again, but I think there may be
>some confusion between the draft (RFC 3066bis) and Tex's page.

Doug,
within the IETF, "the Internet" is usually understood as adherence to the
documents resulting from the Internet standard process. What is being
discussed here is a review of BCP 47
(http://www.inter-locale.com/ID/draft-phillips-langtags-08.html).
Discussing Tex's page is not a problem; the problem is that the discussion
(please consider your own comment below) does not distinguish between what
is a private suggestion and what is authoritative in an Internet RFC
context.

>Tex is putting together an informative Web page, attempting to identify
>which language tags can reasonably be considered "complete" by
>themselves and which need to be qualified by a region subtag.  For
>example, according to the page, "ca" for Catalan is enough information;
>there is no reason to qualify it with a region subtag, as "ca-ES",
>because Catalan is Catalan regardless of where spoken, or because it is
>only spoken in Spain.  On the other hand, "es" needs to be qualified as
>"es-ES", "es-MX", "es-AR", or whatever, because those variants of
>Spanish differ.
>
>That's it.  Tex is not trying to create an IETF or W3C document, or
>express anything normative.  So we need to be careful to specify which
>document we are commenting on.
>
> > As for naming, languages are chosen and documented by the local
> > Internet communities, represented by their Trustees, the ccTLD
> > Managers (the SLD Manager for privately defined tags). Just as IANA
> > is not in the business of defining countries (RFC 1591), it is not in
> > the business of defining the languages of those countries.
>
>NOBODY is trying to make IANA do this.  This has been said before, and
>apparently needs to be said again.

Please reread what you just wrote: "there is no reason to qualify [Catalan]
with a region subtag". Only an acknowledged Catalan language authority can
say that, and the same goes for the other entries. You are in the process
of defining the languages of the countries. All the IETF can do is say:
"if there is a need to qualify Catalan, this is the way to do it".

>RFC 3066bis defines *codes* for languages and for "regions," which are
>not even necessarily countries (most of the U.N.-based numeric codes are
>for geographic regions such as "Europe" or "Western Africa").  The codes
>come from ISO and U.N. sources, and none are invented by any agent of
>IANA or designee of RFC 3066bis except to resolve conflicts created by
>those sources.  The RFC never defines the *entities* associated with
>these codes, and it is very clear about where the definitions do come
>from (generally the U.N.).
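To make the code/entity distinction concrete, here is a minimal sketch, assuming a simplified tag shape: a primary language subtag optionally followed by one region subtag. The region is either two letters (ISO 3166-1, e.g. "ES", "MX") or three digits (UN M.49, e.g. "419" for Latin America and the Caribbean, "011" for Western Africa), as in the draft's region production. Script, variant, extension and private-use subtags are deliberately ignored, and the name `parse_simple_tag` is hypothetical.

```python
import re

# Simplified shape only: language (2-3 letters), optionally followed by a
# region (2 letters = ISO 3166-1, or 3 digits = UN M.49). Script, variant,
# extension and private-use subtags from the draft ABNF are NOT handled.
SIMPLE_TAG = re.compile(r"([A-Za-z]{2,3})(?:-([A-Za-z]{2}|[0-9]{3}))?")


def parse_simple_tag(tag):
    """Return (language, region, region_kind) or raise ValueError."""
    m = SIMPLE_TAG.fullmatch(tag)
    if m is None:
        raise ValueError(f"not a simple language[-region] tag: {tag!r}")
    language, region = m.group(1).lower(), m.group(2)
    if region is None:
        return language, None, None
    if region.isdigit():
        # Numeric codes name geographic regions, not necessarily countries.
        return language, region, "UN M.49"
    return language, region.upper(), "ISO 3166-1"


for t in ["ca", "es-MX", "es-419", "fr-011"]:
    print(t, "->", parse_simple_tag(t))
```

The sketch only identifies which code list a subtag comes from; it says nothing about the entity behind the code, which is the point Doug makes above.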

RFCs 1591 and 3490 define these authorities, and IANA acknowledges them,
for the only language-related issue governed by Internet standards, namely
IDNA (discussed below).

>RFC 3066bis also does not make any attempt to define which languages are
>(or should be) used in any given country or region.

This is a final discussion of the Phillips-08 draft, which you call "RFC
3066bis". This discussion only shows that its attempt to define tags for
these languages does not sufficiently resolve its confusions. What needs to
be discussed is what remains unclear in relation to other issues under the
Internet standard process.

IDNA uses BCP 47 (i.e. RFC 3066, which is under discussion here) to define
the IDN language tables.

Please review:
http://www.iana.org/assignments/idn/

The Phillips-08 draft ABNF is similar to, but conflicts with, the
submission template of this IANA procedure, which is not expressed in ABNF:
http://www.iana.org/assignments/idn/registry-language-template.txt

To understand the typical conflicts, please look at the NASK submission
for Arabic as used in Poland,
http://www.iana.org/assignments/idn/pl-arabic.html
and at the description of Japanese to be used for registration in Japan,
as submitted by JPNIC:
http://www.iana.org/assignments/idn/jp-japanese.html .
You will note that the Polish entry (a much-contested submission because of
its linguistic and political implications, which resulted in a letter of
apology from NASK) is only a character set list, whereas the Japanese
submission quotes a Japanese text,
http://jprs.jp/doc/rule/saisoku-1-wideusejp.html ,
as an intrinsic part of an IANA document (the document cannot really be
understood without understanding this text). This defines a Japanese
language (the language used in that document).
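An entry that is only a character set list can be applied mechanically. The minimal sketch below shows that case only: it reports which characters of a label fall outside a permitted set. The table is hypothetical (ASCII letters, digits and hyphen plus the nine Polish diacritic letters); it is not the content of any IANA registry entry. A submission whose meaning depends on an external prose text cannot be reduced to this kind of check. `disallowed_chars` is a hypothetical name.

```python
# Hypothetical table: lowercase ASCII letters, digits and hyphen, plus the
# nine Polish diacritic letters. It is NOT the content of any IANA entry.
HYPOTHETICAL_TABLE = frozenset(
    "abcdefghijklmnopqrstuvwxyz0123456789-" + "ąćęłńóśźż"
)


def disallowed_chars(label, table):
    """Return the characters of `label` not permitted by `table`, in order."""
    return [ch for ch in label.lower() if ch not in table]


print(disallowed_chars("łódź", HYPOTHETICAL_TABLE))
print(disallowed_chars("müller", HYPOTHETICAL_TABLE))
```

The first call reports nothing; the second reports the "ü", which this hypothetical table does not list.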

We have to understand that there are three levels of language usage,
defined by the Internet standard process or introduced by this IANA
procedure, which are candidates for BCP 47 tagging:

1. the various language-related protocol issues (for example the "From:"
header in e-mail).

2. "internationalized domain names in applications", which is an ambiguous
issue, since "in applications" covers the DNS, text links, e-mail
addresses, FTP access, etc.

3. the way to support (not to handle) the transported content. Consistent
end-to-end transport is within the scope of an RFC; the organization and
meaning of the content are not. This still leaves open to debate the
organization and meaning of the associated metadata. You seem to say that
language tags which are meaningless to you are irrelevant, while I say this
is not our business: we need not care whether a language tag is absurd,
except to document an escape procedure for a user who finds it absurd for
_him_. I submit that we only have to provide an ABNF (which the draft does)
and that escape procedure (which it does not document). I would also add,
while fully agreeing that this is a matter for an "RFC 3066ter", that we
should provide the "neuronal" algorithm to manage them, by which I mean a
way to store their information and keep it consistent, whatever the
granularity considered.

Now, we have to understand what the Internet standard process is trying to
achieve. It tries to provide consistent support for languages: a
multilingual Internet. I am tentatively working on a draft documenting how
the Internet standard process can support this definition process. This is
a matter the IETF has never discussed (this mailing list is not an IETF WG,
and RFC 3869 does not even allude to multilingualism). However, there is
some experience of the complexity at hand. The most interesting case I know
of is support for the LHS (the left-hand side of an e-mail address) and
whether or not to make it consistent with the RHS (the right-hand side,
which is supported by IDNA). Given what the LHS can contain, it does not
differ much from plain text, and therefore from full language support.
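The LHS/RHS asymmetry can be illustrated with a minimal sketch using Python's built-in "idna" codec, which implements the IDNA 2003 ToASCII operation of RFC 3490. Only the domain part (RHS) gets an ASCII-compatible encoding; the local part (LHS) is passed through unchanged, since no equivalent standard mechanism for it existed at the time of this thread. The function name `address_to_ascii_rhs` is hypothetical.

```python
def address_to_ascii_rhs(address):
    """Apply IDNA ToASCII (via Python's 'idna' codec) to the domain part only."""
    # Split on the last '@': a quoted local part may contain '@',
    # but the domain part cannot.
    local_part, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError(f"no '@' in address: {address!r}")
    ascii_domain = domain.encode("idna").decode("ascii")
    # The LHS is left untouched: IDNA does not cover it.
    return f"{local_part}@{ascii_domain}"


print(address_to_ascii_rhs("jürgen@bücher.example"))
```

The result still contains the non-ASCII local part, which is exactly the inconsistency between LHS and RHS discussed above.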

>As Tex said, language tags are used for much more than IDNA.  And once
>again, I fail to see what Unicode has to do with any of this.

The Internet standard process is about end-to-end interoperability, not
about brain-to-brain interintelligibility. However, such
interintelligibility calls for interoperability at the character level.
These are the same characters for texts, documents, LHS and RHS. In a
network, consistency is of the essence, so whatever is done for the RHS,
LHS, protocols, documents, etc. will have to be consistent, or we will run
into extensive conflicts given the importance and complexity of the issue.

> > I fully understand that most of the ccTLD Managers have not published
> > language tables, that applications other than the DNS call for
> > immediate support, and also that SLD Managers may need off-the-shelf
> > tables. However, this support by non-ccTLD Managers can only be
> > temporary and MUST eventually be consistent with the ccTLD Manager
> > tables that such an RFC should call for. Otherwise we have a real
> > layer and authority violation, all the more so since this is set out
> > not only by RFC 1591 and ICANN ICP-1 but also by the WSIS 2003
> > Resolutions underlining the sovereignty of governments over ccTLDs.
> > There is no problem in documenting the duties of a ccTLD Manager in
> > this area and in discussing them with ccTLD Managers, as an addition
> > to the ccTLD Manager BPs.
>
>This is way out of scope for RFC 3066bis or any of its predecessors.

No. It is consistent with, and affects, current IANA procedures regarding
IANA tables named after the very document whose update you are discussing.

> > I would therefore review the ABNF in four areas:
> > - favoring the three-letter codes for the language to make this entry
> > time-independent and consistent (this does not change anything in
> > current applications)
>
>No.  RFC 3066bis is not going to break interoperability with RFC 3066 by
>switching to alpha-3 language codes for all languages, forcing users to
>replace "en" with "eng", "fr" with "fra", and so forth.  This is simply
>not going to happen.
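Doug's interoperability concern can be illustrated with a minimal sketch: existing content tagged "en" or "fr" keeps those tags, so any application that also accepted alpha-3 codes would have to normalize both forms before comparing primary language subtags. The mapping holds only the two pairs cited in this thread and is illustrative, not a copy of ISO 639-2; `to_alpha3` and `same_language` are hypothetical names.

```python
# Illustrative subset only: ISO 639-1 alpha-2 -> ISO 639-2/T alpha-3,
# limited to the two examples cited in this thread.
ALPHA2_TO_ALPHA3 = {"en": "eng", "fr": "fra"}


def to_alpha3(language):
    """Normalize a primary language subtag to alpha-3 where a mapping is known."""
    language = language.lower()
    return ALPHA2_TO_ALPHA3.get(language, language)


def same_language(tag_a, tag_b):
    """Compare only the primary language subtags of two tags."""
    primary_a = tag_a.split("-")[0]
    primary_b = tag_b.split("-")[0]
    return to_alpha3(primary_a) == to_alpha3(primary_b)


print(same_language("en-GB", "eng-GB"))  # legacy alpha-2 tag vs alpha-3 tag
print(same_language("fr", "eng"))
```

Without the normalization step, "en-GB" and "eng-GB" would compare as different languages, which is the breakage RFC 3066 users would see.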

I am afraid that is not what I said; but it is what is going to happen.

I never said that anything should be "forced", but that the 2-letter codes
overlap the ccTLD list, which confuses users. There is a need for a simple,
formatted, contextual cultural definition. It cannot mix 2- and 3-letter
codes: it has to be 2 letters plus "*", or 3 letters. Most new usages will
likely stabilize on 3 letters (among more than 7250 3-letter tags, a few
2-letter tags will be odd and resource-consuming in new applications).
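The overlap argument can be made concrete with a minimal sketch: the same two letters can be read either as a language code or as a country-code TLD. Both sets below are tiny hand-picked samples (ISO 639-1 codes for Catalan, Spanish, French, German and Japanese; the ccTLDs .ca, .es, .fr, .de and .jp), not the full lists, so they show only that the collision exists, not how often it occurs. `ambiguous_codes` is a hypothetical name.

```python
# Tiny hand-picked samples, NOT the full ISO 639-1 or ccTLD lists.
LANGUAGE_ALPHA2 = {"ca": "Catalan", "es": "Spanish", "fr": "French",
                   "de": "German", "ja": "Japanese"}
CCTLDS = {"ca": "Canada", "es": "Spain", "fr": "France",
          "de": "Germany", "jp": "Japan"}


def ambiguous_codes(languages, cctlds):
    """Return two-letter strings that are both a language code and a ccTLD."""
    return sorted(set(languages) & set(cctlds))


for code in ambiguous_codes(LANGUAGE_ALPHA2, CCTLDS):
    print(f"{code!r}: {LANGUAGE_ALPHA2[code]} or .{code} ({CCTLDS[code]})")
```

Note that "ca" is Catalan as a language code but Canada as a ccTLD, while Japanese ("ja") and Japan (".jp") do not share a code at all.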

> > I would add a paragraph indicating that language tags actually
> > designate the language tables decided by the local Internet
> > communities through their ccTLD Manager, underlining that there are as
> > many language tags in use as there are supported, requested or
> > prepared tables.
>
>I have no idea what any of this refers to.

This refers to the existing IANA format:
http://www.iana.org/assignments/idn/registry-language-template.txt
and to real-life needs, such as icons.

I can only recommend that you read section 9.6 of the following ETSI document:
http://portal.etsi.org/stfs/documents/STF231/eg_202132v010101p.pdf

I submit that the RFC 3066bis ABNF should be checked against those
recommendations.

I suggest that the best way to support its recommendation R.9.6.f is a set
of icons following ISO 7000 recommendations, making it possible to offer
menus presenting a "glyph + text in language/character set" list.

All the best.
jfc