Message-Id: <6.1.2.0.2.20050116230516.033c4d00@mail.jefsey.com>
Date: Mon, 17 Jan 2005 02:15:08 +0100
To: ietf-languages@alvestrand.no
From: "JFC (Jefsey) Morfin"
Subject: language tag structure
List-Id: IETF Language tag discussions

The debate over the revision of RFC 3066 led to confusion because of the rigidity imposed by the Internet standard process and by the document's BCP status. This is because RFC 3066 covers various issues which have today widened too much to fit into one single document written by specialists of one single application. I am not interested in addressing this IESG problem.

The debate has also shown that the Internet, the IT industry, applications and the application-area WGs need a stable and consensual tagging system. This is a specific need which should be addressed independently, taking into account the needs of all the "customers" (XML, IRI, DNS, OPES, etc.).

My interest as a CRC (common reference center) project developer is a tagging structure that is consistent, flexible and open enough to document consistently the largest number of human-related aspects.

I am interested in criticism of the following step-by-step approach. Attention: I do not want to write a Draft here, just to work out the basis of a common approach to a common, consistent solution to different needs.

1. the language tag is to concatenate 5 sub-tags about:
- the language
- the script
- the geographical area of use
- the style
- the authoritative source/reference

example: the Microsoft Word for French of France that I have on my PC right now.
- the language is French
- the script is Latin
- the geographical area is France or Belgium or Luxembourg or Canada or Monaco or Switzerland (many other countries should show up)
- the authoritative source/reference is Microsoft (and they miss a _lot_ of words)
- the style options can be personal, official, etc.

2. only the language information is mandatory in a tag; all the other information is optional or not, depending on the application.

3. the langtag information MUST be independent from the matching algorithm. Its role is to support a complete definition of the language by its authoritative source. There may be an unlimited number of authoritative sources.

3. there can be different presentation formats of the langtags.

fr-FR (RFC 3066)
fr-Latn-FR (RFC 3066 bis)
fr-Latn-FR-A-Microsoft
latn.fra.fr.a.microsoft

for multilingual considerations, characters used in tags will be considered as "0-.z" numerics (lower or upper cases). a "0-.9" numeric version should be supported;

4. default code tables used in the tag would be:

ISO 639 for languages
ISO 15924 for scripts
RFC 1591 for countries (the ISO 3166 2-letter codes as approved by ICANN/ITU, as a reflection of the real world through the GAC. This approach only gives a better response time for real-world adaptation, since interested Govs will find the best solution through their ISO, ITU and GAC membership).
Region can be UN or an ad hoc table
A style list should be created. It might be a 1-character list (with optional sub-entries)?
Authority should be registered through a mailbox name on the mailing list of a dedicated Multilingual Task Force. Authority may include a year/publication nr subfield when the same authority has published several references. Authority may also be identified by its registration number.
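
To make the five sub-tags and the presentation formats above concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the class name LangTag, its field names and the two rendering methods are hypothetical, no code table is checked, and the style value "A" is simply the one used in the examples above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LangTag:
    """Hypothetical container for the five sub-tags; only `language` is mandatory."""
    language: str                    # ISO 639 code, e.g. "fr" or "fra"
    script: Optional[str] = None     # ISO 15924 code, e.g. "Latn"
    region: Optional[str] = None     # two-letter country code, e.g. "FR"
    style: Optional[str] = None      # one-character style code (no list exists yet)
    authority: Optional[str] = None  # authoritative source, e.g. "Microsoft"

    def __post_init__(self):
        # Point 2: the language information is the only mandatory part.
        if not self.language:
            raise ValueError("the language sub-tag is mandatory")

    def hyphenated(self) -> str:
        """Render as language-script-region-style-authority, skipping absent parts."""
        parts = [self.language, self.script, self.region, self.style, self.authority]
        return "-".join(p for p in parts if p)

    def dotted(self) -> str:
        """Render as script.language.region.style.authority, in lower case."""
        parts = [self.script, self.language, self.region, self.style, self.authority]
        return ".".join(p.lower() for p in parts if p)


print(LangTag("fr", region="FR").hyphenated())
print(LangTag("fr", "Latn", "FR").hyphenated())
print(LangTag("fr", "Latn", "FR", "A", "Microsoft").hyphenated())
print(LangTag("fra", "Latn", "FR", "A", "Microsoft").dotted())
```

The sketch does not convert between two-letter and three-letter language codes, so the dotted example is built from "fra" directly. It also leaves open how a reader of the tag would tell sub-tags apart when a middle one is omitted; that would need a rule this post does not define.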
Extensions to these tables will be accepted to address an extended and homogeneous vision of languages, and to support additional related tables forming a homogeneous cultural ontology (CulturaMundi) that also supports machine languages. Subfields should be supported for dialects, particular forms of scripts (I can think of several French scripts and phonetics), areas, styles and vocal accents. Other codings could be supported with a prefix. The target is to support the largest number of declared or private ID tables.

5. ISO 7000 oriented ICONs

I have not been able to find today a free part of the ISO 7000 document which would give enough information. I suppose that the best solution would be an icon of a special shape (easily identified as a language tag: a book?) with a color code to indicate the style, encapsulating the location flag and marked with the ISO 639 language code in the corresponding script. A face could indicate the type of voice when the document is vocal (identified by a loud-speaker instead of a book?)

6. Registrations and Publication

Some applications call for a registration; real life lives with descriptions. Nothing prevents a CRC (IANA or other(s), like manufacturer reference centers) from registering in toto or in parte its cultural matrix, to serve as an application reference or to support special requirements. The registration made at the address of a given tag is only for the benefit of the application users and is not a structural reference for developers. This means that a particular registration by IANA or another CRC at fr-Latn-FR SHOULD have no influence on the way application software is designed.

I intend to use that tagging standard in the INTFILE reporting on the DNS top level status. This file is to support information on a per-ccTLD basis. For example the style "D" could be for IDN tables.

7. IRI

Before presenting an information Draft that would make the lang/culture tags consistent with other taggings (DNS, keywords, access engines, OPES triggering, search engines, etc.), I suppose the best place for the core consistency of the whole set of documents under way is Martin Duerst's IRI Draft. And all these documents refer to RFC 3066.

Martin, I have carefully read your IRI draft (10.txt ?) several times. I am not sure I understand everything. This is certainly due to my low IQ, but also because some definitions seem to be missing. In particular I have not been able to understand exactly what you call a "name-reg", and therefore to determine whether the proposed IRI format fully supports ML.ML domain names (unicode.unicode or xn--.wn--, which some countries and private networks may want or may already have implemented). I am not sure either whether upper/lower case differentiation can be fully supported, should the name-reg always have to support it (punycode is supposed to be able to support it, as discussed for the LHS support; upper case should be supported everywhere in the IRI even when not used by the DNS).

I thank you for your comments.
jfc

_______________________________________________
Ietf-languages mailing list
Ietf-languages@alvestrand.no
http://www.alvestrand.no/mailman/listinfo/ietf-languages