Message-Id: <6.1.2.0.2.20050327035629.043a0db0@mail.jefsey.com>
Date: Sun, 27 Mar 2005 04:47:54 +0200
To: "Doug Ewell", "LTRU Working Group"
From: "JFC (Jefsey) Morfin"
Subject: Re: [Ltru] Re: Registry in record-jar format
In-Reply-To: <004301c53262$5aa258e0$030aa8c0@DEWELL>
References:
 <20050325125855.KUPU2135.mta2.adelphia.net@megatron.ietf.org>
 <000801c53227$3ba04640$030aa8c0@DEWELL>
 <6.1.2.0.2.20050326210800.043a07e0@mail.jefsey.com>
 <004301c53262$5aa258e0$030aa8c0@DEWELL>
List-Id: Language Tag Registry Update working group discussion list

At 03:17 27/03/2005, Doug Ewell wrote:
>The registry will need to be updated whenever a new code element is
>added to ISO 639, 3166, or 15924, or whenever the Language Subtag
>Reviewer approves the registration of a variant subtag.  By my estimate,
>this is anywhere from 3 to 12 times a year.  I don't think an RFC is the
>right vehicle for something as dynamic as that.

What I mean is that:
- if you put the data in BCP 047, it would become obsolete as soon as
  there is an ISO update.
- if you put it in a separate RFC, the IANA will load it. At the first
  change, you update the IANA registry according to the BCP 047
  procedure (new document) and obsolete the data RFC.

>However, we are talking about ISO 639-3 code elements used as
>extended-language subtags, and it does need to be noted that there are
>more than 7,000 of these and there is NO plain-text format that can
>express them all without chewing up quite a few bytes.  I created a text
>file from the tables in a draft version of 639-3, in about as compact a
>format as you can imagine:
>
># Reference name|ID|Scope|Part 2|Status|Macro|
>!O!ung|oun|I||||
>!Xóõ|nmn|I||||
>//Ani|hnh|I||||
>//Gana|gnk|I||||
>//Xegwi|xeg|I||||
>/Gwi|gwj|I||||
>/Xam|xam|I||||
>etc.
>
>and even this file, with 7,617 records (479 of which are already in
>639-1 or 639-2 and thus won't need to be added to the registry), is more
>than 161,000 bytes.  There's no getting around it; no matter how you
>slice it up, no matter what format is chosen, a registry with seven
>thousand languages is a large registry.

We fully agree. I raised the question of the load when we started
discussing XML. IMHO (this may be subjective, which is why I asked), the
less directly usable the format is, the lower the chances that someone
builds an application calling the IANA.

I will be frank: IMHO the IANA registry has only a political and a
central exposure interest. It would be better just to document where to
find the data supported by their referent. We have the experience of the
DNS database: the IANA delays/moods in getting the DNS root updated are
the main problem of TLD Managers. Also, be sure that if you start
building a formal system, imposing constraints, usage will just partly
move away from it. Like for the DNS, with alt-roots and PADs.

NB. I keep referring to the DNS because (I quoted STD 013) it could be
the proper vehicle to support the data and permit immediate access by
applications.

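To make the size discussion concrete, here is a minimal Python sketch that
reads the pipe-delimited layout Doug quotes above and redoes the
arithmetic on his figures. The field names in Record and the helper
parse_line are illustrative assumptions, not part of any draft; only the
sample lines and the three numbers come from his message.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    # Field names are assumptions mapped onto the quoted header columns.
    name: str    # "Reference name"
    code: str    # "ID"
    scope: str   # "Scope"
    part2: str   # "Part 2"
    status: str  # "Status"
    macro: str   # "Macro"


def parse_line(line: str) -> Optional[Record]:
    """Parse one line; return None for blank lines and the '#' header."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split("|")
    if fields[-1] == "":
        fields.pop()  # every line ends with a trailing '|'
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    return Record(*fields)


# Sample lines copied from the quoted message.
SAMPLE = """\
# Reference name|ID|Scope|Part 2|Status|Macro|
!O!ung|oun|I||||
!Xóõ|nmn|I||||
//Ani|hnh|I||||
//Gana|gnk|I||||
//Xegwi|xeg|I||||
/Gwi|gwj|I||||
/Xam|xam|I||||
"""

records = [r for r in map(parse_line, SAMPLE.splitlines()) if r is not None]

# Figures quoted in the message; 161,000 is a lower bound ("more than").
TOTAL_RECORDS = 7_617
ALREADY_IN_639_1_OR_2 = 479
FILE_BYTES = 161_000

new_records = TOTAL_RECORDS - ALREADY_IN_639_1_OR_2  # records still to add
avg_bytes = FILE_BYTES / TOTAL_RECORDS               # average bytes per record
```

On these figures the compact file averages roughly 21 bytes per record,
so any more verbose format can only grow from there.
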
I do not know if I will be able to present/discuss our draft as we
planned, due to the Easter delay, so let me introduce the idea so that it
can be discussed before the draft (at this stage it is just a concept,
but I have the tools and the possibly useful plug-in).

1. there are four descriptive subtags (language, script, culture, style)
2. there is an authoritative subtag (referent).

Let us assume that the referent is designated by his domain name.
"style.culture.language.script.referent" can be a domain name where a
piece of information is stored (exists or not, to start with) in an RR.
Example: "casual.fra.fr.latn.jefsey.com" will be where to find my
description of the way I understand casual French of France. I can
certainly have a CNAME with "casual.fr.fr.latn.jefsey.com".

The reason why I put the script first is that "latn.jefsey.com" can be
transformed by a plug-in (currently used by scores of millions of users)
into ".langtag" in Latin characters. The call is then
casual.fr.fr.langtag, with a RX being returned to the application
documenting the registration (it can simply be the Handle of the
documentation - permanent document addressing system used by the LOC and
many others).

It just calls for a db.file description to immediately support 7500
languages and 20000 dialects and all that you want. 15.000.000 tags are
no big deal vs. hundreds of millions of domain names supported by nearly
one million nameservers. And it costs nothing.

> > I want also to underline that the figure I quote are not for "every
> > time", but very conservative (1 a year) as you call for.
>
>You are correct, and I did miss that.
>
>I suppose a valid question might be, how often does an application need
>to update its copy of the registry if it attempts to be "conformant"
>with RFC 3066bis (however that is defined)?
>Currently we have applications that claim to support RFC 3066, but
>don't necessarily support all registered tags, such as "bs-Latn" which
>was registered just last month.  (For that matter, even the IANA pages
>don't list any of the tags they have registered since mid-2004.)
>
>As you mentioned, anti-virus software requires virus definition files to
>be updated frequently for maximum effectiveness.  This is not the same
>as requiring frequent updates in order for the software to work at all.
>I wonder what the situation is with other applications that rely on data
>from IANA registries.  Are they considered non-conformant if they don't
>have the latest version, or just behind the times?
>
>Maybe there should be a way to provide incremental updates to the
>registry, so that the addition of one or two new subtags does not
>require the whole registry to be replaced.  This is probably
>application-specific, and out of our scope.
>
> > What you quote about Microsoft is interesting. But it looks like a way
> > to tune an obligation imposed elsewhere? (if you speak French and are
> > in Canada you are supposed to speak fr-CA?).
>
>No, that's the exact opposite of my point.

We are in agreement. My Franglish is probably the problem. I meant you
are supposed to use a Canadian locale, but Microsoft allows you to escape
from it. How do you support Tamils?

> > But this is a good point. I recall you that my lang5tag proposition is
> > not only in line with every dictionary, but also with Word and with
> > the "preferences" quoted in the Charter.
>
>We'll see what your draft says.  Being "in line with" Microsoft Word is
>not a goal of the current draft, though.

You are right. I only assume that Word is probably in line with the
users; otherwise they would use something else. So being level with Word
and dictionaries (which usually document a word by giving the style, the
reference and possibly the date) gives a chance of being level with the
real needs.

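As an aside on the "style.culture.language.script.referent" naming
proposed earlier in this message, the following Python sketch shows how
such a name could be assembled and taken apart again. It is only an
illustration of the label order: langtag_domain and split_langtag_domain
are hypothetical helper names, no DNS query is made, and nothing here is
taken from the announced draft.

```python
from typing import Tuple


def langtag_domain(style: str, culture: str, language: str,
                   script: str, referent: str) -> str:
    """Join the four descriptive subtags and the referent's domain name."""
    labels = [style, culture, language, script]
    for label in labels:
        # Each subtag must be a single, non-empty DNS label.
        if not label or "." in label:
            raise ValueError(f"invalid subtag label: {label!r}")
    return ".".join(labels + [referent.strip(".")]).lower()


def split_langtag_domain(name: str, referent: str) -> Tuple[str, ...]:
    """Recover (style, culture, language, script) from a name under referent."""
    suffix = "." + referent.strip(".").lower()
    name = name.strip(".").lower()
    if not name.endswith(suffix):
        raise ValueError(f"{name!r} is not under {referent!r}")
    labels = name[: -len(suffix)].split(".")
    if len(labels) != 4 or not all(labels):
        raise ValueError(f"expected 4 subtag labels, got {labels!r}")
    return tuple(labels)


# The example from this message: casual French of France, Latin script.
example = langtag_domain("casual", "fra", "fr", "Latn", "jefsey.com")
```

The name is lowercased because DNS names compare case-insensitively; an
application would then look up whatever RR the referent publishes at it.
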
jfc

_______________________________________________
Ltru mailing list
Ltru@lists.ietf.org
https://www1.ietf.org/mailman/listinfo/ltru