Date: Tue, 12 Apr 2005 01:33:04 +0200
To: "Addison Phillips", "LTRU Working Group"
From: "JFC (Jefsey) Morfin"
Subject: Re: [Ltru] Great Script Debate Part II: Formats...

At 20:46 11/04/2005, Addison Phillips wrote:
>In this email, I'll use the subtags in the following 3066bis-style tag for
>demonstration purposes:
>de-Latn-CH-1901-x-gleep

Dear Addison,

Thank you for this. I will comment from my own point of view, which is orthogonality between format and content: the format should be adapted to the application, rather than the content being adapted to the format. But I also understand your historical constraints (I did not think they were so bad). So please do not take my comments as criticism, but purely as comments, explanations or questions, to make sure I follow your point. Then I consider whether there is a way out of all this.

>Let's consider the different ways that we can include script in a tag. For
>each option, I'll write out a tag with a script and a tag without a script.
>
>1. Script first.
>
>With: Latn-de-CH-1901-x-gleep
>None: de-CH-1901-x-gleep
>
>This tag format has little to recommend it.
>Some folks may find that the
>script-first position seems more suited to a hierarchy of tags (many
>languages are written in one script), but this position doesn't add
>appreciably to the information conveyed by a *language* tag. It has the
>same problems as 3066bis's design for "smart" 3066 processors. In fact, it
>is worse, since it also pushes the language out of position.

This might be of interest in several cases:

1. when you already know the script, this format lets you omit the script, and perhaps stay compatible with earlier formats (LDAP?);
2. when you want to access information using a system in its own language, you must first know the script.

But access (DNS, database, matrix) is related to the query order, not to the sequence; so we agree.

>2. Script combines with language.
>
>deLatn-CH-1901-x-gleep
>de-CH-1901-x-gleep
>
>This tag format fixes the region code problem. It also installs a "unique
>new format" that some have asked for. The problem here is that the two
>tags above really should refer to the same language and 3066 processors of
>all types (RFR, smart) will never get that.

The unique format does not meet the Charter's requirement of being able to identify the subtags. The other problem is that this format follows the ISO tables and is therefore not independent of ISO's evolution. It has both problems.

However, these negative aspects should not hide a totally different avenue, which is aliasing (by name, number or IPv6 address). The main problem in our discussion is still to define the framework: the charter has not been discussed. What do you think a language is? What is the purpose of the tag, what is its application process, what is its use, and what are the IANA implications?

We can discuss endlessly if the different associated decision parameters are not defined. You decided that the solution was to be found along the path you analyzed, and you know it is not perfect.
And you want help in making it as little imperfect as possible. I am not really able to help there, because I will probably have very different answers to the Charter's "thorny" questions. But where I may really be able to help is that these answers, which will necessarily be those of the market at some stage, could help you consider the problem differently and perhaps solve it better.

>3. Script after language
>
>de-Latn-CH-1901-x-gleep
>de-CH-1901-x-gleep
>
>This is RFC 3066bis, of course. The issue here is the script vs. region
>code that we're discussing. The advantage to this design is that script is
>at the right level in the tag in most cases. RFR processors can remove the
>script (unlike in 2) during matching. RFR matching works as long as users
>are consistent about using or not using script for a particular prefix.
>"Smart" 3066 matching can't find the region code with this design. (Smart
>3066bis processing finds all subtags)

I obviously have no objection to that, except that the format should not be limited to 2 letters or 3 digits. I understand the sense it makes for XML processors. It makes no sense for my own processors. The real point is: does it make sense to people and web services? You answered, if I am correct, that formats are orthogonal to web services, and you have not yet answered about word processors, etc. So the only real issue is with users.

>4. Script after region and before variant
>
>de-CH-Latn-1901-x-gleep
>de-CH-1901-x-gleep
>de-Latn-1901-x-gleep // lookout...
>
>This doesn't harm 3066 processors in most cases, but harms users of these
>implementations who need to find specific script versions of language.
>That is, RFR processors can't find matching scripts. "Smart" 3066
>processors (such as the J2EE demo), get language and region into the right
>slots.

OK. This is again a problem internal to "your" W3C applications.
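To make the matching trade-offs quoted for options 2, 3 and 4 concrete, here is a minimal Python sketch. The function names are hypothetical. `prefix_matches` follows the RFC 3066 rule that a range matches a tag when it equals the tag or is a prefix of it followed by "-"; `rfr_fallbacks` is a plain remove-from-right truncation, and dropping a dangling single-letter subtag such as "x" is an assumption of this sketch; `smart3066_region` models a "smart" 3066 reader under the assumption that it accepts a region only as a two-letter second subtag.

```python
import re
from typing import List, Optional


def rfr_fallbacks(tag: str) -> List[str]:
    """Remove-from-right fallback: the tag itself, then each shorter prefix.

    Assumption of this sketch: a trailing single-character subtag (such as
    the "x" private-use singleton) is dropped too, so no candidate ends in it.
    """
    subtags = tag.split("-")
    candidates = []
    while subtags:
        if len(subtags[-1]) == 1:  # dangling singleton such as "x"
            subtags.pop()
            continue
        candidates.append("-".join(subtags))
        subtags.pop()
    return candidates


def prefix_matches(language_range: str, tag: str) -> bool:
    """RFC 3066-style prefix match (case-insensitive): equal, or prefix + '-'."""
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


def smart3066_region(tag: str) -> Optional[str]:
    """Hypothetical 'smart' 3066 reader: a region can only be the second
    subtag, and only when it is two letters (an ISO 3166 code)."""
    subtags = tag.split("-")
    if len(subtags) > 1 and re.fullmatch(r"[A-Za-z]{2}", subtags[1]):
        return subtags[1].upper()
    return None


if __name__ == "__main__":
    option2 = "deLatn-CH-1901-x-gleep"
    option3 = "de-Latn-CH-1901-x-gleep"
    option4 = "de-CH-Latn-1901-x-gleep"

    # Option 2: the fused "deLatn" never truncates down to plain "de".
    print(rfr_fallbacks(option2))
    # -> ['deLatn-CH-1901-x-gleep', 'deLatn-CH-1901', 'deLatn-CH', 'deLatn']
    # Option 3: truncation passes through "de-Latn" before reaching "de".
    print(rfr_fallbacks(option3))
    # -> ['de-Latn-CH-1901-x-gleep', 'de-Latn-CH-1901', 'de-Latn-CH', 'de-Latn', 'de']
    # A script-specific range matches the option 3 tag but not the option 4 tag.
    print(prefix_matches("de-Latn", option3), prefix_matches("de-Latn", option4))
    # -> True False
    # The smart 3066 reader finds the region only in the option 4 layout.
    print(smart3066_region(option3), smart3066_region(option4))
    # -> None CH
```

Under these assumptions the sketch reproduces the trade-off in the quoted text: layout 3 keeps script-specific prefix matching but hides the region from a smart 3066 reader, while layout 4 does the opposite.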
Nothing=20 against this patching of the past, as long as we can proceed with some=20 innovative thinking for the future. Randy proposes a one tag per subtag=20 approach. I think it may not be the proper way to do that as tags, but th= at=20 this is definitely a serious step ahead. I am open to every flexibility i= n=20 calling on a stable external reference. >4a. Script as a variant. > >This is the same as (4), only the script is just another variant. Too=20 >boring to discuss. Accepted. >5. Script as an extension > >de-CH-1901-s-Latn-x-gleep >de-CH-1901-x-gleep >de-1901-s-Latn-x-gleep // my previous example of a tag that harms smart=20 >processors > >This has the advantage of allowing 3066bis "smart" processors to put the= =20 >script back between "de" and "CH" if it wants to internally, while not=20 >interfering with existing tags (note emphasis here: *tags*). RFR=20 >processors remove the script even before variant. Would Randy's proposition not be supported as a complete extension of tha= t?=20 I say that 5 descriptors are necessary. I have nothing against the idea=20 that all are supported this way. However ugly a format it would make. >6. Script as an attribute > >de-CH-1901-x-gleep; script=3DLatn; q=3D1.0 >de-CH-1901-x-gleep; q=3D1.0 > >This has the advantage of not mucking with the tags at all, but can't be= =20 >used in contexts such as XML :-(. Who knows how matching works. gee! it seems that you (W3C) created yourself quite a problem. I should=20 reread the architectural framework you published some times ago. But I fe= el=20 you violated there some of your extensibility, etc. most basic rules? >7. Script inferred from the tag > >de-CH-1901-x-gleep >This is what we have today. If you see "zh-TW", that must mean Hant. Goo= d=20 >luck if you live in HK. Frankly, is this a real problem? I mean status quo with the system you=20 specified, got developed and deployed. Why not to think about a new=20 generation? I know this means a big problem. 
But would that problem not be smaller than the sum of all the problems you are going to create for yourselves and for us? You have been thinking about it for 2 years... that makes a lot of pages in the meantime. BCP 047 will not be updated for some time. And then? All this will lead to ugly patches that you will not want to match with our own parameters when we go through.

I do not know XML, but I know the DNS. Many people explained to us all the problems of the DNS if this, if that. Because of that, they forced IDNA through. My relations here were "nice" compared with those in the WG-IDN. Except that they talked privately, and every one of them except one (not an author) accepted that the draft was far from perfect (including conceptually). Anyway, at the end of the day we had IDNA to please ICANN (I would say, as here to please W3C) and to please China. There was a lot of fuss afterwards about the way to apply IDNA, and there is now hardly any use of it, except maybe in the Latn countries.

What I fear with your approach is that you are taking the same approach as IDNA: trying to find a compromise between irreconcilable needs, between an ASCII Internet and a Multilingual Internet. And to be tied by your own format rigidity? Same problem with the IPv6 format rigidity...

>Any I missed?

Yes, mine :-)

Again, your proposal is OK for identification (with an additional referent/style element if more than a menu), because it is likely that the current ISO tables will not change much before an entire revision of the matter has been carried out. But we must give the direction of the transition and of the future.

The problem is not your proposal, but the way you want to present and introduce it within the IETF framework... as BCP 047.

>Now, before you choose one of these, don't forget that you can do
>s/script/extlang/ on the above text. We don't just have one interpolated
>subtag, we have two.

General comment. Let us be candid.
There is no really good solution to your need today. And your need is part of the far more general answer to the need for a new-generation network, which the IETF has been awaiting for 13 years (since they found they lacked IP addresses) and which I observed 21 years ago (when I interconnected the DoD, etc. to the public international networks and saw that we lacked some of the hooks other networks had). The question is simple: is a patch worth trying now, or would it not be better to reshape the whole network (I am talking about its real architecture, not about NGN)?

My feeling is that the reshape is necessary, but I do not believe the IETF is ready for it. So the patch is advisable. But the patch must not try to take the place of the reshape, since the reshape is going to occur as a grassroots process. Instead, they should cooperate.

This is why I advocate splitting the documents a little differently than you did:

1. a standard virtual tags framework, describing the registry, the procedures, the applications, and the obligations formats must meet to be BCP 047 compliant. This will address the inclusion of the ISO tables (and possibly other tables) and the IANA registry, as well as IANA data dissemination and possible filtering guidelines as part of the registration process. Access to pertinence is not an easy issue to address.

2. the format/filtering system for XML, HTML, CLDR, LDAP, OPES, etc. Maybe one document per application.

The benefit is that the first document will be a Multilingual Internet building block, with a universal scope. It should help put your "local" W3C needs in a wider perspective and help their evolution. Incidentally, it also respects the spirit of the Charter better.

Thank you.
jfc

_______________________________________________
Ltru mailing list
Ltru@lists.ietf.org
https://www1.ietf.org/mailman/listinfo/ltru