From: Jefsey Morfin
To: ltru@ietf.org
Date: Tue, 05 Apr 2005 00:10:23 +0200
Subject: [Ltru] draft review
Message-Id: <6.1.2.0.2.20050405000927.03c88e10@pop.online.fr>

Gentlemen,

This is the review I made of the current draft: the text and the way it addresses the Charter. I accept there is a challenge in the charter, which is to reconcile two of its assigned challenges. Its authors chose to address one of them (to identify the role of the subtags in the langtag), while my approach is to start from the other (stability) to address both, in a more general way.

The result is that my review has only a very limited number of positive remarks. But that does not mean that I fully oppose it. It confirms the only two possibilities already offered on the ietf@ietf.org mailing list:

- either to accept the draft in the strict area quoted by the Charter (XML, HTML and CLDR), to describe the language human readers should know to read a tagged text;
- or to write another one, from a totally different perspective, to support the multilingual internet architectural consistency we need.

In any case the first solution would only be a temporary patch until the second one can support the world priority for a multilingual internet.
I only hope this will permit Addison Phillips and Mark Davis to improve their text so it might make an RFC.

DRAFT RELATED COMMENTS AND QUESTIONS

This review only concerns the points discussed in the proposed draft, not the points which are missing.

1. Abstract: "indicate the language": does that mean qualify (tell about), define (show the way), or both?

2. Abstract: "information object": does that include programs, services, user communities?

3. Abstract: "interchange". Is there a special reason to use the word "interchange" (quoted 13 times in the IETF RFC database) and not "exchanges" (38 times)? What are the differences intended?

4. Introduction: the word identify/identifier is used 6 times, while indicate/indicating is used only twice, to avoid repeating "identify". Why not use the word "identify" in the Abstract (cf. question 1)?

5. Introduction: the introduction describes the use of the tag to name a menu of references (dictionaries), and documents that it is often necessary to document style-related elements (dialect, orthography) and the writing system. These are the 5 elements I want to see documented.

6. Introduction: it documents that knowing the language is useful (qualifies) or required (defines) for some processes. What is the intended meaning?

7. Introduction: it indicates that labels are one of the means of indicating (meaning?) the languages used. No other means is alluded to, and no comment about their mutual consistency is provided.

8. Introduction: it identifies only two functions for the document: an "identifier mechanism" and a "registration function". It does not talk of the dissemination of that information, nor of the way applications should use it.

9. Introduction: the sentence should say "This document is intended to replace RFC 3066 and as such to become the new BCP 47" if this is the intent.

10. The language Tag: the introduction talks about labels.
What is the difference between the labels which have been documented as necessary and the language tags now documented?

11. Syntax: the sentence "this makes it possible to construct a parser ... even if specific subtag values are not recognised" is quite obscure. What is the exact meaning of "recognised": understood, known, accepted, authoritative, canonical, identified?

12. Syntax: "a parser need not have an up-to-date copy .. to perform .. most .. searching and matching": what tells the parser that the values it uses are up-to-date? How can we quantify the "most" and the number of occurrences of the remaining cases, for one billion users and more?

13. 2.1.1: how can complex subtag sequences, which add more precision, "seldom add useful distinguishing information" when that is obviously what they are intended to do? Where is the authority documented for the following "because", which says that more granular tags interfere with the meaning, etc. intended by the user? I do not dispute that users can be clumsy, but I think the idea that they are seldom smart should be explained. I feel the problem lies more with filtering/analysis limitations.

14. 2.1.1: that tags SHOULD be limited to four subtags is not documented. The reference to 2.3 for more information (which provides guidance about the best choice of subtag content) does not seem to document this at all.

15. 2.1.1: an application that does not support tags beyond an unspecified length (1, 10, 100 chars?) is accepted as a "conformant implementation". The consequences for usage are not documented.

16. 2.2: it is noted that the language used for the "language tags namespace" and its registry is quite similar to that of the domain name system. However the proposed semantic is not consistent with other Internet spaces like DNS, IPv4, OID, etc.
where dot-separation is used, something users and parsers are accustomed to and that existing processes have already identified in different scripts. On the contrary, it relies on "-" as a separator, which is more confusing and may have fewer identified homographs.

17. 2.2: the proposed design of language tags identifies subtags by their position or else by their length. It is probable that in the particular case of the legacy and initial situation this can work. Such a format, built on two possibly contradictory systems, will never scale, is far too dependent on external changes, and is unable to support innovation. This cannot be made a world-wide standard through an IETF BCP, except in the cases defined by the charter, unless the IESG wants to run the risk of endless conflicts and of quick obsolescence. It is to be noted that the ISO standards referred to are less than 30 years old and yet ISO 3166-2 cannot be supported.

18. 2.2.1: primary language: fixed-length identification starts with 2 or 3 characters, or possibly (though discouraged) 8, with language IDs coming this way only from ISO 639. This removes the possibility of considering computer-related (not only programming) languages, dialects, etc., and of adapting to evolutions, adjustments, and past languages. For example, ISO 3166 could go back to 1800, possibly to 1000, and even much earlier, either as an ISO document or as a consistent table. Support of historic languages will be required. Blocking the historical consistency of documents is unthinkable: just consider the current efforts by Google and various libraries.

19. 2.2.1: the 2-letter code for language is an oddity inherited from the earlier times of RFC 1766 and ISO 639. Nothing against this being the default in some legacy or private application.
It is likely that at some time it will be timed out by ISO and/or by usage, maybe even by anti-racist laws. The time is now to update existing applications rather than to increase the complexity of the years/centuries to come. This makes me think of "UK" instead of "GB". I know why Mr. Peter Jones made the world use ".uk"; I will also be able to tell my great-grandson why they have to riot against the discrimination between 2- and 3-letter cultures. This also seems to contradict the spirit of the quoted ISO 639/RA-JAC statement, which says "users are directed in Internet applications to employ the alpha-3 code", which sounds like the part of the statement that will stay universal .. for a short while (?).

20. 2.2.2: language subtags are permitted only if they are 3 characters long (a permanent rigid position), based on the anticipation of undocumented ISO 639 work. This is also a violation of the Internet standards process: the document in reference should be quoted and cannot be a draft. Extended language subtags are the most active part of languages, yet the rigidity imposed by the chosen format forces their registration by IANA to be prevented (which is the very purpose of the document: to permit flexibility to support real network life, where ISO would be too slow).

21. 2.2.3: script subtags follow the same rigid logic and constraints from the format. What happens if the memory waste of ISO 15924 (3 bytes lost) is corrected, or if another code element has a fixed length of 4 characters in the future?

22. 2.2.4: I understand that all the regional language differences of the world are to be supported by the ISO 3166 alpha-3/digit-3 list. This means that regions like NY, TX or California are not entitled to a code, but the 56 persons of Pitcairn Island are?
I doubt that disparity can hold very long, all the more since ISO 3166-2 provides all the possibilities for a far more adequate granularity.

23. 2.2.9: there is a MUST in "there MUST be an attempt to register" which cannot be enforced unless there is a non-delaying procedure to verify that registration of a language was attempted with ISO 639. Otherwise this part is to be understood as a disguised way, concerted with ISO, to block names. The concern on this point is high enough to see the Draft blocked. It seems that the second paragraph is a smoky, verbose replay of the same idea, without any procedural description nor any request for or provision of formal proof. The general idea is precisely in opposition to the purpose of the proposed RFC: to be able to register names not registered by ISO. This amounts to a legitimisation of censoring, and of censoring against the very intent of this document.

24. 2.2.9: registrations are left to a decision of appropriateness by someone debating with undefined others, on a matter without any importance for network stability and security (documented in part 4) or for end-to-end interoperability. This seems to amount to pure intellectual censoring.

25. 2.3: recommendation 3 seems inappropriate. Aliases are aliases. All the aliases must be equally supported because (a) they are aliases and (b) this makes sure developers develop correct code.

26. 2.4: "language tags always define a language as spoken by human being for communications of information to other human beings. Computer languages ... are explicitly excluded" has no ground in the Charter or in reality. Web Services relations, which may speak limited languages, are excluded. Coded human languages should be supported: they fit the definition.

27. 2.4.1: in the canonicalization part "" is recalled as a deprecation indicator, yet this is not documented earlier.
It seems this is an external ISO practice. This should be documented in the format description part, all the more since this practice is counter-intuitive, "" being understood intuitively as "-(nul)-". And "" being used in IDN, there could be some homograph confusion to investigate.

28. 3: the reference to RFC 2434 is correct but the rest of part 3 seems inappropriate. RFC 2434 says "If the IANA is expected to play a role in the management of a name-space the IANA must be given clear and concise instructions describing that role". Part 3 is neither clear nor concise, and it contradicts the document, which describes an IANA file to be maintained by an IESG reviewer. The IESG having authority over the IANA, the role of the IANA is to store and disseminate the current file version as maintained by the reviewer.

29. 3.1: the description of "description" is clueless. It is a description, but it is not intended to be an English description, yet it is one. The additions made in the IANA file are intended to be additions to the corresponding documented ISO tables. They MUST comply with the format of these tables, otherwise they add a disparity between the tables and their IANA "appendix".

30. 3.1 includes a registry format description (OK) but also considerations on the way the tags should be formed, which have no place in a file description. They should be moved into 2.4.1.

31. 3.1 also includes directions to the Reviewer, which should be presented in a part separate from the format description.

32. 3.2: this part is not an IANA procedure but a long guidance for the Reviewer and the participants in the reviewing process, limited to the currently possible cases.

33. 3.3: understanding the meaning of "Subtags required for stability and to keep the registry synchronised" will probably be a source of long debates. It should be documented.

34. 3.3:
why is the "MAY" concerning the "description, note and prefix fields" not documented by conditions? Is that not a "CAN"?

35. 3.3: the registration procedure is extremely confused and mixes: the form to use, the lack of definition of the requester, the iana.org list which is not introduced, the registration request which must be guessed, an uncommented MAY, registration tricks, comments on the probable behaviour of the reviewing list, a digression on Slovenian, the designation of the reviewer by the IESG, what should happen when the review period has elapsed without any guidance to the reviewer, the fact that IANA list members and an IESG-designated reviewer make an IETF decision, the idea that the initial registrant has some moral pre-eminence (in the form of a comment), and the idea that languages are considered for registration not on the fact that they actually exist but on their own (undocumented) merits.

36. 3.4: difficult to understand. The first sentence is probably inherited from former versions of the draft. "compatible with applications that process language tags according to this specification" seems to refer to filtering, which should be part of the second document produced by the WG-ltru.

37. 3.4: the description of the information to be maintained is clear, but the format is not described. This permits IANA to change it freely or to present it in HTML form. This does not help automated reading.

38. 4: security considerations should not deal with users' political security outside of their network usage. Otherwise tons of such considerations would have to be presented.

39. 4: an important security consideration is homographs. It is certainly possible to include a piece of text in a foreign language which, when printed, looks as if it were in another language, or which has a different meaning or printing (phishing). Another concern is the double "-", which is specifically used by the IANA code "xn".

40. 4:
The fourth paragraph tends to say that the specification of valid subtags MUST be available over the internet but that applications should take possible DoS into consideration. This is an important indication of the way the Draft proposes that the registry file be used and accessed. It can be read as meaning that applications can freely access it and its proposed mirrors: this may impose on the IANA a load which will result in a permanent inability to provide service.

41. 5: character set considerations are contradictory: they say that the characters a-z exist in most character sets (good news) [which means that there are some where they do not exist], so there should not be any character set presentation issue [in the character sets where they do not exist?]. Also the consideration only concerns "display", which is of limited interest if the a-z characters do not exist on the keyboard. But maybe this supposes that "intelligent people" use ASCII-compatible keyboards (see below).

42. 6: compatibility is preserved with RFC 3066 but not with the evolution of ISO code elements. The XML Schema version 1.0 requirements are quoted but not documented.

43. 6: Stability. Confusion between documents. This document does not provide a mechanism but a format that can be used by the mechanism described in the next document. This text has not been adapted after the split.

44. 6: Validity. This document should define the IQ of the "intelligent people" being considered, or the collective IQ augmentation necessary to understand the system?? Please see the ideas of the one who created the NIC and grandfathered the RFC system (http://bootstrap.org).

45. 6: Extensibility as presented actually results (in a very limited way) from the underlying ISO codes. This is not the target of the Charter, which is to permit scalability even when a code element is not supported by ISO.

46. 6:
the document uses the term "extlang" several times but does not define it.

47. 6, last: the added text for "" is not sufficient, or is missing in my version.

CHARTER VS DRAFT RELATED COMMENTS AND QUESTIONS

48. Language preferences are uniquely understood in HTML and XML only. CLDR is quoted in the charter and not quoted in the Draft. The Charter does not prevent other applications or systems from being supported. The Draft does not allude to them.

49. The charter lists RFC 3066 problems. These problems are: (a) stability: there is a paragraph on the matter; (b) accessibility of the underlying ISO standards: this is definitely impeded by the format (no ISO 3166-2, no ISO 639 format other than 2 or 3 characters, no script description format other than 4 characters, etc., as if the current ISO presentation will never improve); (c) difficulty with registration and acceptance: this could be improved by the subtag registration system but it seems to be made worse, due to the censoring rules introduced to prevent non-ISO entries from being entered in the IANA non-ISO table; (e) lack of clear guidance to identify script and region: scripts are Unicode only, regions are 2-letter Telex codes; (f) lack of parseability and well-formedness: this has certainly been addressed [it seems to be both the major improvement of the Draft ... and the source of most of its problems, due to the rigidity it introduces].

50. The main purpose of this Draft, from the charter, is to describe the IANA registry to support the resolution of the above problems, and how to transition from RFC 3066. This is to be done in a clear and concise way. RFC 3066 represents roughly 17,000 characters and the draft 70,000 (excluding the IETF format and verbiage). This makes it confusing.
From what I understand it includes 3 parts: (a) the subtags file, with a clear format; (b) the accompanying registration/update forms; (c) the variant tables, with a clear yet less precise format. From what I understand, (a) and (b) are the real responsibility of the old aliased distribution list and of a Reviewer designated by the IESG with unlimited veto powers; (c) is the responsibility of the IESG when reviewing RFCs requesting entries, and of the updating mechanism defined by these RFCs.

51. It lists challenges to be addressed. Stability: "how the language tags remains stable even if the underlying references should change". This means a process where the tag name is unrelated to its underlying components, just as a domain name is stable even if the underlying IP address changes. This is not provided.

52. It lists challenges to be addressed. Accessibility: "a simple way to determine if a subtag is valid as of a given date. Like receiving a 404 when calling an expired domain name". Such a mechanism is not provided.

53. It lists challenges to be addressed. Extensibility: this meant not having to record millions of combinations. This is provided, at the price of format rigidity, the impossibility of using foreseen or existing ISO code elements, and a censoring of non-ISO extensions which may lead to more harassment. It also meant the addition of the script in language tags. This is permitted by the proposed format, but to the detriment of other 4-letter entries. Registration of non-ISO scripts is not permitted.

54. It lists challenges to be addressed. "provide mechanism to support the evolution of the underlying standards, in particular ISO 639-3, mechanisms to support variant registration and format extensions, as well as allowing generative private use when necessary": I am not sure what "generative" may mean here, but I feel it is not supported; the rest is certainly opposed by the chosen format.

55.
It lists challenges to be addressed: "to specify a mechanism for easily identifying the role of each subtag in the language tag". This is addressed by the Draft. But this challenge contradicts the stability challenge above. If a language tag displays an identifiable subtag, it becomes by nature dependent on the underlying value of the subtag.

I will carefully study the responses to this review before introducing my own Draft, to try to build, if possible, on the largest possible number of consensual elements.

My current thinking is totally different. It is an open framework which respects the XML, HTML and CLDR requirements by welcoming your own (adapted) Draft, the ISO evolution, and the requirements of an Internet for the people of the world by the people of the world, at an affordable cost, with a highly innovative technical approach, great care for operational security and stability, and in total continuity with the founding concept which gave us thirty years of international public network stability.

But I think the issues it raises are important enough to call for understanding, comments, and support from all those concerned with a "multilingual cyberspace", an equal empowerment of cultural dignity in the digital ecosystem, and open e-commerce. This is because language tags are by nature the basic building blocks of the multilingual internet, which is also to be user-centric, multitechnology (convergence), multicontent (information society), and multilateral, as the WSIS shows.

jfcm