Message-Id: <6.2.1.2.2.20050615135659.04c1be30@mail.jefsey.com>
Date: Wed, 15 Jun 2005 15:37:09 +0200
To: "Peter Constable", "IETF Languages Discussion"
From: "JFC (Jefsey) Morfin"
Subject: RE: Swiss german, spoken
List-Id: IETF Language tag discussions

At 06:56 15/06/2005, Peter Constable wrote:
>> From: JFC (Jefsey) Morfin [mailto:jefsey@jefsey.com]
>>
>>> Incorrect. Issues discussed on this list relate to registration of tags "for the identification of languages" -- that is, tags to be used as metadata elements to declare the linguistic properties of content in Internet and other protocols and applications. There is nothing stated anywhere that these tags necessarily apply only to text content.
>>
>> Except that to register a language you must provide printed references ....
>
> The relevant field on the form is:
> "Reference to published description of the language (book or article)"

Agreed. This is why I say "gray area", and why I say that my point of view seems to disagree with this list's understanding. It is true that everything I need is possible with RFC 3066. It is true that it would still be possible with the Draft (x-tags). But the de facto accumulation of "possibilities" and attitudes makes it less and less _perceived_ as consensually correct.
The increased support of written forms (the particular interest in scripts) should be accompanied by the same concern for the other language vectors. This is one of the reasons for the experimentation we are engaging in. This experimentation will be carried out with due respect for the conditions expressed in the ICANN ICP-3 document, adding some constraints we discovered while experimenting with it, and will be part of the Draft I will present in Luxembourg. The target is to document that, and how, tags may cover the full scope of the language issues as you express it further on. This could be addressed today; otherwise it will call for more effort, wasted time and a possible split. But at the same time, experimentation will demonstrate it.

> Note that the reference is to a *description* of the language, not necessarily a work in the language itself. Thus, one could provide references to descriptions of (say) American Sign Language, which is not commonly written.

OK. The problem is that many linguistic variations, and even evolutions, are nowadays supported by media and not documented as such. I will take an old example, outside the current debate and so (I hope) neutral. Forty years ago "Le Cid" (one of the most famous plays of Pierre Corneille, in alexandrine lines) was rewritten in "Papaouette", a language of a _quarter_ of Algiers. Such an exercise is common, but this one was famous and quite funny (it turns out that plenty of words and expressions are self-explanatory, so you easily understand 75% of the play). There is no printed version (or only a handwritten one, now lost).

There is therefore nothing to stop someone from publishing a multi-version CD of the play by the reference company, the "Comédie Française", and adding multiple spoken versions, in Papaouette and in other kinds of similar "languages". We all wrote our own "locale" version (I wrote thousands of funny lines for a College version). That publisher will have to provide a menu.
It is likely that such a publication will raise interest among the "Pieds Noirs" (French people of North Africa) and in local communities, and that several other publications, plays, etc. will follow, reviving languages which are still much in use privately but no longer have geographical roots. I wrote a few songs in the Academy language, which I had to "translate" to obtain the agreement of the famous authors whose music I used and of SACEM (I published a record of Academy songs).

There is therefore a need for these true languages to be identified (Papaouette is a mix of French, Spanish, Arabic, Jewish, Italian, etc. words and grammatical constructs; professional languages are sometimes very complex and rooted in several other languages). Documented sources on Papaouette, apart from the press, etc., do not exist. No linguist has published on it. If this happens in a linguistically sophisticated country with very structured support for the dominant language, I suppose it can happen in many other places and in many other diasporas. We have a published dictionary of the Academy language, but its main purpose is to document words, locutions and constructs that have entered the French language, and to document some of the history of words over 175 years.

This is why I disagree with "documenting". The documentation is that one person claims a language name and that others put a meaning on it. Until recently no one ever registered the names England, Italy or Germany. Yet millions of people have used them for centuries. And when they were "registered" they were actually just "recorded".

> Several years ago on the IETF-languages list, there was some discussion of what kinds of materials could be referred to. I had just the opposite concern: someone might need a tag for a lesser-known language and not be able to provide references to a description of the language. There was consensus that references could be to a *description* of the language, or to a work *in* the language.
Yes, but today it should be a record, and that description should come only from the one registering. There should be no filtering by "experts". Experts should help with every registration and then advise on its use. I should be able to register "jfc" to support my Franglish.

>> I note your "declare the linguistic properties of content", which is something I could agree with. But it is not exactly the wording of the document you refer to.
>
> No, it's not the wording of the RFC, but I very much feel the appropriate characterization of "language tags" is that they primarily function to declare attributes.

The problem, and where I object, is this: declaration is neutral (someone declares a tag and describes the tag). Anyone can then freely use the resulting list for whatever he wants, for example to classify his library. If I want to put the French books under Russian when the author is Russian, I can. If I explain my rules, it may even be understood by _my_ users once they have read my explanation.

Describing a page by using the tag is no longer neutral: some criteria must be used to determine whether this is really the language. This can be subjective (you say this is in English because I speak English and I wrote for English readers, but there are still many questions). Or it can be documented, with rules establishing what an English text looks like and filtering applied accordingly.

The real problem comes when RFC 3066 (and further variations) starts using "defining". Again, I know that English does not intend to be very precise in logic and is much more precise in conveying a feeling, an understanding. But we are in standards texts. The mere use of "defining" opens the way to norming the language. The filtering is then no longer to know but to decide.

The second problem is that in our current world we need that layer for machine processing. If you do not document the process you refer to, you create a de facto "default" process of reference.
In the mind of the public, this process will be the market-dominant one. And then you run into many other problems.

> (This distinguishes them, in my mind, from locale identifiers, which primarily function as API parameters used to tailor culture-dependent processes.)

IMHO gray area. You are fully right if you come from "inside" one of the technical processes, like UNICODE. This builds up progressively towards a more comprehensive support of several things, such as APIs.

Now, consider that you come from the users' side (my whole reasoning is based on a user-centric network architecture, so please differentiate what you call an "end user" from what I call a "user". You consider the "clients" of an application; I consider a founding architectural concept, with every right and every need). Users come in relational communities, and the network is there to serve these relations. Relational community networks use protocols; person-to-person protocols are named "languages" in English (in French we differentiate "langues" (languages) and "langages" (which include everything else, multimodal)). These protocols depend on factors external to the community (what I name "referents") and on internal factors (what I name "contexts"). Depending on the kind of relation, the historical situation, the subject matter, etc., referents or contexts may take priority. This is why a name can span several languages: this is the ambition of the famous trade marks. So APIs are also transient, and so are "locales".

> I'd include linguistic attributes, in the primary sense of that term, but also include attributes related to the written form -- script, orthography, spelling, transcription, transliteration -- in the case of textual content. But, not all content need be textual, the system should facilitate tagging of linguistic content regardless of the mode of expression.

Yes. Again we are in full agreement. This is just a question of degree.
Coming from a textual world, coming from SIL, which implements that textual world in the real world using the Unicode character set, etc., you are more text-oriented. But do you think SIL would proceed the same way today? They would use recorders everywhere, speech analysers, maybe synthetic voice, etc. Obviously text gives a solid framework, but only to start with. It was OK with ISO 639-1 and -2. It is probably much more complex with ISO 639-3, and mostly out of context with ISO 639-6.

This is why it is urgent to have a strong, stable framework, to make sure that all the converging descriptions of attributes, which are themselves related to other converging attributes in many other areas, fall into a stable structured framework. The only one we have today is ISO 11179. I do not necessarily like all of it, and it is quite complex and unfinished. But I think we are better off sharing the same problems as everyone else (including major government administrations all over the world, with budgets to correct mistakes) than inventing our own and possibly causing confusion or major delays. When this list approves a tag (since it wants to discuss tags, not to advise on them), it has absolutely no idea of what the implications may be ten years from now, or where.

>> Due to the impact of ISO documents in the langtag registration process and of their parallel evolution agreed by everyone (even if the nature of the evolution may be different depending on the person) it is advisable to read ISO 639-1, -2, the drafts of -3, -4, -5, -6 you might find, ISO 15924 and ISO 3166. For those wanting to understand the possible future conflicts concerning the registrations discussed here they should consult ISO 11179 (scalability, updates, nature of the documented information, etc.).
>
> It certainly isn't a bad idea to be familiar with the 639, 15924 and 3166 standards.
> For 639, there's no particular point going looking for parts 4, 5 or 6 at this time since there isn't a complete working draft of any of them, and there is no immediate plan to have any of them impinge on RFC 3066 or some successor thereof.

I accept that (with the restriction above) from the narrow point of view of registering "xxx" or "i-xxx" tags for RFC 3066. I disagree as regards the Draft and its successors. ISO 639-3 will obey ISO 639-4 rules (or you will delay ISO 639-3). A retrofit which may be acceptable at the ISO 639-3 concept layer may not be acceptable at the IANA layer. I also want to point out that our experimentation, which takes parts 4, 5 and 6 into account, shows that their work addresses currently well-identified needs. What Karen describes shows in addition that she needs to build on them for her own standardisation process (this work will obviously be contingent on the final documents as they are published, but this is true for every standard which may have a further revision).

> ISO 11179 takes rather a deeper level of interest and commitment. It's a six-part compendium on metadata elements and registries and metamodels for metadata elements. The IANA registry for language tags which is the focus of this list has never been considered an implementation of this ISO standard, and knowledge of this ISO standard is not a prerequisite to making useful contributions to the work of this list. Familiarity with ISO 11179 certainly wouldn't get in the way of contributing to this list -- unless one begins to behave as though others on this list are or ought to be familiar with it as well.

Correct. But responsible participation in the debate of this list calls for participants to understand that their proposals (again, I regret that way of understanding the role of this list) may have an impact on a large number of registries made collateral to the ISO tables they consider, due to ISO 11179.
I do not want to tell my story again, but had I thought better about the implications of our 1984 consensus, which permitted RFC 920 and the whole naming system, and had I added numeric addressing to the agreement, we would all have been on IPv6 for ten years. Trying to have a reasonable understanding of the possible implications of what one does is behaving in a responsible manner. I submit that a reasonable understanding is to know that this exists and that it may have an impact, and that banning these considerations is a mistake.

When you write a Draft, you are not necessarily an expert in Security. Yet the Security section helps you remember that there are security aspects to what you specify. Your short description is a beginning, but people should also be aware that their own registry is qualified by ISO 11179 (even if it is not subject to it) and that the internationally understood terms relating to registries are defined there. Issues like versioning and updates are documented there: disputing or mending ISO 3166 updates outside of the ISO 11179 context is therefore a dead end if the result is to stay consistent with the tables, libraries, caches and locales, insofar as they are ISO-consistent.

jfc

_______________________________________________
Ietf-languages mailing list
Ietf-languages@alvestrand.no
http://www.alvestrand.no/mailman/listinfo/ietf-languages