Date: Sun, 08 May 2005 03:58:53 +0200
To: LTRU Working Group
From: "JFC (Jefsey) Morfin"
Subject: [Ltru] RFC 2277 - considerations

This is not a position, just some thinking and a call for comments.

----

RFC 2277 says: "This document uses the term "charset" to mean a set of rules for mapping from a sequence of octets to a sequence of characters, such as the combination of a coded character set and a character encoding scheme; this is also what is used as an identifier in MIME "charset=" parameters, and registered in the IANA charset registry [REG]. (Note that this is NOT a term used by other standards bodies, such as ISO)."

The point is that if the term is not used elsewhere, this may mean that the concept is suited to the dynamic environment of the network, while the concepts used by other bodies are less suited, or not suited at all. In other words: charset = encoding scheme + character set.

At first I considered that the W3C was right to have identified its need for "script" support, although the idea of making scripts dependent on languages seemed strange to me. But the more I think about it, the harder I find it to understand what the "script" notion introduced in the Draft adds to the charset: the script already belongs to the charset. And the more I see sources of conflict if this is not respected.
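To make the RFC 2277 definition concrete, here is a small Python illustration (my own sketch, standard library only, not taken from RFC 2277 or the Draft; "describe" is a hypothetical helper). It shows that the same octets give different characters, and even characters of different scripts, depending on the charset applied to them:

```python
from __future__ import annotations

import unicodedata

# RFC 2277: a "charset" is a set of rules for mapping a sequence of octets
# to a sequence of characters. The same octets can therefore yield different
# characters, and even different scripts, depending on the charset applied.


def describe(octets: bytes, charset: str) -> list[str]:
    """Decode `octets` with `charset` and return each character's Unicode name."""
    return [unicodedata.name(ch) for ch in octets.decode(charset)]


# Two octets: one character in UTF-8, two characters in ISO-8859-1.
print(describe(b"\xc3\xa9", "utf-8"))
# ['LATIN SMALL LETTER E WITH ACUTE']
print(describe(b"\xc3\xa9", "iso-8859-1"))
# ['LATIN CAPITAL LETTER A WITH TILDE', 'COPYRIGHT SIGN']

# One octet: a Latin letter in ISO-8859-1, a Cyrillic letter in ISO-8859-5.
print(describe(b"\xe9", "iso-8859-1"))
# ['LATIN SMALL LETTER E WITH ACUTE']
print(describe(b"\xe9", "iso-8859-5"))
# ['CYRILLIC SMALL LETTER SHCHA']
```

The last pair is the case that matters here: with legacy charsets, the script is already implied by the charset label.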
The more I see the script as one of the rules that make up the definition of the character set, the less I see where the W3C has a problem (except perhaps in confusing the charset with the encoding scheme alone; yet http://www.w3.org/TR/REC-html40/charset.html starts with a clear "character set" section in which it specifically cites Latin and Cyrillic).

Let me come back to the normal process of accessing a page or document:

1. To be able to read it, I need to know the charset. This is the first piece of information. It gives me the rules for mapping a sequence of bytes to a sequence of characters (a character encoding scheme, e.g. UTF-8, combined with a set of coded characters, e.g. as identified by ISO 15924). Example: UTF-8-Latin.

2. Then, as I read, I need to understand. For that I have the language, and possibly the region, as per the existing RFC 3066 scheme, which calls for no modification of existing libraries.

3. The benefit is that this is compatible with IDN tables (and allows the IDN homograph problem to be addressed at a high level, since charsets are documented everywhere).

I also note that RFC 2277 and RFC 3066 seem to address the need for locales (although CLDR may have some special, proprietary needs that its authors have not documented?).

I therefore tend to think that the "script" information belongs in the charset tag. I suppose existing implementations are able to understand "UTF-8.latin" as UTF-8, so that this is transparent for legacy systems?

Comments?

jfc
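The three steps above, and the closing question about legacy readers, can be sketched in Python. This is only my illustration under stated assumptions: the "UTF-8-Latin" / "UTF-8.latin" label syntax, the SCRIPT_SUFFIXES table and the helper names (split_charset, split_language, off_script_letters, read_document) are hypothetical and not part of RFC 2277, RFC 3066 or the IANA registry. The ISO 15924 codes Latn, Cyrl and Grek are real. Step 3 uses a crude Unicode-name heuristic in place of real IDN tables.

```python
from __future__ import annotations

import codecs
import unicodedata

# HYPOTHETICAL: script suffixes as in "UTF-8-Latin" or "UTF-8.latin", mapped to
# ISO 15924 codes. This is NOT a registered IANA charset syntax.
SCRIPT_SUFFIXES = {"latin": "Latn", "cyrillic": "Cyrl", "greek": "Grek"}

# Prefix that Unicode character names use for letters of each script.
NAME_PREFIX = {"Latn": "LATIN", "Cyrl": "CYRILLIC", "Grek": "GREEK"}


def split_charset(label: str) -> tuple[str, str | None]:
    """Step 1: split 'UTF-8-Latin' or 'UTF-8.latin' into ('UTF-8', 'Latn')."""
    try:
        codecs.lookup(label)
        return label, None  # already a plain charset label such as 'ISO-8859-1'
    except LookupError:
        pass
    for sep in ("-", "."):
        base, found, suffix = label.rpartition(sep)
        if found and suffix.lower() in SCRIPT_SUFFIXES:
            codecs.lookup(base)  # raises LookupError if the base encoding is unknown
            return base, SCRIPT_SUFFIXES[suffix.lower()]
    raise LookupError(f"unknown charset label: {label!r}")


def split_language(tag: str) -> tuple[str, str | None]:
    """Step 2: split an RFC 3066 tag such as 'fr-FR' into ('fr', 'FR').

    In RFC 3066, a two-letter second subtag is an ISO 3166 country code.
    """
    subtags = tag.split("-")
    second = subtags[1] if len(subtags) > 1 else ""
    region = second.upper() if len(second) == 2 and second.isalpha() else None
    return subtags[0].lower(), region


def off_script_letters(text: str, script: str) -> list[str]:
    """Step 3 (crude): letters whose Unicode name is outside the declared script.

    A name-based heuristic only, not a substitute for IDN table checks.
    """
    prefix = NAME_PREFIX[script]
    return [ch for ch in text
            if ch.isalpha() and not unicodedata.name(ch, "").startswith(prefix)]


def read_document(octets: bytes, charset_label: str, language_tag: str) -> dict:
    """Run steps 1 to 3 on a document given its charset label and language tag."""
    encoding, script = split_charset(charset_label)
    text = octets.decode(encoding)
    language, region = split_language(language_tag)
    suspicious = off_script_letters(text, script) if script else []
    return {"text": text, "encoding": encoding, "script": script,
            "language": language, "region": region, "suspicious": suspicious}


# "paypal" spelled with a Cyrillic 'а' (U+0430), declared as UTF-8-Latin.
doc = read_document("p\u0430ypal".encode("utf-8"), "UTF-8-Latin", "en-US")
print(doc["encoding"], doc["script"], doc["language"], doc["region"])
# UTF-8 Latn en US
print([unicodedata.name(ch) for ch in doc["suspicious"]])
# ['CYRILLIC SMALL LETTER A']
```

Note that this sketch strips the script suffix itself before decoding. An unmodified decoder would not necessarily do so: Python's own codecs.lookup("UTF-8-Latin") raises LookupError, so legacy transparency would depend on how each implementation treats an unknown label.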