Message-Id: <6.2.1.2.2.20050511050500.045cf140@mail.jefsey.com>
Date: Wed, 11 May 2005 06:08:18 +0200
To: ietf@ietf.org
From: "JFC (Jefsey) Morfin"
Subject: [idn] a way toward homograph resolution ?
 (was "improving WG operation")
Cc: idn@ops.ietf.org, "Hallam-Baker, Phillip"
In-Reply-To: <001601c555d3$453fd9c0$7f1afea9@oemcomputer>

On 04:43 11/05/2005, Randy Presuhn said:
>From: "JFC (Jefsey) Morfin"
> > To: "Hallam-Baker, Phillip"
> > Cc:
> > Sent: Tuesday, May 10, 2005 5:29 PM
> > Subject: RE: improving WG operation
>...
> > They do not only delete. I suggest you just come to the WG-ltru where
> > they have decided to document RFC 2277 charsets into RFC 3066 langtags. So
> > you can enjoy charset conflicts, something you never thought about, I
> > presume. You cannot stop progress.
>...
>
>I guess Jefsey is upset because the WG rejected his proposal
>to expand our scope to include charsets. The ltru WG is most
>emphatically *not* confusing charsets with language tags.

I am not upset :-). On the contrary, I find it extremely interesting
that some people were able to rename charsets "scripts" in order to
insert charsets into language descriptions while claiming they don't
(cf. above). Obviously they are unhappy when I expose the trick. Anyway,
the result is great fun: people will be prevented from accessing a page
they know how to read if they do not know the language.

This cacologic, however, might be a good way to solve the IDN homograph
issue and the phishing problem.
If we revert from those famous "scripts" to what they are, i.e. Unicode
partitions, hence stable and well-documented charsets
(http://www.unicode.org/Public/4.1.0/ucd/Scripts.txt), browsers can use
them to expose the homographs in IDNs that do not belong to the page
charset, and kill the risk of phishing. This only calls for the browser
to extract the charset, I mean the script name, from the langtag, fetch
this file, read the list of code points in the charset (i.e. associated
with the script), and display the URL accordingly, indicating the
characters which are not part of the script/charset. This relieves the
ccTLD/TLD Manager from responsibilities he cannot fulfil at the 3rd
level and beyond.

There are, however, still some (minor) points to address:

- there are some minor disparities between the "script" name in the
  langtag and the script name in the Scripts.txt file. They should be
  reduced over time. I suppose that if this is a major issue, there
  will be help.
- the Scripts.txt file is currently hosted on the Unicode site. Even if
  it is cached (92 K), it will be called every time people start their
  browser. This may therefore represent several billion accesses a day.
- the WG-ltru only really wants to address XML issues, related to old
  XML libraries. Some coordination with other WGs or interests could be
  fruitful. They plan to extend the language tags registry to scripts
  and to register them. I suppose other WGs could benefit from this
  (all those involved in one way or another with internationalisation
  and languages).

jfc
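
A minimal sketch of the browser-side check proposed above, in Python,
for illustration only. Several things here are assumptions and not part
of the proposal: the host name is taken to be already decoded to
Unicode; the few data lines are an abridged, hand-written excerpt in the
Scripts.txt line format, not the real file; the table mapping langtag
script subtags to Scripts.txt names is hypothetical (it stands in for
the "disparities" mentioned above); and the function names and the
[U+XXXX] marking are invented for the example.

```python
# Abridged excerpt in the Scripts.txt line format (assumption: a real
# client would fetch and cache the full file instead).
SCRIPTS_TXT_EXCERPT = """\
002D          ; Common   # hyphen-minus
002E          ; Common   # full stop
0030..0039    ; Common   # digits
0041..005A    ; Latin    # capital letters A..Z
0061..007A    ; Latin    # small letters a..z
0400..0481    ; Cyrillic # letters
"""

# Hypothetical mapping from langtag script subtags to Scripts.txt names.
SUBTAG_TO_SCRIPT = {"latn": "latin", "cyrl": "cyrillic"}


def parse_scripts(text):
    """Return a list of (first, last, script) code point ranges."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comment
        if not line:
            continue
        codes, script = (part.strip() for part in line.split(";", 1))
        first, _, last = codes.partition("..")
        ranges.append((int(first, 16), int(last or first, 16), script.lower()))
    return ranges


def script_of(ch, ranges):
    """Return the script name for one character, or None if not listed."""
    cp = ord(ch)
    for first, last, script in ranges:
        if first <= cp <= last:
            return script
    return None


def script_from_langtag(tag):
    """Pick the first 4-letter subtag after the language subtag."""
    for subtag in tag.split("-")[1:]:
        if len(subtag) == 4 and subtag.isalpha():
            return SUBTAG_TO_SCRIPT.get(subtag.lower())
    return None


def display_host(host, langtag, ranges=None):
    """Mark every character that is neither in the page script nor Common."""
    if ranges is None:
        ranges = parse_scripts(SCRIPTS_TXT_EXCERPT)
    page_script = script_from_langtag(langtag)
    if page_script is None:
        return host  # no script subtag in the langtag: nothing to compare
    out = []
    for ch in host:
        if script_of(ch, ranges) in (page_script, "common"):
            out.append(ch)
        else:
            out.append("[U+%04X]" % ord(ch))
    return "".join(out)


# A Cyrillic "a" (U+0430) hidden in a host name shown on a Latin-script page.
print(display_host("p\u0430ypal.com", "sr-Latn"))  # p[U+0430]ypal.com
```

With the full Scripts.txt in place of the excerpt, the same lookup would
cover every assigned code point; characters missing from the table are
marked as well, which errs on the side of exposing them.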