[
Date: Fri, 11 Sep 2009 20:10:58 +0000
From: "Adam M. Costello" <idna-update.amc+0+@nicemice.net.RemoveThisWord>
To: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Subject: Re: Definitions limit on label length in UTF-8
Message-ID: <20090911194239.7az~@nicemice.net>
References: <a617d24ceb7e3d66f9d6bd1ee7162e61@localhost.localdomain>
	<B60A675C-8A18-4F55-82BB-5DCB11236403@google.com>
	<4AA8D415.70207@it.aoyama.ac.jp>
	<20090910153142.GE53599@shinkuro.com>
	<B5882658E1993671ABAAAC42@PST.JCK.COM>
	<4AA9CFF8.8060605@it.aoyama.ac.jp>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <4AA9CFF8.8060605@it.aoyama.ac.jp>
User-Agent: Mutt/1.5.18 (2008-05-17)
Cc: Andrew Sullivan <ajs@shinkuro.com>, idna-update@alvestrand.no,
	John C Klensin <klensin@jck.com>

"Martin J. Dürst" <duerst@it.aoyama.ac.jp> wrote:

> [Short summary: It's very easy to create UTF-8 strings that are longer
> than punycode, for everything except US-ASCII. Remember, punycode was
> *designed* to be efficient, in particular for domain name labels.]

Of the 11 languages I tried for my ACE evaluation, five were more
compact in Punycode than in UTF-8 for my example sentence ("Why can't
they just speak <language>?").

Arabic:
  Punycode: xn--egbpdaj6bu4bxfgehfvwxn
     UTF-8: ??????????????????????????????????
Hebrew:
  Punycode: xn--4dbcagdahymbxekheh6e0a7fei0b
     UTF-8: ????????????????????????????????????????????
Hindi (Devanagari):
  Punycode: xn--i1baa7eci9glrd9b2ae1bj0hfcgg6iyaf8o0a1dig0cd
     UTF-8: ??????????????????????????????????????????????????????????????????????????????????????????
Japanese (kanji and hiragana):
  Punycode: xn--n8jok5ay5dzabd5bym9f0cm5685rrjetr6pdxa
     UTF-8: ??????????????????????????????????????????????????????
Russian:
  Punycode: xn--b1abfaaepdrnnbgefbaDotcwatmq2g4l
     UTF-8: ??????????????????????????????????????????????????????????
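The comparison above can be re-run from the ACE strings alone, because decoding the Punycode recovers each original label. The following is a minimal sketch in Python, assuming only the standard library's `punycode` codec (an RFC 3492 implementation that does no nameprep or other IDNA mapping); the helper names `decode_ace` and `lengths` are illustrative, not part of any IDNA library.

```python
# Re-run the Punycode vs. UTF-8 length comparison from the ACE strings
# quoted above. Uses only Python's built-in "punycode" codec (RFC 3492);
# no nameprep or other IDNA processing is applied.

ACE_PREFIX = "xn--"

# ACE labels as quoted in the message. The capital "D" in the Russian
# label is RFC 3492 mixed-case annotation; decoding is case-insensitive.
SAMPLES = {
    "Arabic": "xn--egbpdaj6bu4bxfgehfvwxn",
    "Hebrew": "xn--4dbcagdahymbxekheh6e0a7fei0b",
    "Hindi (Devanagari)": "xn--i1baa7eci9glrd9b2ae1bj0hfcgg6iyaf8o0a1dig0cd",
    "Japanese (kanji and hiragana)": "xn--n8jok5ay5dzabd5bym9f0cm5685rrjetr6pdxa",
    "Russian": "xn--b1abfaaepdrnnbgefbaDotcwatmq2g4l",
}


def decode_ace(ace: str) -> str:
    """Strip the 'xn--' prefix and decode the Punycode part to Unicode."""
    if not ace.lower().startswith(ACE_PREFIX):
        raise ValueError(f"not an ACE label: {ace!r}")
    return ace[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def lengths(ace: str) -> tuple[int, int]:
    """Return (octets in the ACE form, octets in the UTF-8 form)."""
    return len(ace), len(decode_ace(ace).encode("utf-8"))


for language, ace in SAMPLES.items():
    ace_len, utf8_len = lengths(ace)
    print(f"{language:<30} ACE {ace_len:3}  UTF-8 {utf8_len:3}")
```

Each printed row puts the ACE length next to the UTF-8 octet count for the same label, so it shows directly which form is shorter.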

> People not familiar with the history of the development of IDNA2003
> should be aware of the fact that a lot of energy went into the
> development of compression algorithms for domain names,

I can confirm that.  :)

> The "max 63 octets in UTF-8" provision, unless removed, negates all
> this effort.

Yeah, that would be a shame.

Since I haven't had time to participate in IDNA2008, maybe I haven't
earned the right to comment, but...

A real concern back then was that IDNA would be unfair to non-ASCII
scripts, because they couldn't fit nearly as much text in a domain
label.  Making it truly fair was never possible, but we did work very
hard to find an encoding that could squeeze as much non-ASCII text
as possible into a 63-byte ACE label.  If a 63-byte limit on UTF-8
forms is imposed, the complexity of Punycode is largely wasted and/or
misdirected; the encoding should have been designed to be just complex
enough to beat UTF-8, not to be as compact as possible.

Also, I agree with Martin's concern that adding a 63-byte limit on the
UTF-8 form of labels would have greater cost than benefit.  The cost is
breaking compatibility with names that are valid in IDNA2003.
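To make the compatibility concern concrete, here is a sketch of the two checks side by side: the 63-octet DNS limit on the ACE form, and the proposed extra 63-octet limit on the UTF-8 form. It is written in Python with only the built-in `punycode` codec; `to_ace`, `fits_ace_limit`, and `fits_utf8_limit` are hypothetical helpers, and no nameprep or IDNA2008 validity checks are performed. The test label is the Hindi sample from the comparison earlier in this message, recovered from its Punycode.

```python
# Sketch of the two label-length rules under discussion. Assumes labels
# are already valid; only lengths are checked (no nameprep, no IDNA2008
# validity rules).

MAX_LABEL_OCTETS = 63  # DNS limit on a single label
ACE_PREFIX = "xn--"


def to_ace(label: str) -> str:
    """ASCII labels pass through; others become 'xn--' + Punycode."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def fits_ace_limit(label: str) -> bool:
    """Existing rule: the ACE (on-the-wire) form is at most 63 octets."""
    return len(to_ace(label)) <= MAX_LABEL_OCTETS


def fits_utf8_limit(label: str) -> bool:
    """Proposed extra rule: the UTF-8 form is also at most 63 octets."""
    return len(label.encode("utf-8")) <= MAX_LABEL_OCTETS


# Hindi sample label, recovered from the Punycode quoted earlier.
hindi = "i1baa7eci9glrd9b2ae1bj0hfcgg6iyaf8o0a1dig0cd".encode("ascii").decode("punycode")

print("ACE form within limit:  ", fits_ace_limit(hindi))
print("UTF-8 form within limit:", fits_utf8_limit(hindi))
```

For this label the ACE check passes and the UTF-8 check fails: its ACE form fits in a DNS label, but the proposed UTF-8 limit would reject it.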

AMC
_______________________________________________
Idna-update mailing list
Idna-update@alvestrand.no
http://www.alvestrand.no/mailman/listinfo/idna-update

]
