how did the idna theory start?

Tue Jul 3 07:06:15 CEST 2012

Hello John,

On 2012/07/02 21:17, John C Klensin wrote:

> In theory for the IDNA case, we could have used a prefix (to
> identify the permitted character and matching rules -- see
> Patrik's note) rather than a prefix and Punycode encoding
> followed by a UTF-8 string.  That would have no real advantage
> over IDNA and at least two disadvantages: some increased
> potential for BIDI confusion given the mixture of ASCII and
> other characters and worse encoding efficiency (one of the
> concerns discussed during IDNA development was the disadvantage
> in maximum label length imposed by UTF-8, especially for East
> Asian characters).

Compression was definitely a big issue, not least because it was
rather easy to measure and it was fun to come up with compression
algorithms designed for the task at hand (from this viewpoint, Punycode
is a real gem, although from a general usage point of view it's an
abomination). But the length restrictions aren't that much of an issue
for East Asian characters. East Asian in general refers to Chinese,
Japanese, Korean,... These languages use large sets of characters, and
correspondingly need only a few of these characters for names and other
words. In UTF-8, each of these characters uses 3 bytes, and they won't
compress very much in Punycode because there can be large gaps between
character numbers (code points). In
practice, the most disadvantaged scripts in UTF-8 are those with a small 
number of characters but where each character needs 3 bytes. This 
includes all Indic scripts (Devanagari,..., Tamil), South East Asian 
scripts (Thai,...) and a few others from around the world. That is
where Punycode really shows its strengths.
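
To make the comparison concrete, here is a rough sketch that measures
a label both ways. It is only an illustration under a few assumptions:
the sample labels and the helper name label_sizes are mine, it relies
on Python's standard "punycode" codec, and it skips the mapping and
normalization steps that real IDNA processing would apply before
encoding.

```python
# Rough size comparison: raw UTF-8 versus an "xn--" + Punycode label.
# Assumption: no IDNA mapping or normalization is applied, and the
# sample labels below are illustrative only.

ACE_PREFIX = "xn--"     # IDNA ACE prefix
MAX_LABEL_OCTETS = 63   # DNS limit on the length of a single label


def label_sizes(label):
    """Return (characters, UTF-8 octets, ACE octets) for one label."""
    utf8_octets = len(label.encode("utf-8"))
    # Python's standard "punycode" codec implements RFC 3492.
    ace = ACE_PREFIX + label.encode("punycode").decode("ascii")
    return len(label), utf8_octets, len(ace)


SAMPLES = {
    "Chinese": "他们为什么不说中文",
    "Hindi (Devanagari)": "यहलोगहिंदीक्योंनहींबोलसकतेहैं",
    "Thai": "ประเทศไทย",
}

if __name__ == "__main__":
    for name, label in SAMPLES.items():
        chars, utf8_octets, ace_octets = label_sizes(label)
        print(f"{name}: {chars} chars, UTF-8 {utf8_octets} octets, "
              f"ACE {ace_octets} octets (limit {MAX_LABEL_OCTETS})")
```

With samples like these, the Devanagari label should come out well
below its UTF-8 size, while the Chinese label gains little or nothing
once the four-octet prefix is counted.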

Regards,   Martin.