my digression on UTF-8 - was Changing the xn-- prefix -

Martin Duerst duerst at it.aoyama.ac.jp
Tue Mar 25 06:47:29 CET 2008


At 08:22 08/03/25, Shawn Steele wrote:

>> (iii)  While it is efficient for ASCII and most western/northern
>>alphabets, UTF-8 is arguably pathological for East Asian scripts
>
>That's a common argument, but it doesn't stand up.

Yes indeed. A lot of people get this wrong, repeatedly. For East Asia
(China, Japan, Korea), moving from the legacy double-byte encodings to
UTF-8 is a change from two bytes per character to three bytes per
character, where a character corresponds roughly to a syllable, and
significantly fewer characters are needed on average for the same
text.
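To make the two-versus-three-byte point concrete, here is a minimal sketch using Python's built-in codecs. The sample words and the choice of Shift_JIS, EUC-KR and GB2312 as the "legacy double-byte" encodings are illustrative assumptions. It counts Unicode code points as characters and measures only per-character cost, not the "fewer characters for the same text" effect.

```python
# Sketch: bytes per character for CJK text in UTF-8 versus a legacy
# double-byte encoding. Sample words are illustrative, not measured corpora.
samples = [
    ("Japanese", "日本語", "shift_jis"),
    ("Korean", "한국어", "euc_kr"),
    ("Chinese", "中文", "gb2312"),
]


def bytes_per_char(text: str, encoding: str) -> float:
    """Average encoded bytes per code point in `text`."""
    return len(text.encode(encoding)) / len(text)


for name, text, legacy in samples:
    utf8 = bytes_per_char(text, "utf-8")
    old = bytes_per_char(text, legacy)
    print(f"{name}: UTF-8 {utf8:.0f} B/char, {legacy} {old:.0f} B/char")
```

Each of these characters lies in the U+0800..U+FFFF range, which UTF-8 always encodes in three bytes, while the legacy encodings use two.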

The problem is much of the rest of Asia, starting with India.
All the essentially alphabetic scripts of India and South and Southeast
Asia, plus a few others scattered around the world, require 3 bytes
per character in UTF-8, whereas in a script-specific encoding they
could easily have been coded with 1 byte per character.
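A minimal sketch of the Indic/Southeast Asian case, again counting Unicode code points as characters. Python has no ISCII codec that I know of, so Thai with the single-byte TIS-620 codec (`tis_620`) stands in as the "script-specific encoding"; the sample words are illustrative assumptions.

```python
# Sketch: UTF-8 cost for alphabetic scripts of South/Southeast Asia.
# Devanagari (U+0900..U+097F) and Thai (U+0E00..U+0E7F) both fall in the
# three-byte UTF-8 range. TIS-620 is used as the single-byte comparison
# because Python ships a codec for it.
devanagari = "हिन्दी"  # "Hindi" in Devanagari: 6 code points
thai = "ภาษาไทย"      # "Thai language" in Thai script: 7 code points


def utf8_lengths(text: str) -> list[int]:
    """UTF-8 byte length of each code point in `text`."""
    return [len(ch.encode("utf-8")) for ch in text]


print(utf8_lengths(devanagari))  # every code point takes 3 bytes
print(len(thai.encode("utf-8")), len(thai.encode("tis_620")))  # 3x vs 1x
```

The same text is three times larger in UTF-8 than in the single-byte encoding.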

Regards,    Martin.



#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp     



More information about the Idna-update mailing list