how did the idna theory start?
"Martin J. Dürst"
duerst at it.aoyama.ac.jp
Tue Jul 3 07:06:15 CEST 2012
Hello John,
On 2012/07/02 21:17, John C Klensin wrote:
> In theory, for the IDNA case, we could have used a prefix (to
> identify the permitted character and matching rules; see
> Patrik's note) followed by a UTF-8 string, rather than a prefix
> and Punycode encoding. That would have no real advantage
> over IDNA and at least two disadvantages: some increased
> potential for BIDI confusion given the mixture of ASCII and
> other characters and worse encoding efficiency (one of the
> concerns discussed during IDNA development was the disadvantage
> in maximum label length imposed by UTF-8, especially for East
> Asian characters).
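The label-length concern can be checked with a short sketch. It is only an illustration under stated assumptions: the 63-octet DNS label limit comes from RFC 1035 and `xn--` is the IDNA ACE prefix, but the `u8--` prefix for the "prefix plus raw UTF-8" alternative is purely hypothetical (no such scheme was defined), and the sample text is arbitrary. It uses Python's built-in `punycode` codec, which produces the bare RFC 3492 string without the `xn--` prefix.

```python
from typing import Callable

MAX_LABEL_OCTETS = 63         # DNS label limit (RFC 1035)
ACE_PREFIX = "xn--"           # prefix IDNA puts in front of Punycode labels
HYPOTHETICAL_PREFIX = "u8--"  # made-up prefix for the "prefix + raw UTF-8" alternative


def ace_octets(label: str) -> int:
    """Octets used by the IDNA form: ACE prefix + Punycode (all ASCII)."""
    return len(ACE_PREFIX) + len(label.encode("punycode"))


def raw_utf8_octets(label: str) -> int:
    """Octets used by the hypothetical alternative: prefix + raw UTF-8."""
    return len(HYPOTHETICAL_PREFIX) + len(label.encode("utf-8"))


def longest_fit(text: str, octets: Callable[[str], int]) -> int:
    """Grow a prefix of `text` one character at a time and return its
    length just before the encoded label would exceed the limit."""
    n = 0
    while n < len(text) and octets(text[: n + 1]) <= MAX_LABEL_OCTETS:
        n += 1
    return n


if __name__ == "__main__":
    # Illustrative sample text only; not taken from the discussion.
    samples = {"Thai": "ภาษาไทย" * 3, "Chinese": "中文域名测试" * 4}
    for script, text in samples.items():
        print(script,
              "raw UTF-8:", longest_fit(text, raw_utf8_octets), "chars,",
              "xn-- + Punycode:", longest_fit(text, ace_octets), "chars")
```

Growing the text one character at a time keeps the sketch simple; the count it reports is enough to compare the two encodings script by script.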
Compression was definitely a big issue, not least because it was
rather easy to measure and it was fun to come up with compression
algorithms designed for the task at hand (from this viewpoint, punycode
is a real gem, although from a general-purpose point of view it's an
abomination). But the length restrictions aren't that much of an issue
for East Asian characters. East Asian in general refers to Chinese,
Japanese, Korean,... These languages use large character sets, and
correspondingly need only a few characters for a name or other word. In
UTF-8, almost all of these characters take 3 bytes, and they won't
compress very much in punycode because there can be large gaps between
their code points. In practice, the most disadvantaged scripts in UTF-8
are those with a small repertoire of characters where each character
nevertheless needs 3 bytes. These include all Indic scripts
(Devanagari,..., Tamil), South East Asian scripts (Thai,...) and a few
others from around the world. That is where punycode really shows its
strengths.
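To make the comparison concrete, here is a small sketch using Python's built-in `punycode` codec (the bare RFC 3492 encoding, without the `xn--` prefix). The sample words are illustrative choices (the names used for a few IDN country-code TLDs), not taken from the message.

```python
# Compare label sizes in UTF-8 and in Punycode for a few scripts.
# Sample words are illustrative picks, not taken from the original message.
SAMPLES = {
    "Thai": "ไทย",
    "Devanagari": "भारत",
    "Tamil": "இந்தியா",
    "Chinese": "中国",
    "Korean": "한국",
}


def sizes(label: str) -> tuple[int, int, int]:
    """Return (code points, UTF-8 octets, Punycode characters) for `label`."""
    code_points = len(label)
    utf8_octets = len(label.encode("utf-8"))
    # Python's "punycode" codec emits the bare RFC 3492 string, without "xn--".
    punycode_chars = len(label.encode("punycode"))
    return code_points, utf8_octets, punycode_chars


if __name__ == "__main__":
    for script, word in SAMPLES.items():
        cps, utf8, puny = sizes(word)
        print(f"{script:<10} {cps} code points, {utf8:>2} UTF-8 octets, "
              f"{puny:>2} Punycode chars")
```

On these samples the Thai, Devanagari and Tamil words come out noticeably shorter in Punycode than in UTF-8 (for example, ไทย is 9 octets in UTF-8 but `o3cw4h`, 6 characters, in Punycode), while the Chinese and Korean words do not shrink at all (한국 becomes `3e0b707e`, 8 characters against 6 octets).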
Regards, Martin.
More information about the Idna-update mailing list