prohibiting previously mapped and unmapped characters

Erik van der Poel erikv at google.com
Wed Nov 29 18:42:11 CET 2006


Hello everyone,

It's great to see so much energy in the idna200x efforts!

One of my concerns is that it may be too late to try to prohibit some
of the characters that were previously permitted by RFCs 3490-3492,
whether mapped or unmapped in the normalization and case-folding
processes. One example that comes to mind is the full-width Latin
range U+FF01..U+FF5E and another is the CJK iteration mark U+3005.
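
To make the "mapped" versus "unmapped" distinction concrete, here is a
rough Python sketch. It is only an approximation: it uses NFKC plus
str.casefold() from the standard library as a stand-in for the mapping
step, and is not a full RFC 3491 Nameprep implementation (no prohibited
character or bidi checks). The helper name approx_idna_map is made up
for this example.

```python
import unicodedata

def approx_idna_map(label: str) -> str:
    """Approximate the mapping step: NFKC normalization, then case folding.

    Assumption: this is NOT full Nameprep; it only illustrates which
    characters get mapped to something else and which pass through.
    """
    return unicodedata.normalize("NFKC", label).casefold()

FULLWIDTH_W = "\uff57"      # FULLWIDTH LATIN SMALL LETTER W
ITERATION_MARK = "\u3005"   # IDEOGRAPHIC ITERATION MARK

# A host name whose third "w" is full-width is mapped to plain ASCII.
mapped_host = approx_idna_map("ww" + FULLWIDTH_W + ".fwt.co.jp")

# The iteration mark has no compatibility mapping, so it is left as-is.
unmapped = approx_idna_map(ITERATION_MARK)

# Every code point in U+FF01..U+FF5E maps under NFKC to the ASCII
# character 0xFEE0 below it (U+0021..U+007E).
fullwidth_all_map_to_ascii = all(
    unicodedata.normalize("NFKC", chr(cp)) == chr(cp - 0xFEE0)
    for cp in range(0xFF01, 0xFF5F)
)
```

So the full-width range falls in the "previously mapped" category and
U+3005 in the "previously unmapped but permitted" category.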

http://www.ietf.org/internet-drafts/draft-faltstrom-idnabis-tables-01.txt

Some may decide, after a close reading, that the old RFCs do not allow
non-Punycode domain names in HTML, but the fact of the matter is that
these do occur. Now that even the market-leading web browser (MSIE)
has a version out that supports these (v7), it may become increasingly
difficult to convince some implementors to prohibit characters that
actually occur in the wild.

http://www.majuro.jp/kaisya.html (the 3rd w in www.fwt.co.jp is full-width)

If it would help, I can take a look at Google's copies of web
documents to see which characters are actually used there and how many
occurrences there are of each. Of course, such a sample would omit
domain names used in email, but the web is quite an important part of
the Internet too.
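
A survey of that kind could look roughly like the following Python
sketch. It is only an illustration under stated assumptions: it looks
at href attributes only, takes already-decoded HTML text as input, and
the names HostCollector and count_non_ascii_host_chars are made up for
this example. It says nothing about how Google's copies are actually
stored or processed.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlsplit

class HostCollector(HTMLParser):
    """Collect the host part of every href attribute in a document."""

    def __init__(self):
        super().__init__()
        self.hosts = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name != "href" or not value:
                continue
            try:
                netloc = urlsplit(value).netloc
            except ValueError:
                # urlsplit rejects some malformed or suspicious netlocs.
                continue
            # Drop any userinfo; a port, if present, is ASCII and is
            # ignored by the non-ASCII filter below.
            host = netloc.rpartition("@")[2]
            if host:
                self.hosts.append(host)

def count_non_ascii_host_chars(html_text: str) -> Counter:
    """Count occurrences of each non-ASCII character in linked host names."""
    collector = HostCollector()
    collector.feed(html_text)
    counts = Counter()
    for host in collector.hosts:
        counts.update(ch for ch in host if ord(ch) > 0x7F)
    return counts

sample = (
    '<a href="http://ww\uff57.fwt.co.jp/">a</a>'
    '<a href="http://example.com/">b</a>'
)
sample_counts = count_non_ascii_host_chars(sample)
```

Summing such counters over many documents would give a per-character
tally of what actually appears in host names on the web.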

Erik van der Poel
Google

