prohibiting previously mapped and unmapped characters
mark.davis at icu-project.org
Wed Nov 29 19:03:16 CET 2006
I think one of the background assumptions for this effort is to focus on
identifying the allowed "output" characters, not the "input" characters.
That is, full-width A-Z are already disallowed in the *output* of IDNA, so
this effort would not change their status.
In retrospect, IDNA really shouldn't have embodied the transformation at
all; it should have specified only what can actually occur "on the wire".
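
To make the input/output distinction concrete, here is a rough sketch in
Python. It is an assumption-laden approximation, not the Nameprep algorithm:
it stands in for the IDNA mapping step with NFKC normalization plus
lowercasing, and the helper names wire_form and survives_mapping are
hypothetical, made up for this illustration.

```python
import unicodedata


def wire_form(label: str) -> str:
    """Approximate the IDNA mapping step (assumption: NFKC + lowercase only).

    Real Nameprep (RFC 3491) uses its own mapping and prohibition tables;
    this is only close enough to show the input/output distinction.
    """
    return unicodedata.normalize("NFKC", label).lower()


def survives_mapping(ch: str) -> bool:
    """True if the character can appear unchanged in the mapped output."""
    return wire_form(ch) == ch


# The host name from Erik's example, with a full-width third "w" (U+FF57).
host = "ww\uff57.fwt.co.jp"
print(wire_form(host))

# Full-width A-Z (U+FF21..U+FF3A) are valid input but never appear in output.
fullwidth_upper = [chr(cp) for cp in range(0xFF21, 0xFF3B)]
print(any(survives_mapping(ch) for ch in fullwidth_upper))

# U+3005 has no compatibility mapping, so this approximation leaves it as is.
print(survives_mapping("\u3005"))
```

Under this approximation, a table of allowed output characters would only
need to cover characters for which survives_mapping is true; full-width
letters fall out of it without any change to what users may type.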
On 11/29/06, Erik van der Poel <erikv at google.com> wrote:
> Hello everyone,
> It's great to see so much energy in the idna200x efforts!
> One of my concerns is that it may be too late to try to prohibit some
> of the characters that were previously permitted by RFCs 3490-3492,
> whether mapped or unmapped in the normalization and case-folding
> processes. One example that comes to mind is the full-width Latin
> range U+FF01..U+FF5E and another is the CJK iteration mark U+3005.
> Some may decide, after a close reading, that the old RFCs do not allow
> non-Punycode domain names in HTML, but the fact of the matter is that
> these do occur. Now that even the market-leading web browser (MSIE)
> has a version out that supports these (v7), it may become increasingly
> difficult to convince some implementors to prohibit characters that
> actually occur in the wild.
> http://www.majuro.jp/kaisya.html (3rd w in wwｗ.fwt.co.jp is full-width)
> If it would help, I can take a look at Google's copies of web
> documents to see which characters are actually used there and how many
> occurrences there are of each. Of course, such a sample would omit
> domain names used in email, but the web is quite an important part of
> the Internet too.
> Erik van der Poel