prohibiting previously mapped and unmapped characters

Erik van der Poel erikv at google.com
Wed Nov 29 20:21:08 CET 2006


Some members of the design team may have made such assumptions, but I
only have the Internet Draft to look at:

http://www.ietf.org/internet-drafts/draft-klensin-idnabis-issues-00.txt

Note the "clicking on a URI" in section 2.2.1 and the "label
rejection" in 2.2.3. Also note the "No" next to FF00..FFEF on page 18
of:

http://www.ietf.org/internet-drafts/draft-faltstrom-idnabis-tables-01.txt

Am I misinterpreting the Internet Drafts? Also, I am not only
concerned about what can actually occur on the wire in a DNS packet. I
am also concerned about the html that goes on the wire. Am I alone in
this concern?
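
To illustrate the mapped/unmapped distinction with the two examples
from my earlier message, here is a minimal sketch. It only
approximates the IDNA2003 mapping step with NFKC plus case folding;
it is not the full Nameprep profile, and the helper name
fold_like_idna2003 is hypothetical.

```python
import unicodedata


def fold_like_idna2003(label: str) -> str:
    """Approximate the IDNA2003 mapping step (assumption: NFKC + case folding).

    The real Nameprep profile has its own mapping and prohibition tables;
    this helper is only meant to show which characters get mapped.
    """
    return unicodedata.normalize("NFKC", label).casefold()


# A label whose third "w" is the full-width U+FF57, as seen in html in the wild.
mixed = "ww\uff57"
print(repr(fold_like_idna2003(mixed)))

# Full-width capital A (U+FF21) is mapped and then case-folded.
print(repr(fold_like_idna2003("\uff21")))

# The iteration mark U+3005 has no compatibility mapping, so it is "unmapped".
print(fold_like_idna2003("\u3005") == "\u3005")
```

A full-width letter in html is therefore "input" that maps to plain
ASCII, while U+3005 passes through unchanged and would itself have to
be allowed or prohibited in the output.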

Erik

On 11/29/06, Mark Davis <mark.davis at icu-project.org> wrote:
> I think one of the background assumptions for this effort is to focus on
> identifying the allowed "output" characters, not the "input" characters.
> That is, full width A-Z are already disallowed in the *output* of IDNA, so
> this would have no change from that.
>
> In retrospect, we really shouldn't have had the transformation embodied in
> IDNA, just what can actually occur "on the wire".
>
> Mark
>
>
> On 11/29/06, Erik van der Poel <erikv at google.com> wrote:
> >
> > Hello everyone,
> >
> > It's great to see so much energy in the idna200x efforts!
> >
> > One of my concerns is that it may be too late to try to prohibit some
> > of the characters that were previously permitted by rfcs 349[0-2],
> > whether mapped or unmapped in the normalization and case-folding
> > processes. One example that comes to mind is the full-width latin
> > range U+FF01..FF5E and another is the cjk iteration mark U+3005.
> >
> > http://www.ietf.org/internet-drafts/draft-faltstrom-idnabis-tables-01.txt
> >
> > Some may decide, after a close reading, that the old rfcs do not allow
> > non-punycode domain names in html, but the fact of the matter is that
> > these do occur. Now that even the market-leading web browser (msie)
> > has a version out that supports these (v7), it may become increasingly
> > difficult to convince some implementors to prohibit characters that
> > actually occur in the wild.
> >
> > http://www.majuro.jp/kaisya.html (3rd w in www.fwt.co.jp is full-width)
> >
> > If it would help, I can take a look at Google's copies of web
> > documents to see which characters are actually used there and how many
> > occurrences there are of each. Of course, such a sample would omit
> > domain names used in email, but the web is quite an important part of
> > the Internet too.
> >
> > Erik van der Poel
> > Google
> > _______________________________________________
> > Idna-update mailing list
> > Idna-update at alvestrand.no
> > http://www.alvestrand.no/mailman/listinfo/idna-update
> >
>
>

