NFKC and dots

Erik van der Poel erikv at google.com
Sat Jan 5 17:48:36 CET 2008


On Dec 12, 2007 6:42 PM, Kenneth Whistler <kenw at sybase.com> wrote:
> If we had wanted to extend this set to all the compatibility NFKC variants,
> then we would also add the following:
>
> 2024  ONE DOT LEADER
> FE12  PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP
> FE52  SMALL FULL STOP
>
> However, there is no need for that at all, since those characters
> will not be entered in by accident on Chinese and Japanese computers.
>
> I'm agnostic about where FULL STOP and IDEOGRAPHIC FULL STOP
> equivalence get handled in the protocol stack, by the way.

Speaking of U+2024 and where in the protocol stack to handle things, I
just discovered that MSIE 7 and Firefox 2 both perform NFKC on this
character, to yield U+002E (.). After that, they divide the host name
into labels *again*, so the new U+002E becomes a new label separator.
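The normalization itself is easy to check with Python's standard `unicodedata` module. This sketch applies NFKC to U+2024 and to the two other dot variants from Kenneth's list, and prints the resulting code points. The helper name `nfkc_codepoints` is made up for this illustration. Note that U+FE12 normalizes to U+3002 (IDEOGRAPHIC FULL STOP), not to an ASCII dot.

```python
import unicodedata

# Dot-like code points mentioned in this thread.
CHARS = {
    "U+2024 ONE DOT LEADER": "\u2024",
    "U+FE52 SMALL FULL STOP": "\ufe52",
    "U+FE12 PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP": "\ufe12",
}


def nfkc_codepoints(s: str) -> list:
    """Hypothetical helper: NFKC-normalize s and list the result as U+XXXX."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFKC", s)]


for name, ch in CHARS.items():
    print(name, "->", " ".join(nfkc_codepoints(ch)))
# U+2024 ONE DOT LEADER -> U+002E
# U+FE52 SMALL FULL STOP -> U+002E
# U+FE12 PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP -> U+3002

# Applied to a whole host name, the new U+002E looks like a label separator:
print(unicodedata.normalize("NFKC", "google\u2024com"))  # google.com
```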

If we ever get around to writing a document about IDNA in HTML, we may
want to make a note of this. That is, the steps these two browsers
appear to follow are:

(1) Divide the domain name into labels by looking for IDNA2003 dots.
(2) Perform Nameprep2003 on each non-ASCII label.
(3) Divide each label into multiple labels, by looking for regular dots.
(4) Perform Punycode2003 on each non-ASCII label.
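A rough Python sketch of those four steps follows. It uses CPython's `encodings.idna.nameprep`, an undocumented internal helper that implements RFC 3491 Nameprep against Unicode 3.2, together with the standard `punycode` codec. The function names `to_ace` and `msie_firefox_labels` are hypothetical. This reconstructs the observed behavior; it is not the browsers' actual code. The function returns the label list rather than a joined string, so that the re-division in step (3) stays visible.

```python
import re
from encodings.idna import nameprep  # CPython internal: RFC 3491 Nameprep

# The four label separators IDNA2003 treats as dots (RFC 3490, section 3.1).
IDNA2003_DOTS = re.compile("[\u002e\u3002\uff0e\uff61]")


def to_ace(label: str) -> str:
    """Punycode-encode a non-ASCII label and prepend the ACE prefix."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def msie_firefox_labels(host: str) -> list:
    labels = []
    for label in IDNA2003_DOTS.split(host):   # (1) split on IDNA2003 dots
        if not label.isascii():
            label = nameprep(label)           # (2) Nameprep non-ASCII labels
        for sub in label.split("."):          # (3) re-split on regular dots
            labels.append(to_ace(sub))        # (4) Punycode non-ASCII labels
    return labels


print(msie_firefox_labels("google\u2024com"))  # ['google', 'com']
print(msie_firefox_labels("\u5341\u2024com"))  # an xn-- label, then 'com'
```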

Interestingly, Opera 9 appears to perform a slightly different set of
steps (see step 2):

(1) Divide the domain name into labels by looking for IDNA2003 dots.
(2) Perform Nameprep2003 on each non-ASCII label, and, if result is
non-ASCII, perform Punycode2003.
(3) Divide each label into multiple labels, by looking for regular dots.
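The Opera variant can be sketched in the same way, assuming CPython's undocumented `encodings.idna.ToASCII`, which runs Nameprep and then, only if the result is still non-ASCII, Punycode. The name `opera_labels` is hypothetical. Followed literally, these steps produce a bogus `xn--` label for the second test link. Punycode copies the ASCII `.` through as a basic code point, so the resulting ACE string is later cut at that dot.

```python
import re
from encodings.idna import ToASCII  # CPython internal: RFC 3490 ToASCII

# The four label separators IDNA2003 treats as dots (RFC 3490, section 3.1).
IDNA2003_DOTS = re.compile("[\u002e\u3002\uff0e\uff61]")


def opera_labels(host: str) -> list:
    labels = []
    for label in IDNA2003_DOTS.split(host):   # (1) split on IDNA2003 dots
        ace = ToASCII(label).decode("ascii")  # (2) Nameprep, then Punycode if non-ASCII
        labels.extend(ace.split("."))         # (3) re-split on regular dots
    return labels


print(opera_labels("google\u2024com"))  # ['google', 'com']
print(opera_labels("\u5341\u2024com"))  # ['xn--', 'com-...'] in this sketch
```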

Opera 9 is somewhat more conformant to RFC 3490, but it still re-divides
the labels instead of sending a single label that contains a literal
0x2E (.) byte in the DNS packet. (One might argue that RFC 3490 did not
really take this case into account.)
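The difference shows up in the DNS wire format (RFC 1035, section 3.1), where each label is sent as a length byte followed by that many octets. A dot inside a label is just another octet, so it does not act as a separator there. This minimal encoder, with the hypothetical name `encode_qname`, contrasts the re-divided name with a literal one-label reading of RFC 3490.

```python
from typing import Iterable


def encode_qname(labels: Iterable[str]) -> bytes:
    """Encode labels as a DNS QNAME: length-prefixed labels, then a zero byte."""
    out = bytearray()
    for label in labels:
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError(f"bad label length: {label!r}")
        out.append(len(raw))
        out += raw
    out.append(0)  # the empty root label terminates the name
    return bytes(out)


# Re-divided, as the browsers do it: two labels.
print(encode_qname(["google", "com"]))  # b'\x06google\x03com\x00'
# Literal RFC 3490 reading: one 10-byte label containing 0x2E.
print(encode_qname(["google.com"]))     # b'\ngoogle.com\x00'
```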

I haven't tried this in Safari 3 or in MSIE 6 with the Verisign plug-in.
The HTML I used for testing was:

<a href="http://google&#x2024;com">one</a><br>
<a href="http://&#x5341;&#x2024;com">two</a>
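As an outside comparison, not one of the browsers tested, the same two links can be run through Python's built-in `idna` codec, which implements IDNA2003. In CPython that codec splits only on the four IDNA2003 dots and never re-divides after Nameprep. The dot produced by NFKC therefore stays inside the encoded result. The name `idna_codec_hosts` is hypothetical.

```python
import html
from urllib.parse import urlsplit

# The two hrefs from the test page, still HTML-escaped.
TEST_HREFS = [
    "http://google&#x2024;com",
    "http://&#x5341;&#x2024;com",
]


def idna_codec_hosts(hrefs):
    results = []
    for href in hrefs:
        url = html.unescape(href)       # decode the numeric character references
        host = urlsplit(url).hostname   # still contains U+2024
        results.append((host, host.encode("idna")))  # IDNA2003 codec
    return results


for host, encoded in idna_codec_hosts(TEST_HREFS):
    print(repr(host), "->", encoded)
```

The first host comes out as `b'google.com'`, and the second comes out as a single ACE string that starts with `b'xn--.com-'`.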

Erik

