NFKC and dots
Behnam ZWNJ Esfahbod
behnam at zwnj.org
Sun Jan 6 13:28:11 CET 2008
> On Dec 12, 2007 6:42 PM, Kenneth Whistler <kenw at sybase.com> wrote:
> > If we had wanted to extend this set to all the compatibility NFKC variants,
> > then we would also add the following:
> > 2024 ONE DOT LEADER
> > FE12 PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP
> > FE52 SMALL FULL STOP
> > However, there is no need for that at all, since those characters
> > will not be entered in by accident on Chinese and Japanese computers.
> > I'm agnostic about where FULL STOP and IDEOGRAPHIC FULL STOP
> > equivalence get handled in the protocol stack, by the way.
> Speaking of U+2024 and where in the protocol stack to handle things, I
> just discovered that MSIE 7 and Firefox 2 both perform NFKC on this
> character, to yield U+002E (.). After that, they divide the host name
> into labels *again*, so the new U+002E becomes a new label separator.
> If we ever get around to writing a document about IDNA in HTML, we may
> want to make a note of this. I.e. the steps are:
> (1) Divide the domain name into labels by looking for IDNA2003 dots.
> (2) Perform Nameprep2003 on each non-ASCII label.
> (3) Divide each label into multiple labels, by looking for regular dots.
> (4) Perform Punycode2003 on each non-ASCII label.
Why not? It is better if we can use IDEOGRAPHIC FULL STOP in text
and have copy/pasting into the address bar work as well. As long as
these characters have no other use in an address bar (they are
periods, like the ASCII one, right?), it is better to convert them to
the ASCII one.
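
For anyone who wants to experiment, the four quoted steps could be sketched roughly as follows in Python. This is only an illustration under some assumptions: the function name browser_style_to_ascii is made up, plain NFKC plus lowercasing stands in for full Nameprep (which also does other mappings and prohibited-character checks), and the IDNA2003 dot set is taken from RFC 3490, section 3.1. It is not meant to reproduce what MSIE 7 or Firefox 2 actually do internally.

```python
import re
import unicodedata

# Label separators recognised by IDNA2003 (RFC 3490, section 3.1):
# FULL STOP, IDEOGRAPHIC FULL STOP, FULLWIDTH FULL STOP,
# HALFWIDTH IDEOGRAPHIC FULL STOP.
IDNA2003_DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def browser_style_to_ascii(host):
    """Hypothetical sketch of the four quoted steps; not a full IDNA ToASCII."""
    out = []
    # (1) Divide the domain name into labels on the IDNA2003 dots.
    for label in IDNA2003_DOTS.split(host):
        # (2) Assumption: NFKC + lowercasing as a stand-in for Nameprep,
        #     applied to non-ASCII labels only.
        if not label.isascii():
            label = unicodedata.normalize("NFKC", label).lower()
        # (3) Divide again on any regular dot that normalization produced.
        for sub in label.split("."):
            # (4) Punycode-encode labels that are still non-ASCII.
            if not sub.isascii():
                sub = "xn--" + sub.encode("punycode").decode("ascii")
            out.append(sub)
    return ".".join(out)


# U+2024 survives step (1) but turns into "." in step (2),
# so step (3) splits the label a second time.
print(browser_style_to_ascii("www\u2024example\u3002com"))
```

Note that under plain NFKC, U+2024 and U+FE52 map to U+002E, while U+FE12 maps to U+3002, so in this sketch step (3) would not split on a U+FE12 that came through step (2).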
' بهنام اسفهبد
' Behnam Esfahbod
* .. http://behnam.esfahbod.info
* ` *
* o * http://zwnj.org