NFKC and dots

Behnam ZWNJ Esfahbod behnam at zwnj.org
Sun Jan 6 13:28:11 CET 2008


Hi there,

> On Dec 12, 2007 6:42 PM, Kenneth Whistler <kenw at sybase.com> wrote:
> > If we had wanted to extend this set to all the compatibility NFKC variants,
> > then we would also add the following:
> >
> > 2024  ONE DOT LEADER
> > FE12  PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP
> > FE52  SMALL FULL STOP
> >
> > However, there is no need for that at all, since those characters
> > will not be entered in by accident on Chinese and Japanese computers.
> >
> > I'm agnostic about where FULL STOP and IDEOGRAPHIC FULL STOP
> > equivalence get handled in the protocol stack, by the way.
>
> Speaking of U+2024 and where in the protocol stack to handle things, I
> just discovered that MSIE 7 and Firefox 2 both perform NFKC on this
> character, to yield U+002E (.). After that, they divide the host name
> into labels *again*, so the new U+002E becomes a new label separator.
>
> If we ever get around to writing a document about IDNA in HTML, we may
> want to make a note of this. I.e. the steps are:
>
> (1) Divide the domain name into labels by looking for IDNA2003 dots.
> (2) Perform Nameprep2003 on each non-ASCII label.
> (3) Divide each label into multiple labels, by looking for regular dots.
> (4) Perform Punycode2003 on each non-ASCII label.
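
The four steps quoted above can be sketched in Python. This is a minimal illustration, not what MSIE or Firefox actually run. It assumes that Python's standard `encodings.idna` module (an IDNA2003 implementation) can stand in for Nameprep2003 and that the built-in `punycode` codec can handle step (4). The function name `browser_style_to_ascii` is hypothetical.

```python
import re
from encodings import idna  # stdlib IDNA2003 helpers (RFC 3490/3491)

# Label separators recognized by IDNA2003 (RFC 3490, section 3.1):
# FULL STOP, IDEOGRAPHIC FULL STOP, FULLWIDTH FULL STOP,
# HALFWIDTH IDEOGRAPHIC FULL STOP.
IDNA2003_DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def browser_style_to_ascii(host: str) -> str:
    """Hypothetical sketch of the browser behaviour described above."""
    result = []
    # (1) Divide the host name into labels at IDNA2003 dots.
    for label in IDNA2003_DOTS.split(host):
        # (2) Nameprep each non-ASCII label; its NFKC step turns
        #     U+2024 ONE DOT LEADER into U+002E FULL STOP.
        if not label.isascii():
            label = idna.nameprep(label)
        # (3) Divide again at any regular dots that step (2) produced.
        for sub in label.split("."):
            # (4) Punycode-encode each remaining non-ASCII label.
            if sub.isascii():
                result.append(sub)
            else:
                result.append("xn--" + sub.encode("punycode").decode("ascii"))
    return ".".join(result)


print(browser_style_to_ascii("a\u2024b.com"))         # a.b.com
print(browser_style_to_ascii("b\u00fccher\u3002de"))  # xn--bcher-kva.de
```

Step (3) is what makes the NFKC-produced U+002E behave as a label separator. Without it, `a.b` would stay a single label that contains a dot.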

Why not?  It's better if we can use IDEOGRAPHIC FULL STOP in text
and have copy/pasting into the address bar work as well.  As long as
these characters have no other use in an address bar (they are
periods, like the ASCII one, right?), it's better to convert them to
the ASCII one.
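
As a quick check of the "they are periods" question, here is a small Python sketch that asks NFKC what each dot-like character in this thread becomes. The helper `map_dots_to_ascii` is hypothetical. Note that NFKC alone does not turn IDEOGRAPHIC FULL STOP into the ASCII one. U+3002 has no compatibility decomposition, and U+FE12 and U+FF61 normalize to U+3002, so an explicit extra mapping is needed on top of NFKC.

```python
import unicodedata

# Dot-like characters discussed in this thread, plus the IDNA2003 dots.
DOTS = {
    "\u002E": "FULL STOP",
    "\u2024": "ONE DOT LEADER",
    "\u3002": "IDEOGRAPHIC FULL STOP",
    "\uFE12": "PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP",
    "\uFE52": "SMALL FULL STOP",
    "\uFF0E": "FULLWIDTH FULL STOP",
    "\uFF61": "HALFWIDTH IDEOGRAPHIC FULL STOP",
}


def map_dots_to_ascii(text: str) -> str:
    """Hypothetical helper: replace every character whose NFKC form is
    U+002E or U+3002 with the ASCII full stop; leave all else untouched."""
    out = []
    for ch in text:
        if unicodedata.normalize("NFKC", ch) in (".", "\u3002"):
            out.append(".")
        else:
            out.append(ch)
    return "".join(out)


# Show the NFKC result for each character (each maps to one code point).
for ch, name in DOTS.items():
    nfkc = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} {name}: NFKC -> U+{ord(nfkc):04X}")

print(map_dots_to_ascii("www\u3002example\uFE52com"))  # www.example.com
```

The sketch converts only single characters whose NFKC form is a period. Something like U+2025 TWO DOT LEADER, which normalizes to "..", is left alone. Whether an address bar should convert all of these or only the IDNA2003 dots remains the open question in this thread.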


-- 
    '     بهنام اسفهبد
    '     Behnam Esfahbod
   '
  *  ..   http://behnam.esfahbod.info
 *  `  *
  * o *   http://zwnj.org


More information about the Idna-update mailing list