NFKC and dots

Simon Josefsson simon at josefsson.org
Mon Mar 3 09:00:37 CET 2008


I think such a document could update IDNA2003, or at least provide
informational documentation of how parts of the community have chosen
to implement IDNA2003 differently.  As far as I understand, this
applies to all user-entered hostnames and is not restricted to HTML.

For reference, and as an example of a confusing string (U+5341,
U+2024 ONE DOT LEADER, "com"), see:
http://josefsson.org/idn.php/?data=%E5%8D%81%E2%80%A4com&profile=Nameprep&mode=toascii&charset=UTF-8&lastcharset=UTF-8

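I'm told that neither MSIE nor Firefox yields the same IDN as the
correct IDNA2003 result, xn--.com-pg0g, here.  (IDNA2003 does not
treat U+2024 as a label separator, so Nameprep's NFKC step turns it
into a "." inside a single label.)  Arguably the MSIE/Firefox
behaviour is more reasonable.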
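A minimal Python sketch of why strict IDNA2003 produces a single label
for this string, assuming Python's built-in "idna" codec (an RFC 3490
ToASCII implementation with Nameprep) as a stand-in for the tool linked
above.  The helper name strict_idna2003 is hypothetical.

```python
import unicodedata

# RFC 3490 (IDNA2003) lists exactly these four characters as label dots.
IDNA2003_DOTS = "\u002e\u3002\uff0e\uff61"

def strict_idna2003(name: str) -> bytes:
    """Hypothetical helper: per-label ToASCII via Python's built-in codec.

    The codec splits on IDNA2003_DOTS *before* Nameprep runs, so a
    compatibility dot such as U+2024 stays inside a label.
    """
    return name.encode("idna")

# U+5341 (CJK "ten") + U+2024 ONE DOT LEADER + "com", as in the URL above.
name = "\u5341\u2024com"

print(any(ch in IDNA2003_DOTS for ch in name))  # False: no separator found
print(unicodedata.normalize("NFKC", "\u2024"))  # "." once Nameprep applies NFKC
print(strict_idna2003(name))                    # one ACE label beginning b"xn--.com-"
```

Because the ASCII ".com" ends up in the Punycode basic-code-point
section, it appears before the "-" delimiter of a single xn-- label
rather than as a separate top-level label.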

/Simon

Martin Duerst <duerst at it.aoyama.ac.jp> writes:

> I think that some aspects of this may be related to HTML.
> But domain names are used much more widely than HTML, and
> it would be a bad idea to have HTML behave differently from
> other, similar formats. Insofar as IDNA2003 led to
> unintuitive or clearly underspecified behavior for
> generic (from an IDN viewpoint) "higher-level protocols",
> it should be fixed and the fix documented in IDNAbis.
> These considerations cross label boundaries, but so do
> bidi considerations. Although we should limit IDN work
> to single-label considerations wherever possible,
> cross-label issues are sometimes unavoidable.
>
> Regards,    Martin.
>
> At 15:42 08/03/03, Simon Josefsson wrote:
>>"Erik van der Poel" <erikv at google.com> writes:
>>
>>> Hi Shawn,
>>>
>>> Thanks for the info. After I sent that email, I discussed it with some
>>> of the ICU folks, and they also said that one way to do this would be
>>> to perform NFKC on the entire domain name before splitting it into
>>> labels. Mark's pre-processing draft says something similar:
>>>
>>> http://docs.google.com/Doc?id=dfqr8rd5_51c3nrskcx&pli=1
>>>
>>> Actually, I've been meaning to gather folks who are interested in HTML
>>> and IDNA so that we can discuss this pre-processing spec. However, I
>>> do not want to distract the nascent working group, which probably
>>> wants to focus on the on-the-wire specs (IDNA200X, 4 drafts: issues,
>>> protocol, tables and bidi).
>>
>>For what it's worth, I'm interested in seeing the work-around
>>documented.  Old IDNA behaviour is unintuitive here.
>>
>>/Simon
>>_______________________________________________
>>Idna-update mailing list
>>Idna-update at alvestrand.no
>>http://www.alvestrand.no/mailman/listinfo/idna-update
>
>
> #-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
> #-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp     
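The work-around Erik describes above (perform NFKC on the entire domain
name before splitting it into labels) can be sketched in Python as
follows.  This is a minimal illustration assuming Python's built-in
"idna" codec for the per-label IDNA2003 step; the function name
nfkc_first_to_ascii is hypothetical, and this is not the algorithm
from Mark's pre-processing draft.

```python
import unicodedata

def nfkc_first_to_ascii(name: str) -> bytes:
    """Hypothetical pre-processing step (not taken from any draft):
    normalize the *whole* name with NFKC, then let Python's IDNA2003
    codec split on dots and run ToASCII label by label."""
    # NFKC turns U+2024 ONE DOT LEADER into U+002E, which the codec
    # then recognizes as a label separator.
    normalized = unicodedata.normalize("NFKC", name)
    return normalized.encode("idna")

name = "\u5341\u2024com"  # U+5341, U+2024, "com"

# Two labels: the ACE form of U+5341 followed by b".com".
print(nfkc_first_to_ascii(name))
# Strict IDNA2003 for comparison: a single label containing the dot.
print(name.encode("idna"))
```

The only difference between the two calls is where NFKC runs: before
label splitting, as in this sketch, or inside Nameprep after
splitting, as IDNA2003 specifies.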

