IDNA2008 test vectors

Mark Davis ☕ mark at macchiato.com
Tue Mar 29 20:54:30 CEST 2011


That looks like a bug, I'll check it out.

The ideographic period (U+3002) is allowed under IDNA2003, but it should be
mapped to ".". (Also, in the next revision, there will be a field that
indicates that the input isn't allowed under IDNA2008, so that people can
distinguish the two cases.)
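
To make the difference concrete, here is a minimal sketch in Python, not taken
from any IDNA library. The helper names (map_dots_idna2003,
in_quoted_disallowed_range) are made up for illustration. The separator list
follows RFC 3490 section 3.1, and the range check covers only the single
RFC 5892 line quoted below, not the full IDNA2008 tables.

```python
import unicodedata

# Non-ASCII label separators that RFC 3490 (IDNA2003), section 3.1,
# treats as equivalent to U+002E FULL STOP.
IDNA2003_DOTS = "\u3002\uff0e\uff61"

# Only the range quoted below from RFC 5892 (3000..3004 ; DISALLOWED).
# This is NOT the complete IDNA2008 derived property table.
QUOTED_DISALLOWED = range(0x3000, 0x3004 + 1)


def map_dots_idna2003(domain: str) -> str:
    """Replace the IDNA2003 dot-like separators with an ASCII '.'."""
    return domain.translate({ord(c): "." for c in IDNA2003_DOTS})


def in_quoted_disallowed_range(ch: str) -> bool:
    """True if the code point lies in the quoted DISALLOWED range."""
    return ord(ch) in QUOTED_DISALLOWED


if __name__ == "__main__":
    ch = "\u3002"
    print(unicodedata.name(ch))            # character name from the UCD
    print(map_dots_idna2003("example\u3002com"))
    print(in_quoted_disallowed_range(ch))
```

Under these assumptions the same code point is turned into "." by the
IDNA2003-style mapping but falls inside the range that RFC 5892 lists as
DISALLOWED, which is why a test file needs a way to tell the two apart.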

Mark

*— Il meglio è l’inimico del bene —*


On Tue, Mar 29, 2011 at 08:33, Simon Josefsson <simon at josefsson.org> wrote:

> Hi Mark,
>
> I'm happy to report that libidn2 handles 116 of the positive test
> vectors in http://www.unicode.org/Public/idna/6.0.1/IdnaTest.txt dated
> 29-dec-2010 with SHA-1 2fb11ede408fe7ab3e1c3b071d8c9c3f0de0d1fc.
>
> Testing all negative test vectors (i.e., test vectors that fail) is more
> cumbersome but I'll try to figure something out.
>
> I'm now going through the remaining positive test vectors that failed
> for some reason, and one of them that caught my eye is below.
>
> Line 2387 of IdnaTest.txt reads:
>
> B;      。;      ;       ;
>
> To me this means that the source input is U+3002, ToUnicode output is
> U+3002, and ToASCII output is U+3002.  It seems weird that the ToASCII
> output is a Unicode string and not an ACE string?!
>
> According to RFC 5892 that code point is disallowed:
>
> 3000..3004  ; DISALLOWED  # IDEOGRAPHIC SPACE..JAPANESE INDUSTRIAL STAND
>
> Is this a bug in IdnaTest.txt?
>
> Cheers,
> /Simon
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
>