IDNA2008 test vectors
simon at josefsson.org
Tue Mar 29 22:18:23 CEST 2011
Mark Davis ☕ <mark at macchiato.com> writes:
> That looks like a bug, I'll check it out.
There may be a couple of these. Of course, this could very well be a
misunderstanding of the file format on my part. The lines my code
rejects are (the first one is the same as in my previous e-mail):
B; 。; ; ;
B; \uDB40\uDDAA; ; ;
B; \uDB40\uDD3A; ; ;
B; \uDB40\uDD35; ; ;
These all have the "interesting" property that the ToASCII value
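The test lines above are semicolon-separated: a type column, then source, ToUnicode, ToASCII and NV8. They use `\uXXXX` escapes, and a supplementary character such as U+E01AA is written as a UTF-16 surrogate pair (`\uDB40\uDDAA`). The following is a minimal Python sketch of a line parser. It assumes the reading described in the quoted message below: a blank ToUnicode field repeats the source, and a blank ToASCII field repeats ToUnicode. `unescape` and `parse_line` are hypothetical helper names, not part of libidn2.

```python
import re

# Matches one \uXXXX escape (four hex digits = one UTF-16 code unit).
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def unescape(field: str) -> str:
    r"""Expand \uXXXX escapes, joining UTF-16 surrogate pairs.

    Raises UnicodeDecodeError if an escaped surrogate is left unpaired.
    """
    decoded = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), field)
    # Each escaped surrogate is now a lone surrogate character; a UTF-16
    # round trip with "surrogatepass" merges adjacent pairs into one code point.
    return decoded.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


def parse_line(line: str) -> dict:
    """Parse 'type; source; ToUnicode; ToASCII; NV8' into a dict.

    Assumption (the reading in the quoted message): a blank ToUnicode field
    repeats the source, and a blank ToASCII field repeats ToUnicode.
    """
    parts = line.rstrip("\n").split(";", 4)
    parts += [""] * (5 - len(parts))  # tolerate missing trailing columns
    kind, source, to_unicode, to_ascii, rest = (p.strip() for p in parts)
    source = unescape(source)
    to_unicode = unescape(to_unicode) or source
    to_ascii = unescape(to_ascii) or to_unicode
    nv8 = rest.split("#", 1)[0].strip()  # drop any trailing comment
    return {"type": kind, "source": source, "toUnicode": to_unicode,
            "toASCII": to_ascii, "NV8": nv8 == "NV8"}


for raw in ["B; \u3002; ; ;", r"B; \uDB40\uDDAA; ; ;"]:
    v = parse_line(raw)
    print([f"U+{ord(c):04X}" for c in v["toASCII"]])
# prints ['U+3002'] then ['U+E01AA']
```

Under this reading, the expected ToASCII value of each rejected line is the non-ASCII source itself.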
> The ideographic period is allowed under IDNA2003, but should be mapping to
I don't do mapping.
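An implementation that does no mapping passes U+3002 through unchanged. For contrast, IDNA2003 (RFC 3490, section 3.1) requires U+3002 IDEOGRAPHIC FULL STOP, U+FF0E FULLWIDTH FULL STOP and U+FF61 HALFWIDTH IDEOGRAPHIC FULL STOP to be recognized as dots, alongside U+002E. The following is a minimal Python sketch of only that separator step. The helper names are hypothetical, and the rest of IDNA2003 processing (Nameprep, Punycode) is omitted.

```python
# RFC 3490, section 3.1: U+002E, U+3002, U+FF0E and U+FF61 must all be
# recognized as label separators ("dots") in IDNA2003.
_IDNA2003_DOTS = {ord(c): "." for c in "\u3002\uFF0E\uFF61"}


def map_label_separators(domain: str) -> str:
    """Rewrite every IDNA2003 dot variant as U+002E FULL STOP."""
    return domain.translate(_IDNA2003_DOTS)


def split_labels(domain: str) -> list:
    """Split a domain name into labels after mapping the dot variants."""
    return map_label_separators(domain).split(".")


print(split_labels("www\u3002example\uFF0Ecom"))  # ['www', 'example', 'com']
print(repr(map_label_separators("\u3002")))        # '.'
```

Without this step, U+3002 is treated as an ordinary code point inside a label, and IDNA2008 then evaluates it against the RFC 5892 tables.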
> (Also, in the next revision, there will be a field that indicates that
> the input isn't allowed under IDNA2008, so that people can distinguish
Isn't that what the NV8 field is there for already?
> On Tue, Mar 29, 2011 at 08:33, Simon Josefsson <simon at josefsson.org> wrote:
>> Hi Mark,
>> I'm happy to report that libidn2 handles 116 of the positive test
>> vectors in http://www.unicode.org/Public/idna/6.0.1/IdnaTest.txt dated
>> 29-dec-2010 with SHA-1 2fb11ede408fe7ab3e1c3b071d8c9c3f0de0d1fc.
>> Testing all negative test vectors (i.e., test vectors that fail) is more
>> cumbersome but I'll try to figure something out.
>> I'm now going through the remaining positive test vectors that failed
>> for some reason, and one of them that caught my eye is below.
>> Line 2387 of IdnaTest.txt reads:
>> B; 。; ; ;
>> To me this means that the source input is U+3002, ToUnicode output is
>> U+3002, and ToASCII output is U+3002. It seems weird that the ToASCII
>> output is a Unicode string and not an ACE string?!
>> According to RFC 5892 that code point is disallowed:
>> 3000..3004 ; DISALLOWED # IDEOGRAPHIC SPACE..JAPANESE INDUSTRIAL STAND
>> Is this a bug in IdnaTest.txt?
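The message above raises two checks, sketched here in Python. The first looks up U+3002 in the RFC 5892 derived-property line quoted there. The second tests whether an expected ToASCII value is ASCII: an ACE label is ASCII and begins with `xn--`. The helper names are hypothetical, and `TABLE` holds only that one quoted entry, not the full RFC 5892 data.

```python
def parse_property_line(entry: str):
    """Parse an RFC 5892 style 'XXXX..YYYY ; PROPERTY # name' line."""
    data = entry.split("#", 1)[0]           # drop the character-name comment
    cps, prop = (s.strip() for s in data.split(";"))
    lo, _, hi = cps.partition("..")         # single code points have no '..'
    return int(lo, 16), int(hi or lo, 16), prop


def property_of(cp: int, table):
    """Return the property of cp, or None if no range in table covers it."""
    for lo, hi, prop in table:
        if lo <= cp <= hi:
            return prop
    return None


def is_ascii_output(s: str) -> bool:
    """ToASCII output should be ASCII only (non-ASCII labels become xn-- ACE)."""
    return all(ord(c) < 0x80 for c in s)


# Only the single table entry quoted above; the real RFC 5892 table is larger.
TABLE = [parse_property_line(
    "3000..3004 ; DISALLOWED # IDEOGRAPHIC SPACE..JAPANESE INDUSTRIAL STAND")]
print(property_of(0x3002, TABLE))   # DISALLOWED
print(is_ascii_output("\u3002"))    # False
```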
>> Idna-update mailing list
>> Idna-update at alvestrand.no