Lower casing

Mark Davis ☕ mark at macchiato.com
Thu Jan 27 17:56:17 CET 2011


Mark

*— Il meglio è l’inimico del bene —*


On Thu, Jan 27, 2011 at 03:19, Simon Josefsson <simon at josefsson.org> wrote:

> Mark Davis ☕ <mark at macchiato.com> writes:
>
> >> Thank you, I'm now going through these against my implementation.
> >> However, shouldn't I also ignore the toUnicode column for all B tests?
> >>
> >
> > The B lines are valid for both T and N, so you should include them.
>
> Then I'm stuck, and I would appreciate clarification from everyone about
> what IDNA2008 is saying.  Your second test case is:
>
> B;      FASS.DE;        fass.de;        ;
>
> The only place (that I can find) where IDNA2008 converts to lower case
> is in the following paragraph:
>

Sorry I wasn't clear. The second column is the "Source", which you need
to ignore since you are not mapping.

Let's take a couple of lines:

N;	Faß.de;	faß.de;	xn--fa-hia.de;	

You would only look at the 3rd and 4th columns for testing, so faß.de
and xn--fa-hia.de.

B;	à.\u05D0\u0308;	;	xn--0ca.xn--ssa73l

Logically, you only look at the 3rd and 4th columns here as well. However,
blank columns just mean that the contents are the same (suppressed for space
and readability), so the fully-fleshed out lines would be:

B;	à.\u05D0\u0308;	à.\u05D0\u0308;	xn--0ca.xn--ssa73l
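
A minimal sketch of how a test harness might expand the blank columns before comparing. The function name and the tuple layout are hypothetical, and the fill rule (a blank column repeats the Source column) is an assumption read off the expanded examples in this message, not taken from the file's own documentation:

```python
def expand_test_line(line):
    """Split one semicolon-separated test line and fill in blank columns.

    Assumption: a blank 3rd or 4th column repeats the Source (2nd) column,
    as in the expanded examples in this message.
    """
    fields = [field.strip() for field in line.split(";")]
    fields += [""] * (4 - len(fields))  # tolerate a missing trailing column
    test_type, source, to_unicode, to_ascii = fields[:4]
    to_unicode = to_unicode or source
    to_ascii = to_ascii or source
    return test_type, source, to_unicode, to_ascii


# The escapes are kept as literal text here; a real harness would decode them.
row = expand_test_line("B;\t\u00e0.\\u05D0\\u0308;\t;\txn--0ca.xn--ssa73l")
print(row[2], row[3])
```

An implementation that does not map would then ignore the second element and test only the third and fourth.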

If you think this convention is more trouble than it is worth, let us know.
For the line that you list

B;      FASS.DE;        fass.de;        ;

it would go to

B;      FASS.DE;        fass.de;        FASS.DE;

I'll have to look at that particular instance to see why the casing looks
odd in column 4.

The other take-away I have from this is that we need clearer instructions
for those who want to use the file but do not support mapping. I'll add
that to a list of feedback for the committee.


>   5.3.  A-label Input
>
>   If the input to this procedure appears to be an A-label (i.e., it
>   starts in "xn--", interpreted case-insensitively), the lookup
>   application MAY attempt to convert it to a U-label, first ensuring
>   that the A-label is entirely in lowercase (converting it to lowercase
>   if necessary), and apply the tests of Section 5.4 and the conversion
>   of Section 5.5 to that form.  If the label is converted to Unicode
>   (i.e., to U-label form) using the Punycode decoding algorithm, then
>   the processing specified in those two sections MUST be performed, and
>   the label MUST be rejected if the resulting label is not identical to
>   the original.  See Section 8.1 of the Rationale document [RFC5894]
>   for additional discussion on this topic.
>
> However "FASS.DE" is not an A-label.
>
> Is there anything else in IDNA2008 that lowercases labels?
>
> Further, I don't interpret the above to cause case conversion of the
> string looked up in DNS.  I only interpret it as converting it to lower
> case for the purpose of comparing with the output from the Punycode
> output.  It would be nice if someone could confirm or reject this
> interpretation as well.
>
> Since your test vectors aren't written in the form of test vectors for
> IDNA2008, it is difficult for me to understand whether this is caused by
> something in TR46 (which I don't implement) or just a misunderstanding
> on my or your side.
>
> /Simon
>
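
For reference, one reading of the Section 5.3 text quoted above can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the Section 5.4 tests are omitted entirely, and the lowercased form is used solely for the round-trip comparison, which is the interpretation asked about above rather than a confirmed one. It relies on Python's built-in "punycode" codec:

```python
def alabel_to_ulabel(label):
    """Return the U-label for an apparent A-label, or None if not an A-label.

    Assumptions: Section 5.4 validity tests are skipped, and lowercasing is
    applied only so the round-trip comparison is case-insensitive.
    """
    if label[:4].lower() != "xn--":
        return None  # e.g. "FASS" is not an A-label, so nothing happens here
    lowered = label.lower()
    # Decode the part after the "xn--" prefix with the Punycode algorithm.
    ulabel = lowered[4:].encode("ascii").decode("punycode")
    # Convert back and reject the label if the round trip does not match.
    reencoded = "xn--" + ulabel.encode("punycode").decode("ascii")
    if reencoded != lowered:
        raise ValueError("A-label does not round-trip: %r" % label)
    return ulabel


print(alabel_to_ulabel("XN--FA-HIA"))
print(alabel_to_ulabel("FASS"))
```

Under this reading, a label such as "FASS" is returned untouched (None here), which is why the procedure by itself does not explain lowercasing of non-A-labels.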