Prohibiting mapping of PVALID characters

Kenneth Whistler kenw at sybase.com
Thu Dec 10 02:10:42 CET 2009


Paul,

> >Protocol, 5.2:
> >
> >5.2. Conversion to Unicode
> >
> >The string is converted from the local character set into
> >Unicode, if it is not already in Unicode. Depending on local
> >needs, this conversion may involve mapping some characters
> >into other characters as well as coding conversions...
> >The results MUST be a Unicode string in NFC form.
> >
> >
> >Strings don't magically get to be "in NFC form", without
> >being mapped (via normalization algorithm) from whatever form
> >they started out as, *into* NFC form.
> 
> In this case, yes they do. That "MUST" is probably wrong; I believe 
> that the statement is meant to say "The results will be a 
> Unicode string in NFC form".
> 
> John, et. al.: is my understanding correct here?

I repeat: strings do not turn into NFC form by magic. They
are mapped by the normalization algorithm. And that mapping
involves, necessarily, PVALID --> PVALID characters in
this case.

And "the results" will not be a "Unicode string in NFC form"
unless it is mapped to that. (Except of course, by accident,
if it happens to start out as NFC in the first place.)
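
As a minimal sketch of that point, using Python's standard `unicodedata` module (the sample strings and the helper name `to_nfc` are illustrative assumptions, and the code points involved are assumed to be PVALID): a decomposed string only becomes NFC when the normalization algorithm maps it, while a string that already happens to be NFC passes through unchanged.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: not in NFC form.
decomposed = "cafe\u0301"
# The same text with precomposed U+00E9: already in NFC form.
composed = "caf\u00e9"


def to_nfc(s: str) -> str:
    """Map a string to NFC using the Unicode normalization algorithm."""
    return unicodedata.normalize("NFC", s)


# The raw code point sequences differ until one is mapped to the other.
print(decomposed == composed)
# Normalization replaces <e, U+0301> with U+00E9.
print(to_nfc(decomposed) == composed)
# A string that starts out as NFC is left as it is.
print(to_nfc(composed) == composed)
```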

I think what you may be trying to get at is a requirement that
the only valid *input* to the label processing is a string
which is already a Unicode string in NFC form -- and that
how it got to be that way and indeed whether it started out
as a string in SJIS or 8859-7 or whatever, is beyond the
protocol's scope of caring.
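
That reading could be sketched roughly as follows. The function `require_nfc` is hypothetical and not anything taken from the drafts; it only checks well-formedness of the input and does no mapping itself, leaving the conversion from SJIS, 8859-7, or anything else to whoever calls it.

```python
import unicodedata


def require_nfc(label: str) -> str:
    """Hypothetical gatekeeper: accept only input that is already in NFC.

    No mapping happens here. A caller holding a non-NFC string has to
    normalize it before the label processing will accept it.
    """
    if unicodedata.normalize("NFC", label) != label:
        raise ValueError("label is not in NFC form")
    return label
```

Under this sketch the mapping has not gone away. It has only moved outside the function, which is where the question of what Section 5.2 prohibits comes back in.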

But if that is the case, then why are we talking about trying
to add into Section 5.2 a prohibition (a MUST NOT) against
mapping PVALID characters? Because manifestly, for any
actual implementation to meet the requirement of having
Unicode strings in NFC form as valid input for the label
processing, it MUST map strings (including PVALID
characters) using the Unicode normalization algorithm.

I understand that maybe you want to say that that isn't
a requirement *in* the protocol -- it is simply a requirement
on the well-formedness of valid input to be handled as
labels. And maybe I don't understand how you distribute
the MUSTs, SHOULDs and wills around the document to accomplish
that.

But my essential point here is that you cannot have your cake
and eat it too -- trying to prohibit mapping of PVALID
characters in Section 5.2 at the same time that you
are requiring mapping of PVALID characters in Section 5.2 --
however the exact protocol wordsmanship of that gets
worked out.

Maybe you can, for example, prohibit *case* mapping of PVALID
characters in Section 5.2, while requiring canonical
normalization mapping of PVALID characters to NFC. That
at least would be a coherent position. But you cannot
just prohibit mapping and require mapping of the same
class of characters, without being more discriminating
in what you mean.
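
To illustrate that the two kinds of mapping are separable, here is a rough sketch. It uses Python's `str.casefold()` as a stand-in for case mapping, which is an assumption and not the operation any draft specifies; the helper names are hypothetical, and U+00DF is assumed to be PVALID.

```python
import unicodedata


def changed_by_nfc(s: str) -> bool:
    """True if canonical normalization to NFC would alter the string."""
    return unicodedata.normalize("NFC", s) != s


def changed_by_case_folding(s: str) -> bool:
    """True if case folding (stand-in for case mapping) would alter it."""
    return s.casefold() != s


sharp_s = "stra\u00dfe"    # contains U+00DF LATIN SMALL LETTER SHARP S
decomposed = "cafe\u0301"  # "e" followed by U+0301 COMBINING ACUTE ACCENT

# Case folding rewrites the sharp s, while NFC leaves that string alone.
print(changed_by_case_folding(sharp_s), changed_by_nfc(sharp_s))
# NFC rewrites the decomposed sequence, while case folding leaves it alone.
print(changed_by_case_folding(decomposed), changed_by_nfc(decomposed))
```

Because each check fires on a different string, a rule could forbid the first kind of change and still require the second without contradicting itself.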

--Ken



