Prohibiting mapping of PVALID characters

John C Klensin klensin at jck.com
Fri Dec 11 08:38:45 CET 2009



--On Thursday, December 10, 2009 22:58 +0000 Shawn Steele
<Shawn.Steele at microsoft.com> wrote:

> Some characters, or sequences of characters, in the input
> string may be mapped to others.  However, for any character
> that has a type of PVALID, such mapping MUST be limited to
> applying NFC, regardless of the local needs.

Just my opinion, but these are voluntary standards, not dictates
that are enforced by the well-equipped, well-staffed, and
well-armed protocol police.  Saying, in a single sentence, "MUST
xxx, regardless of [the] local needs" is not only provocative,
it is pretty close to an invitation for someone to say "take
your MUST, and your standard, and do <something anatomically
unpleasant or impossible> with it".

This discussion is causing me to come around to the point of
view in Ken's initial comment on the subject: we aren't going to
get this right in a way that doesn't add to the confusion rather
than helping with it.

In particular...

--On Wednesday, December 09, 2009 17:10 -0800 Kenneth Whistler
<kenw at sybase.com> wrote:

>...
> I think what you may be trying to get at is a requirement that
> the only valid *input* to the label processing is a string
> which is already a Unicode string in NFC form -- and that
> how it got to be that way and indeed whether it started out
> as a string in SJIS or 8859-7 or whatever, is beyond the
> protocol's scope of caring.

And that, of course, is exactly the way Protocol is written
today, and deliberately so.   The text has been that way for a
_long_ time, precisely to avoid telling an implementation that
already knows something is in NFC form --or for which the odds
of its being in that form are such that it is easier and faster
to check than to convert-- that it needs to convert.  One of
the specific examples of that is the one Ken suggests above: if
my input string is in 8859-whatever, it is really easy to write
a conversion procedure that is guaranteed to produce
NFC-compliant strings.  Having a standard that then requires
that I put them through NFC conversion is just silly.
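
As a rough illustration of the "check rather than convert" point,
here is a minimal Python sketch.  It is not taken from Protocol or
from any implementation: the function name to_nfc_input and the
default charset are assumptions made for the example.  It decodes a
legacy-charset byte string to Unicode and calls the normalizer only
when a (cheaper) check says the result is not already in NFC.  It
needs Python 3.8 or later for unicodedata.is_normalized.

```python
import unicodedata


def to_nfc_input(raw: bytes, charset: str = "iso-8859-1") -> str:
    """Decode raw bytes and return a Unicode string in NFC.

    Hypothetical helper for illustration only. Normalization is
    applied only if the decoded text is not already in NFC.
    """
    text = raw.decode(charset)
    # Checking is usually cheaper than converting, and for many
    # legacy charsets the decoded text is already in NFC.
    if unicodedata.is_normalized("NFC", text):
        return text
    return unicodedata.normalize("NFC", text)


# Every ISO-8859-1 byte decodes to a code point below U+0300, so the
# decoded string is always in NFC and the normalizer is never needed.
latin1_all = bytes(range(256)).decode("iso-8859-1")
assert unicodedata.is_normalized("NFC", latin1_all)
```

With input that arrives decomposed, for example UTF-8 bytes for
"e" followed by U+0301, the same function falls through to the
normalize call and returns the precomposed U+00E9.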

I'm confident that, if it is really important enough, we can
revise the way the definition works enough to incorporate this
sort of prohibition (although I'd still argue rather strongly
against anything that requires saying "...regardless of local
needs").  But I think our experience has been that trying to make
such changes will have unexpected side-effects that will require
other changes and that it will consequently take a lot of time
and effort to get right.

If it were actually to turn out to be a better solution (I'm
trying to reserve judgment), we could pretty easily put a
discussion of the subject into Rationale --non-normative, but
persuasive about why remapping valid characters would be a bad
idea-- without having to muck with the protocol definition.

    john





