I-D Action:draft-ietf-idnabis-mappings-00.txt

Sun May 31 01:02:34 CEST 2009

Mark

On Sat, May 30, 2009 at 15:10, John C Klensin <klensin at jck.com> wrote:

>
>
> --On Saturday, May 30, 2009 11:19 -0700 Paul Hoffman
> <phoffman at imc.org> wrote:
>
> > I earlier criticized the -00 draft for not being technically
> > correct. Here are my proposed changes for that. Basically,
> > replace "NFC and some NFKC-like actions but avoid touching the
> > characters that are allowed in IDNA2008 but weren't allowed in
> > IDNA2003" with "protect the characters that are allowed in
> > IDNA2008 but weren't allowed in IDNA2003, then do NFKC".
>
> Paul,
>
> I think there is at least one fundamental difference here, but
> it may be accidental so I should ask and see if it brings us a
> little closer.
>
> There have been several comments on the list (I think first from
> Martin) to the effect that there are compatibility mappings
> supported as part of NFKC that we fundamentally don't need.  For
> example, the many "mathematical" characters that are essentially
> font variations on undecorated Latin ones almost certainly have
> no appropriate place in domain names: they are difficult to
> type in most environments, they don't exist as distinct
> glyphs in most of the fonts one would expect to see in URLs,
> etc.  If they have any practical value at all, it would be to do
> mischief.
>
> The glyph problem adds to the risk that a user will see little
> boxes or question marks, naively proceed, and discover that they
> masked a known-hostile domain.  So the risk isn't entirely
> theoretical.

How would the example you cite be different from my seeing a box standing
for a Malayalam character, where I have no Malayalam font? I don't think
that you've given any concrete cases -- so at this point, you have not
established that it is a practical problem, nor which characters would be at
issue, nor what the magnitude is. So I think it is, at this point, pretty
much theoretical.
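
For readers who want to see the mappings being discussed, here is a minimal sketch using Python's standard unicodedata module. The three sample characters are chosen purely for illustration and do not come from this thread; the helper name nfkc_report is likewise hypothetical. It shows that NFKC folds a "mathematical" letter and a fullwidth letter to plain ASCII, while a Malayalam letter passes through unchanged.

```python
import unicodedata

# Illustrative sample characters (an assumption of this sketch, not data
# from the thread), identified by their formal Unicode names.
SAMPLE_NAMES = [
    "MATHEMATICAL BOLD SMALL A",       # U+1D41A, a font variant of "a"
    "FULLWIDTH LATIN SMALL LETTER A",  # U+FF41, a width variant of "a"
    "MALAYALAM LETTER A",              # U+0D05, no compatibility mapping
]


def nfkc_report(names):
    """Map each character name to the NFKC form of that character."""
    report = {}
    for name in names:
        ch = unicodedata.lookup(name)
        report[name] = unicodedata.normalize("NFKC", ch)
    return report


report = nfkc_report(SAMPLE_NAMES)
for name, mapped in report.items():
    original = unicodedata.lookup(name)
    status = "unchanged" if mapped == original else "mapped"
    print(f"U+{ord(original):04X} {name}: {status} -> U+{ord(mapped):04X}")
```

This only shows what NFKC does to each character. It says nothing about whether any of them actually appear in domain names or URLs.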

>
> In the data that Erik and Mark have provided from time to time,
> I don't believe we saw any evidence that those characters are in
> actual, practical, use in, e.g., URLs.

It's been on my plate for some time to do a thorough analysis. (That plate's
been pretty full lately.) I agree that if some set of characters simply is
not in current use, there is no need to map them for compatibility; but
we have to be very sure of what that set actually is. Your assumption has
been case mapping + width, but we need actual data.
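
As a starting point for scoping that set, the compatibility mappings can at least be grouped by kind. The sketch below is an assumption-laden illustration, not the analysis referred to above: it uses the compatibility tag (such as <font>, <wide>, or <narrow>) that Python's unicodedata.decomposition reports, and the helper names compat_tag and tally_compat_tags are hypothetical. The counts depend on the Unicode version bundled with the Python interpreter.

```python
import sys
import unicodedata
from collections import Counter


def compat_tag(ch):
    """Return the compatibility tag of ch (e.g. 'font', 'wide'), or None.

    unicodedata.decomposition() returns a string such as '<font> 0061' for
    compatibility decompositions; canonical decompositions carry no tag,
    and characters without a decomposition yield an empty string.
    """
    decomp = unicodedata.decomposition(ch)
    if decomp.startswith("<"):
        return decomp[1:decomp.index(">")]
    return None


def tally_compat_tags():
    """Count code points per compatibility tag across all of Unicode."""
    counts = Counter()
    for cp in range(sys.maxunicode + 1):
        tag = compat_tag(chr(cp))
        if tag is not None:
            counts[tag] += 1
    return counts


counts = tally_compat_tags()
for tag, n in counts.most_common():
    print(f"<{tag}>: {n} code points")
```

A tally like this describes what the standard defines, not what people use. Deciding which categories matter for compatibility still requires usage data of the kind mentioned above.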

>
>
> On that basis, Martin (and others) have suggested that we be
> selective about the compatibility mappings we support, focusing
> on those that can be typed and have some useful effect and
> avoiding those that, like the above, serve no purpose other than
> possibly to increase risk.
>
> Do you believe there is actually a reason to map those
> characters, as the way you suggest applying NFKC would do?  And,
> if so, can you explain why?
>
>     john