I-D Action:draft-ietf-idnabis-mappings-00.txt

Sun Jun 7 22:49:48 CEST 2009

At 11:57 AM -0400 6/7/09, John C Klensin wrote:
>--On Saturday, June 06, 2009 16:38 -0700 Paul Hoffman
><phoffman at imc.org> wrote:
>
>>...
>>> I continue to believe that use of NFKC without exclusion of
>>> character groups for which there are no justifications is
>>
>> Pete's proposed mapping happens before the
>> is-it-valid-IDNA2008 check. Why should we use a modified NFKC
>> instead of plain-vanilla-NFKC and let the second step
>> (is-it-valid-IDNA2008 check) happen as-is?
>
>My concern is not those NFKC mappings that will result in
>invalid (DISALLOWED) characters.   It is
>
>(1) NFKC mappings of characters that, if used in domain names,
>are probably used to cause mischief and for which there is no
>substantive justification.   The "Mathematical" characters are
>examples of this.

I'm still confused. If someone enters a mathematical character that is mapped to an allowed character, the result is a valid domain name that could have been entered as allowed characters. This is identical to what we have today in IDNA2003, no worse.
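
To make the mathematical-character case concrete, here is a minimal sketch. It is not from the draft; it assumes Python's standard unicodedata module as the NFKC implementation, and the variable names are purely illustrative.

```python
import unicodedata

# U+1D41A..U+1D41C are MATHEMATICAL BOLD SMALL A, B, C. Each carries a
# compatibility decomposition to the plain ASCII letter.
math_label = "\U0001D41A\U0001D41B\U0001D41C"

# Plain NFKC, with no character groups excluded.
mapped = unicodedata.normalize("NFKC", math_label)

print(mapped)  # prints "abc"
```

The mapped result is the same string a user could have typed directly with ordinary letters.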

> Martin's original list identified others.
>Note that, except in specialized systems, these characters are
>very difficult to type and ones for which fonts are unlikely to
>be present.

Yes, exactly. I'm still missing your point of concern.

>(2) NFKC mappings of characters that result in characters in
>CONTEXTO or CONTEXTJ.  Unless I missed something in my search,
>this is a null set at present.  But I can find no stability rule
>that would prevent adding such a character and the same
>presentation and ambiguity issues that apply to the listed
>CONTEXTx characters would apply to their compatibility
>equivalents.

And I don't see a problem with that. Someone enters a name-which-needs-mapping, it is mapped, and out pops some characters that are valid. How is this of more concern than a valid, no-mapping IDNA2008 name?
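
The map-then-check order can be sketched as follows. This is an illustration under stated assumptions, not the draft's algorithm: `map_then_check` and `toy_is_valid` are hypothetical names, and `toy_is_valid` is only a stand-in for the real is-it-valid-IDNA2008 check (it accepts ASCII lowercase letters, digits, and hyphen, nothing more).

```python
import unicodedata

ALLOWED = set("abcdefghijklmnopqrstuvwxyz0123456789-")

def toy_is_valid(label):
    # Hypothetical stand-in for the IDNA2008 validity check; the real
    # check consults the PVALID/CONTEXTx/DISALLOWED tables.
    return label != "" and all(c in ALLOWED for c in label)

def map_then_check(label, is_valid):
    # Step 1: plain NFKC mapping. Step 2: unmodified validity check
    # applied to the mapped result only.
    mapped = unicodedata.normalize("NFKC", label)
    return mapped, is_valid(mapped)

# Fullwidth "abc" (U+FF41..U+FF43) maps to plain "abc" and passes.
print(map_then_check("\uFF41\uFF42\uFF43", toy_is_valid))

# U+00BD (vulgar fraction one half) maps to "1", U+2044, "2"; the
# fraction slash is rejected by the second step.
print(map_then_check("\u00BD", toy_is_valid))
```

Either way, the validity check only ever sees the mapped string, so whatever passes is a name that could also have been entered without mapping.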

>>> (i) A violation of the "inclusion" model of IDNA2008
>>
>> Completely agree. However, this whole document is a violation
>> of the "no-mapping" model of IDNA2008, so that seems like an
>> odd objection.
>
>We are likely to have to agree to disagree about this, but I
>believe that "inclusion" and "no mapping" are separate
>principles.  The acceptance of mapping in some contexts does not
>seem to me to justify, in any way, abandoning "inclusion".  From
>that point of view, the argument to abandon inclusion in the
>mapping context has to be made separately... and that argument
>has not, as far as I can remember, been made yet.

We do agree to disagree here. Up to this point, we have held the whole set of rules constant, and some of them intertwined. I do *not* want us to abandon any of the rules in the end, just to allow sensible mapping before doing the final rule check.

>>> (ii) A violation of the closely-related protocol design
>>> principle that one should include only those things for which
>>> one has both use and understanding because it is easier to add
>>> later than it is to remove.
>>
>> Implementers of IDNA2008 will understand NFKC as well as
>> implementers of IDNA2003.
>
>Which is to say, IMO, not at all. 

And here we agree. Thus, I see no harm in extending the protocol. We have not seen any significant damage from the lack of understanding in IDNA2003 names, just as developers' lack of understanding of some crypto properties (for example) hasn't had any significant negative effect on, say, IPsec and TLS. And I really do see a parallel between the two types of protocols.

>>> (iii) An increased risk, however slight, that we will, in the
>>> future, get strong demands from some particular community to
>>> treat a character classified by Unicode as "compatibility" as
>>> a real and distinct character.  If such a character is
>>> disallowed by virtue of not being mapped, we will have the
>>> difficult problem of changing a disallowed character to a
>>> PVALID one. But, if it is mapped to something else, we will
>>> have to revisit the very complex discussions that we have had
>>> over Eszett and Final Sigma.  We should not incur that risk
>>> unless there is a reason to do so.
>>
>> There is: increased (but, of course, not full) backwards
>> compatibility with the large installed base of IDNA2003.
>
>We've seen no evidence that any of these other categories of
>compatibility characters are used --other than in possible
>demonstrations or out of malice-- in IDNA2003-compatible
>applications, much less enough to constitute a large installed
>base.  

Really? I thought that multiple presentations by Asians showed that some of the compatibility characters got entered automatically by the UIs of some browsers and so on. I could be mistaken, of course, but that is certainly how I interpret slide 3 of <http://www.ietf.org/proceedings/09mar/slides/idnabis-4.pdf>.

>YMMD.

As we age, all of our mileages are degrading, yes. :-)