Q2: What mapping function should be used in a revised IDNA2008 specification?

Wed Apr 1 22:30:38 CEST 2009

--On Wednesday, April 01, 2009 10:38 -0700 Paul Hoffman
<phoffman at imc.org> wrote:

> At 4:29 PM +0900 4/1/09, Martin J. Dürst wrote:
>> My preference would be to use a significantly more restricted
>> set of mappings than for IDNA2003. At the very highest level,
>> the IDNA2003 mappings contained:
>> 1) Case mappings
>> 2) NFC mappings (canonical equivalence)
>> 3) NFKC mappings (compatibility equivalence)
>> 
>> I think the best thing would be to retain 1) and 2), but only
>> a very small part of 3). The reason for this is that 1) is
>> used as a parallel to the ASCII case equivalence in the ASCII
>> DNS, 2) is an inherent representational issue of an encoding
>> that (like Unicode) provides composing of accents and the
>> like, but 3) is a hodgepodge collection of various kinds of
>> equivalences.
> 
> I agree with this preference in principle, but as a practical
> matter, we are better off saying "use the full NFKC of the
> version of Unicode currently in use" rather than "use this
> often-changing case table or function, and use this
> often-changing canonical table or function, and use this
> often-changing compatibility table or function". The danger of
> saying "TUC defines NFKC for each version; use it" is
> approximately the same as "TUC updates TUS and we think that
> won't cause us to have to revise this RFC".

Of course, if we use mapping strictly to ensure IDNA2003
compatibility for reasonable cases, rather than making up new
ways to turn code points into other code points, that is a
non-issue because the table will be fixed once and never change.
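
As a rough illustration of the three layers Martin lists above,
here is a minimal Python sketch.  It assumes the standard
unicodedata module, which reflects whatever Unicode version the
interpreter ships with, and it uses str.lower() as a stand-in for
case mapping; it is not the Nameprep table from IDNA2003.  The
function names are made up for this example.

```python
import unicodedata


def case_map(label: str) -> str:
    # Layer 1: case mapping (approximated here by simple lowercasing).
    return label.lower()


def canonical_map(label: str) -> str:
    # Layer 2: canonical equivalence (NFC), e.g. composing base + accent.
    return unicodedata.normalize("NFC", label)


def compatibility_map(label: str) -> str:
    # Layer 3: compatibility equivalence (NFKC), which also folds
    # ligatures, width variants, and other "hodgepodge" forms.
    return unicodedata.normalize("NFKC", label)


if __name__ == "__main__":
    # U+0061 U+0308 composes to U+00E4 under both NFC and NFKC.
    print(ascii(canonical_map("a\u0308")))
    # U+FB01 (LATIN SMALL LIGATURE FI) is untouched by NFC ...
    print(ascii(canonical_map("\ufb01")))
    # ... but NFKC rewrites it to the two letters "fi".
    print(ascii(compatibility_map("\ufb01")))
```

The ligature case shows why 3) is the contentious layer: NFC
leaves the code point alone, while NFKC replaces it with
different characters.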

> At 1:41 AM -0400 4/1/09, John C Klensin wrote:
>> Under no circumstances should mapping be used as a mechanism
>> for undoing those decisions, i.e., mapping should be
>> permitted only when the result is a PVALID character.
> 
> If you meant "PVALID, CONTEXTJ, or CONTEXTO", I agree.

While I could live with the broader definition, I note that none
of the characters now handled as CONTEXTJ or CONTEXTO are valid
under IDNA2003.  I might have missed something,
but a scan of the UnicodeData table seems to indicate that none
of them are the targets of compatibility mapping either.  So,
because I still see the use of mapping as transitional, I
deliberately chose the narrower and less complex statement.
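
For anyone who wants to repeat that kind of scan, a sketch along
the following lines should do.  It assumes Python's unicodedata
module in place of the raw UnicodeData file, so it reflects the
Unicode version bundled with the interpreter rather than the one
current when this was written, and results may differ.  The
function name compat_sources is hypothetical.

```python
import sys
import unicodedata


def compat_sources(target: int) -> list[int]:
    """Return code points whose compatibility decomposition includes target."""
    wanted = format(target, "04X")  # decomposition fields are upper-case hex
    sources = []
    for cp in range(sys.maxunicode + 1):
        decomp = unicodedata.decomposition(chr(cp))
        # Compatibility decompositions start with a tag such as "<compat>"
        # or "<noBreak>"; canonical decompositions have no tag and are
        # deliberately skipped here.
        if decomp.startswith("<") and wanted in decomp.split()[1:]:
            sources.append(cp)
    return sources
```

Calling compat_sources(0x200C) and compat_sources(0x200D) checks
the two joiner code points; the same call can be run over
whatever set of contextual-rule code points the tables document
lists at the time.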

    john