Possible definition for MVALID and a mapping table

John C Klensin klensin at jck.com
Sun Apr 12 02:15:21 CEST 2009


Hi.

Independent of the terminology, skipping the M-Label,
MVALID/MSUBJECT debate, and ignoring the location of any mapping
vis-a-vis the core IDNA protocol for the present, I've been
trying to think about mapping functions and tables.   FWIW, I
get more comfortable about the idea the closer we stick to the
"inclusion principle", rather than saying, e.g., "NFKC less
exceptions".  The latter seems to take us all the way back to
the IDNA2003 way of thinking, which is what we told the
community we want to move away from.

The discussion below draws heavily on the exchange between
Martin and Erik somewhat over a week ago which, for some reason,
seems to have gotten lost in the noise.

I hope we can formulate this as rules that generate tables, but
I'd like to see if we can agree on principles before we get down
to details.  Principles:

(1) No character is mapped if it would map to a DISALLOWED
character (I think we are agreed about that one).

(2) Only those NFKC mappings that are identified in UnicodeData
as <wide> or <narrow> are automatically included.  <compat> will
have to be considered on a case-by-case basis or with
discrimination based on other rules.  There appear to be 673
characters (fortunately quite a few fewer once rule (1) is
applied) in that group in Unicode 5.1, so I certainly hope we
can come up with a better discrimination function.

(3) Of the case-related operations, only toLowerCase is used to
form mapping functions.  The additional cases that result in
    toLowerCase(cp) <> toCaseFold(cp)
are all potentially problematic and, if they are to be included
(mapped), require case-by-case consideration.
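To make the three principles a bit more concrete, here is a rough
Python sketch of how they might generate tables.  It is only a
sketch under several assumptions.  Python's unicodedata reflects
whatever Unicode version the interpreter ships, not 5.1, so the
counts will differ.  str.lower() and str.casefold() stand in for
toLowerCase and toCaseFold.  The DISALLOWED test is a hypothetical
placeholder, not the real IDNA2008 derivation.  Applying rule (2)
and then rule (3) in sequence is my assumption.  The function
names are made up for illustration.

```python
import unicodedata
from typing import Callable, Dict, Optional, Tuple

# Rule (2): the only decomposition tags that are mapped automatically.
AUTO_TAGS = ("<wide>", "<narrow>")
# Categories the HYPOTHETICAL placeholder below treats as DISALLOWED.
_PLACEHOLDER_BAD = {"Cc", "Cf", "Cn", "Co", "Cs", "Zs", "Zl", "Zp"}


def placeholder_is_disallowed(s: str) -> bool:
    """HYPOTHETICAL stand-in for the DISALLOWED test.

    Rejects only controls, format, private-use, unassigned code points
    and separators; the real IDNA2008 derivation is far richer.
    """
    return any(unicodedata.category(ch) in _PLACEHOLDER_BAD for ch in s)


def width_target(ch: str) -> Optional[str]:
    """Rule (2): return ch's <wide>/<narrow> decomposition, else None."""
    parts = unicodedata.decomposition(ch).split()
    if parts and parts[0] in AUTO_TAGS:
        return "".join(chr(int(h, 16)) for h in parts[1:])
    return None


def build_tables(
    is_disallowed: Callable[[str], bool] = placeholder_is_disallowed,
) -> Tuple[Dict[int, str], Dict[int, str], Dict[int, str]]:
    """Return (mapping, case_review, compat_review) over all code points."""
    mapping: Dict[int, str] = {}
    case_review: Dict[int, str] = {}    # rule (3): lower != casefold
    compat_review: Dict[int, str] = {}  # rule (2): <compat>, case by case
    for cp in range(0x110000):
        ch = chr(cp)
        if unicodedata.category(ch) in ("Cn", "Cs"):
            continue  # unassigned or surrogate: nothing to map
        decomp = unicodedata.decomposition(ch)
        if decomp.startswith("<compat>"):
            compat_review[cp] = decomp  # not mapped automatically
        if ch.lower() != ch.casefold():
            # Record what case folding would add; it is NOT used for mapping.
            case_review[cp] = ch.casefold()
        # Assumption: width mapping first, then toLowerCase only.
        target = (width_target(ch) or ch).lower()
        # Rule (1): never map to something DISALLOWED.
        if target != ch and not is_disallowed(target):
            mapping[cp] = target
    return mapping, case_review, compat_review
```

With this, U+FF21 FULLWIDTH LATIN CAPITAL LETTER A maps to "a".
U+3000 IDEOGRAPHIC SPACE (<wide> 0020) is dropped by rule (1)
because its target is a space.  U+1E9E maps to U+00DF via
toLowerCase, but it also lands on the review list because case
folding would give "ss".  The <compat> ligature U+FB01 is only
listed for review.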

I think this conforms to Mark's "it maps the same way IDNA2003
does or it doesn't map at all" rule expanded slightly to include
newly added characters and contracted quite a lot to eliminate
mappings that, if used, are likely to be nothing but invitations
to trouble.

Is that a useful start?

   john


