Heads-up on a possible Unicode 7.0.0 issue

John C Klensin klensin at jck.com
Mon Jul 21 21:38:50 CEST 2014


Just as a heads-up to those who are interested in IDNA protocol
issues but are not following Internet-Draft announcements or
some other IETF-related IDN work...

In general, we depend on Unicode normalization (and the IDNA
requirement that putative labels be in NFC form before
processing) to ensure that two ways of representing the same
character compare as equal.  (That is the "same [abstract]
character", not a question about, e.g., visual similarities such
as those between two characters from different scripts.)
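As an illustration of that dependency, here is a minimal sketch using Python's standard `unicodedata` module (the helper name `same_label` is hypothetical, not part of any IDNA library). Two different code point sequences for "café" only compare equal once both are put into NFC:

```python
import unicodedata


def same_label(a: str, b: str) -> bool:
    """Hypothetical helper: compare two label strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "caf\u00e9"  # U+00E9 LATIN SMALL LETTER E WITH ACUTE
combining = "cafe\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

print(precomposed == combining)            # False: different code point sequences
print(same_label(precomposed, combining))  # True: NFC maps both to one form
```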

A new precomposed character was added in 7.0.0 that could
previously be represented by a sequence of a base character and
a combining one.  However,
   NFC (new character) .neq. NFC (combining sequence)   and
   NFD (new character) .neq. NFD (combining sequence)
Since there are important stability rules for normalization,
neither NFC (combining sequence) nor NFD (combining sequence) can
change because a new character is added (and they haven't).  But
normally, when a situation like this arises, the normalized forms
of the new character (both NFC and NFD) decompose it back to the
earlier combining sequence so that the results compare equal.
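For contrast with the problem case, the "normal" situation can be seen with a long-established precomposed character such as U+00E9. This small sketch (standard `unicodedata` only) shows that it carries a canonical decomposition mapping, so NFD takes it back to the base-plus-combining sequence and NFC brings that sequence back to the same code point:

```python
import unicodedata

precomposed = "\u00e9"  # U+00E9 LATIN SMALL LETTER E WITH ACUTE
sequence = "e\u0301"    # U+0065 + U+0301 COMBINING ACUTE ACCENT

# The Unicode Character Database records a canonical decomposition for U+00E9.
print(unicodedata.decomposition(precomposed))               # 0065 0301

# Both normalization forms therefore treat the two spellings identically.
print(unicodedata.normalize("NFD", precomposed) == sequence)  # True
print(unicodedata.normalize("NFC", sequence) == precomposed)  # True
```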

In this case (and a few others that no one noticed before), the
precomposed character does not decompose to the combining
sequence, for reasons that, to the degree they can be summarized
in one line, have to do with different uses of the character, not
its name or its concrete or abstract forms.
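The message does not name the character.  The sketch below assumes it is U+08A1 ARABIC LETTER BEH WITH HAMZA ABOVE, the code point the draft is generally understood to address, compared against U+0628 ARABIC LETTER BEH followed by U+0654 ARABIC HAMZA ABOVE.  It needs a Python whose `unicodedata` tables are Unicode 7.0.0 or later.  Unlike U+00E9, the new character has no decomposition mapping, so neither NFC nor NFD makes the two spellings equal:

```python
import unicodedata

# Require Unicode 7.0.0+ data; compare numerically, not as strings.
version = tuple(int(part) for part in unicodedata.unidata_version.split("."))
assert version >= (7, 0, 0), "unicodedata predates Unicode 7.0.0"

new_char = "\u08a1"        # assumed: ARABIC LETTER BEH WITH HAMZA ABOVE
sequence = "\u0628\u0654"  # ARABIC LETTER BEH + ARABIC HAMZA ABOVE

# No decomposition mapping is recorded, so nothing links the two forms.
print(repr(unicodedata.decomposition(new_char)))  # ''

for form in ("NFC", "NFD"):
    equal = unicodedata.normalize(form, new_char) == unicodedata.normalize(form, sequence)
    print(form, equal)  # False for both forms
```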

From some quite reasonable perspectives, that makes a lot of
sense, but it appears to us that it is unfortunate and dangerous
for IDNA.  Consequently, the document announced below makes the
new character DISALLOWED in IDNA.
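To make the effect of such an exception concrete, here is a rough sketch of a label check that consults a per-code-point exception table.  Everything in it is hypothetical: the `EXCEPTIONS` table holds only the single assumed entry (U+08A1) and is not the full RFC 5892 exceptions list, and `disallowed_code_points` is an illustrative name, not an API of any IDNA library:

```python
import unicodedata
from typing import List

# Hypothetical exception table: code point -> derived IDNA property.
# Only the one assumed entry (U+08A1) is listed; this is NOT RFC 5892's table.
EXCEPTIONS = {0x08A1: "DISALLOWED"}


def disallowed_code_points(label: str) -> List[int]:
    """Return the code points in an NFC label that the table marks DISALLOWED."""
    # IDNA expects labels to be in NFC before any per-code-point checks.
    if unicodedata.normalize("NFC", label) != label:
        raise ValueError("label is not in NFC form")
    return [ord(ch) for ch in label if EXCEPTIONS.get(ord(ch)) == "DISALLOWED"]


print([hex(cp) for cp in disallowed_code_points("\u0628\u08a1")])  # ['0x8a1']
print(disallowed_code_points("\u0628\u0654"))                      # []
```

Note that the sketch only rejects the new precomposed code point; it says nothing about how the combining sequence itself should be treated, which is part of the tradeoff the draft discusses.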

Whether or not that is the conclusion the IETF ultimately reaches
(there are some tradeoffs that the draft tries to identify), it
is probably important to understand that such character
combinations exist, preferably before the bad guys figure that
out.


---------- Forwarded Message ----------
Date: Monday, 21 July, 2014 04:03 -0700
From: internet-drafts at ietf.org
To: i-d-announce at ietf.org
Subject: I-D Action: draft-klensin-idna-5892upd-unicode70-00.txt

A New Internet-Draft is available from the on-line
Internet-Drafts directories.

        Title           : IDNA Update for Unicode 7.0.0
        Authors         : John C Klensin
                          Patrik Faltstrom
        Filename        : draft-klensin-idna-5892upd-unicode70-00.txt
        Pages           : 10
        Date            : 2014-07-21

   The current version of the IDNA specifications anticipated that
   each new version of Unicode would be reviewed to verify that no
   changes had been introduced that required adjustments to the set
   of rules and, in particular, whether new exceptions or backward
   compatibility adjustments were needed.  That review was conducted
   for Unicode 7.0.0 and identified a problematic new code point.
   This specification updates RFC 5892 to disallow that code point
   and provides information about the reasons why that exclusion is
   appropriate.  It also applies an editorial clarification that was
   the subject of an earlier erratum.

The IETF datatracker status page for this draft is:

There's also an htmlized version available at:

Please note that it may take a couple of minutes from the time
of submission until the htmlized version and diff are available
at tools.ietf.org.

Internet-Drafts are also available by anonymous FTP at:

I-D-Announce mailing list
I-D-Announce at ietf.org
Internet-Draft directories: http://www.ietf.org/shadow.html
or ftp://ftp.ietf.org/ietf/1shadow-sites.txt

---------- End Forwarded Message ----------
