Early look at draft-idnabis-issues-00d

Mon Nov 6 17:39:32 CET 2006

John C Klensin <klensin at jck.com> writes:

> --On Monday, November 06, 2006 15:56 +0100 Simon Josefsson
> <jas at extundo.com> wrote:
>
>> Hi!
>>
>> I'm still digesting this document...
>>
>> First, just a question: what is "Stable NFKC"??  Any
>> reference? It seems like this will be the essential
>> contribution of IDNA200x.
>
> No, it is an essential contribution of the Unicode Consortium (after
> some discussions while we were working on "nextsteps", but entirely
> their idea).  Basically, it differs from NFKC (and Stable NFC from NFC,
> etc.) by causing an undefined code point to fail, rather than
> translating to itself.   The existing (I probably shouldn't call them
> "unstable") versions map unassigned code points into themselves.  If
> one of those code points is assigned later and normalizes to something
> else, we have a problem... especially if the "assignment" was due to
> registration by code developed under one version and lookup under
> another.

Ah, thanks, I found one reference for it:

http://www.unicode.org/review/
http://www.unicode.org/review/pr-95.html

As far as I can tell, this is not yet in any official Unicode
specification, but I may have missed it.

It even seems unclear whether PR-95 is still an open issue or not
within the UTC.  The date on the first page seems rather old, but it
says that all issues listed on the page are open.

Ideally, this should be explained and a reference should be given.
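
To make the difference concrete, here is a minimal sketch of the
behaviour John describes: fail on an unassigned code point instead of
passing it through unchanged.  This is only an illustration, not the
definition from PR-95.  The function name stable_nfkc is made up, and
I am assuming that "unassigned" can be approximated by general
category Cn in whatever Unicode version the Python interpreter ships
with (Cn also covers noncharacters).

```python
import unicodedata


def stable_nfkc(text):
    """Apply NFKC, but refuse input containing unassigned code points.

    Hypothetical helper: "unassigned" is approximated by general
    category Cn in the Unicode data bundled with this interpreter.
    """
    for ch in text:
        if unicodedata.category(ch) == "Cn":
            raise ValueError("unassigned code point U+%04X" % ord(ch))
    return unicodedata.normalize("NFKC", text)
```

Plain NFKC would return such a code point unchanged, which is the case
that goes wrong if a later Unicode version assigns it and gives it a
mapping.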

>> Second, a suggestion: discuss the move from Unicode 3.2 to
>> Unicode 5.0 more prominently, and also the problems stemming
>> from that.  The added characters from Unicode 5.0 are a major
>> new feature, so it should be more visible.  There is one
>> problem in handling the NFKC breakage that the UTC introduced
>> after Unicode 3.2 -- the PR29 change -- but those strings can
>> be detected and prevented by IDNA200x.  I can describe how
>> LibIDN does this separately, if there is interest in that
>> approach.
>
> Probably a good idea.  Will work on it this week.   I think it turns
> out that the "if it normalizes to something else, it will normally be
> prohibited" rule prohibits all of the PR29 characters (with Unicode
> 3.2 due to one mapping and with Unicode 4.0 and later with another,
> but who cares), so that becomes a non-problem as an accidental
> consequence.

Right.  The PR29 problem is actually because the Unicode 3.2
specification said one (incorrect) thing, which StringPrep inherited,
and that today some implementations follow the specification and some
implementations follow the specification as modified by PR29.
Essentially PR29 changed NFKC in a backwards-incompatible way.  The
descriptions in Unicode 4.0 and later have been "corrected" to
incorporate the PR29 change.

Because the PR29 strings are inherently insecure under NFKC, LibIDN
has a module that can be used to detect such sequences.  LibIDN
implements the algorithm described in (look for table 2):

http://www.unicode.org/review/pr-29.html

We use the module in security-sensitive code, such as my SASL library.

The code to do these checks is in:

http://josefsson.org/cgi-bin/viewcvs.cgi/libidn/lib/pr29.c?view=markup
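
For readers who do not want to read the C code, the shape of the check
can be sketched in a few lines of Python.  This is not the LibIDN
code: pr29.c uses the character table from the PR29 page, while the
sketch below substitutes Python's own NFC composition data to decide
whether the first and last characters would combine, and it only looks
at a contiguous run of intervening combining characters.  The function
names are made up for the example.

```python
import unicodedata


def composes(first, last):
    # Stand-in for the PR29 table lookup (an assumption of this
    # sketch): the pair counts as a match when NFC turns the two
    # adjacent characters into a single composite.
    return len(unicodedata.normalize("NFC", first + last)) == 1


def has_pr29_sequence(text):
    """Look for: first_character intervening_character+ last_character.

    The intervening characters have a non-zero canonical combining
    class; the first and last characters are starters that would
    compose if they were adjacent.
    """
    n = len(text)
    for i, first in enumerate(text):
        if unicodedata.combining(first) != 0:
            continue  # the first character must be a starter
        j = i + 1
        while j < n and unicodedata.combining(text[j]) != 0:
            j += 1  # skip the run of intervening combining characters
        if j > i + 1 and j < n and composes(first, text[j]):
            return True
    return False
```

For example, has_pr29_sequence("\u0b47\u0300\u0b3e") is true for the
U+0B47 U+0300 U+0B3E sequence, while the same string without the
intervening U+0300 is not flagged.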

I suggest that, as part of upgrading StringPrep from Unicode 3.2 to
Unicode 5.0, the specification should state that all PR29 sequences
should be rejected, to minimize the security problem that results from
modifying the NFKC algorithm in a backwards-incompatible way.

>> PS.  I posted this several weeks ago, but it didn't arrive in
>> the archives, so it was probably filtered out.
>
> I, at least, didn't get it.  Too bad, as most of the specific changes
> could have been dealt with in the posted -00.

I should have cc'ed the email... I received a mailman bounce that said
my message was in the moderator's queue, and should probably have
re-sent the e-mail at that time, but since I was travelling, I forgot
about it.

/Simon