Early look at draft-idnabis-issues-00d

Mon Nov 6 17:45:45 CET 2006

--On Monday, November 06, 2006 17:39 +0100 Simon Josefsson 
<jas at extundo.com> wrote:

> John C Klensin <klensin at jck.com> writes:
>...
>> No, it is an essential contribution of the Unicode Consortium
>> (after some discussions while we were working on "nextsteps",
>> but entirely their idea).  Basically, it differs from NFKC
>> (and Stable NFC from NFC, etc.) by causing an undefined code
>> point to fail, rather than translating to itself.   The
>> existing (I probably shouldn't call them "unstable") versions
>> map unassigned code points into themselves.  If one of those
>> code points is assigned later and normalizes to something
>> else, we have a problem... especially if the "assignment" was
>> due to registration by code developed under one version and
>> lookup under another.
>
> Ah, thanks, I found one reference for it:
>
> http://www.unicode.org/review/
> http://www.unicode.org/review/pr-95.html
>
> As far as I can tell, this is not yet in any official Unicode
> specification, but I may have missed it.
>
> It even seems unclear whether PR-95 is still an open issue or
> not within the UTC.  The date on the first page seems rather
> old, but it says that all issues listed on the page are open.
>
> Ideally, this should be explained and a reference should be
> given.

UTC is meeting at the end of this week; I hope this will be 
resolved then.  In any event, I'll fix the references as you 
suggest.
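
To make the "fail rather than map to itself" behaviour concrete, here 
is a minimal sketch in Python.  It is not taken from PR-95 or from 
any Unicode specification: the function name stable_nfkc and its ucd 
parameter are made up for illustration, and "undefined" is 
approximated as general category Cn in whichever Unicode database is 
passed in.

```python
import unicodedata


def stable_nfkc(text, ucd=unicodedata):
    """Illustrative "stable" NFKC: reject unassigned code points.

    `ucd` is any object with category() and normalize(), such as the
    unicodedata module itself or unicodedata.ucd_3_2_0 (Unicode 3.2).
    """
    for ch in text:
        # Cn = unassigned in this version of the Unicode database.
        # Plain NFKC would map such a code point to itself.
        if ucd.category(ch) == "Cn":
            raise ValueError("unassigned code point U+%04X" % ord(ch))
    return ucd.normalize("NFKC", text)
```

With the Unicode 3.2 tables (unicodedata.ucd_3_2_0), a character 
that was assigned only in a later version, such as U+20B9, is 
rejected instead of being passed through unchanged, so a 
registration made by older code cannot silently differ from a lookup 
made by newer code.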

>>> Second, a suggestion: discuss the move from Unicode 3.2 to
>>> Unicode 5.0 more prominently, and also the problems stemming
>>> from that.  The added characters from Unicode 5.0 are a major
>>> new feature, so it should be more visible.  There is one
>>> problem in handling the NFKC breakage that the UTC introduced
>>> after Unicode 3.2 -- the PR29 change -- but those strings can
>>> be detected and prevented by IDNA200x.  I can describe how
>>> LibIDN does this separately, if there is interest in that
>>> approach.
>>
>> Probably a good idea.  Will work on it this week.   I think
>> it turns out that the "if it normalizes to something else, it
>> will normally be prohibited" rule prohibits all of the PR29
>> characters (with Unicode 3.2 due to one mapping and with
>> Unicode 4.0 and later with another, but who cares), so that
>> becomes a non-problem as an accidental consequence.
>
> Right.  The PR29 problem is actually because the Unicode 3.2
> specification said one (incorrect) thing, which StringPrep
> inherited, and that today some implementations follow the
> specification and some implementations follow the
> specification as modified by PR29. Essentially PR29 changed
> NFKC in a backwards-incompatible way.  The descriptions in
> Unicode 4.0 and later have been "corrected" to incorporate the
> PR29 change.
>...

> I suggest that during the process of upgrading StringPrep from
> Unicode 3.2 to Unicode 5.0, it should document that all
> PR29-sequences should be rejected, to minimize the security
> problem that results from modifying the NFKC-algorithm in a
> backwards-incompatible way.

That has been the intention.  My careful language was only 
because I haven't verified with the tables that this restriction 
has been applied.
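
For readers who want to see the shape of the "if it normalizes to 
something else, it is prohibited" rule, a rough sketch in Python 
follows.  The function name is hypothetical, it uses whatever Unicode 
version the Python runtime ships rather than the IDNA200x tables, and 
it does not try to identify PR29 sequences specifically; whether the 
rule covers all of them still has to be checked against the tables.

```python
import unicodedata


def is_nfkc_stable(label):
    """Return True if NFKC leaves the label unchanged.

    Under the rule sketched here, a label for which this returns
    False would be rejected rather than silently mapped.
    """
    return unicodedata.normalize("NFKC", label) == label
```

For example, is_nfkc_stable("example") is true, while a label 
containing the ligature U+FB01 is not stable, because NFKC rewrites 
it to the two letters "fi".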

>>> PS.  I posted this several weeks ago, but it didn't arrive in
>>> the archives, so it was probably filtered out.
>>
>> I, at least, didn't get it.  Too bad, as most of the specific
>> changes could have been dealt with in the posted -00.
>
> I should have cc'ed the email... I received a mailman bounce
> that said my message was in the moderator's queue, and should
> probably have re-sent the e-mail at that time, but since I was
> travelling, I forgot about it.

No problem.  This document is probably several iterations short 
of final.

    john