Early look at draft-idnabis-issues-00d
John C Klensin
klensin at jck.com
Mon Nov 6 17:45:45 CET 2006
--On Monday, November 06, 2006 17:39 +0100 Simon Josefsson
<jas at extundo.com> wrote:
> John C Klensin <klensin at jck.com> writes:
>> No, it is an essential contribution of the Unicode Consortium
>> (after some discussions while we were working on "nextsteps",
>> but entirely their idea). Basically, it differs from NFKC
>> (and Stable NFC from NFC, etc.) by causing an undefined code
>> point to fail, rather than translating to itself. The
>> existing (I probably shouldn't call them "unstable") versions
>> map unassigned code points into themselves. If one of those
>> code points is assigned later and normalizes to something
>> else, we have a problem... especially if the "assignment" was
>> due to registration by code developed under one version and
>> lookup under another.
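To make the difference described above concrete, here is a minimal sketch of the "fail on unassigned" behaviour in Python. It is an illustration under stated assumptions, not the Unicode Consortium's definition: the name stable_nfkc is hypothetical, "unassigned" is approximated by general category Cn (which also covers noncharacters), and the result depends on the Unicode version bundled with the Python build (unicodedata.unidata_version).

```python
import unicodedata


def stable_nfkc(text: str) -> str:
    """NFKC that fails on unassigned code points instead of passing them through.

    Hypothetical helper for illustration only. "Unassigned" here means
    general category Cn in the Unicode data shipped with this Python build.
    """
    for ch in text:
        if unicodedata.category(ch) == "Cn":
            # Plain NFKC would map this code point to itself; a later
            # Unicode version might assign it and normalize it differently.
            raise ValueError(f"unassigned code point U+{ord(ch):04X}")
    return unicodedata.normalize("NFKC", text)
```

With this check, a string that registration code and lookup code could normalize differently under different Unicode versions is rejected up front rather than silently passed through.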
> Ah, thanks, I found one reference for it:
> As far as I can tell, this is not yet in any official Unicode
> specification, but I may have missed it.
> It seems even unclear whether PR-95 is still an open issue or
> not within the UTC. The date on the first page seems rather
> old, but it says that all issues listed on the page are open.
> Ideally, this should be explained and a reference should be
UTC is meeting at the end of this week; I hope this will be
resolved then. In any event, I'll fix the references as you
>>> Second, a suggestion: discuss the move from Unicode 3.2 to
>>> Unicode 5.0 more prominently, and also the problems stemming
>>> from that. The added characters from Unicode 5.0 are a major
>>> new feature, so it should be more visible. There is one
>>> problem in handling the NFKC breakage that the UTC introduced
>>> after Unicode 3.2 -- the PR29 change -- but those strings can
>>> be detected and prevented by IDNA200x. I can describe how
>>> LibIDN does this separately, if there is interest in that
>> Probably a good idea. Will work on it this week. I think
>> it turns out that the "if it normalizes to something else, it
>> will normally be prohibited" rule prohibits all of the PR29
>> characters (with Unicode 3.2 due to one mapping and with
>> Unicode 4.0 and later with another, but who cares), so that
>> becomes a non-problem as an accidental consequence.
> Right. The PR29 problem is actually because the Unicode 3.2
> specification said one (incorrect) thing, which StringPrep
> inherited, and that today some implementations follow the
> specification and some implementations follow the
> specification as modified by PR29. Essentially PR29 changed
> NFKC in a backwards-incompatible way. The descriptions in
> Unicode 4.0 and later have been "corrected" to incorporate the
> PR29 change.
> I suggest that during the process of upgrading StringPrep from
> Unicode 3.2 to Unicode 5.0, it should document that all
> PR29-sequences should be rejected, to minimize the security
> problem that results from modifying the NFKC-algorithm in a
> backwards-incompatible way.
That has been the intention. My careful language was only
because I haven't verified with the tables that this restriction
has been applied.
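For reference, the "if it normalizes to something else, it will normally be prohibited" rule quoted earlier can be sketched roughly as follows in Python. The function names are hypothetical, and the check uses whatever NFKC the local Python build implements (the post-PR29 algorithm in current releases). It shows only the shape of the rule; it does not demonstrate that every PR29 sequence is caught, which is the part still to be verified against the tables.

```python
import unicodedata


def is_nfkc_stable(label: str) -> bool:
    """True if NFKC leaves the label unchanged (hypothetical helper)."""
    return unicodedata.normalize("NFKC", label) == label


def check_label(label: str) -> str:
    """Reject, rather than silently map, any label that NFKC would alter."""
    if not is_nfkc_stable(label):
        raise ValueError("label changes under NFKC and is prohibited")
    return label
```

Under this sketch a label containing U+FB01 (the "fi" ligature) is rejected instead of being mapped to "fi", while the plain ASCII label "fi" passes.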
>>> PS. I posted this several weeks ago, but it didn't arrive in
>>> the archives, so it was probably filtered out.
>> I, at least, didn't get it. Too bad, as most of the specific
>> changes could have been dealt with in the posted -00.
> I should have cc'ed the email... I received a mailman bounce
> that said my message was in the moderator's queue, and should
> probably have re-sent the e-mail at that time, but since I was
> travelling, I forgot about it.
No problem. This document is probably several iterations short