Early look at draft-idnabis-issues-00d

Mon Nov 6 16:08:53 CET 2006

--On Monday, November 06, 2006 15:56 +0100 Simon Josefsson 
<jas at extundo.com> wrote:

> Hi!
>
> I'm still digesting this document...
>
> First, just a question: what is "Stable NFKC"??  Any
> reference? It seems like this will be the essential
> contribution of IDNA200x.

No, it is an essential contribution of the Unicode Consortium
(after some discussions while we were working on "nextsteps",
but entirely their idea).  Basically, it differs from NFKC (and
Stable NFC from NFC, etc.) by causing an unassigned code point to
fail, rather than translating to itself.  The existing (I
probably shouldn't call them "unstable") versions map unassigned
code points into themselves.  If one of those code points is
assigned later and normalizes to something else, we have a
problem... especially if the "assignment" was due to
registration by code developed under one version and lookup
under another.
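
To make the difference concrete, here is a minimal Python sketch of
the idea as I described it above.  It is not the Unicode Consortium's
definition: the name stable_nfkc and the ucd parameter are made up for
illustration, and "unassigned" here means general category Cn in
whichever Unicode tables the given unicodedata object carries.

```python
import unicodedata


def stable_nfkc(text, ucd=unicodedata):
    """Hypothetical helper: NFKC that fails on unassigned code points.

    `ucd` is an assumption of this sketch: any object exposing
    category() and normalize(), e.g. the unicodedata module itself or
    unicodedata.ucd_3_2_0 (the Unicode 3.2 tables).
    """
    for ch in text:
        # General category "Cn" marks a code point as unassigned
        # in the tables being consulted.
        if ucd.category(ch) == "Cn":
            raise ValueError("unassigned code point U+%04X" % ord(ch))
    # Every code point is assigned, so plain NFKC is safe to apply.
    return ucd.normalize("NFKC", text)
```

Plain NFKC would hand an unassigned code point back unchanged; this
version refuses it, so a string cannot quietly change its normalized
form once that code point is assigned in a later Unicode version.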

> Second, a suggestion: discuss the move from Unicode 3.2 to
> Unicode 5.0 more prominently, and also the problems stemming
> from that.  The added characters from Unicode 5.0 are a major
> new feature, so they should be more visible.  There is one
> problem in handling the NFKC breakage that the UTC introduced
> after Unicode 3.2 -- the PR29 change -- but those strings can
> be detected and prevented by IDNA200x.  I can describe how
> LibIDN does this separately, if there is interest in that
> approach.

Probably a good idea.  Will work on it this week.   I think it 
turns out that the "if it normalizes to something else, it will 
normally be prohibited" rule prohibits all of the PR29 
characters (with Unicode 3.2 due to one mapping and with Unicode 
4.0 and later with another, but who cares), so that becomes a 
non-problem as an accidental consequence.
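
A rough Python sketch of that rule, under my own assumptions: the
function name is hypothetical, the check is done per character, and it
uses whatever Unicode version the local unicodedata module carries.
It only illustrates the "normalizes to something else" test and makes
no claim about covering the PR29 sequences.

```python
import unicodedata


def chars_changed_by_nfkc(label):
    """Hypothetical helper: list characters that NFKC maps elsewhere.

    A character is reported when normalizing it on its own yields
    anything other than the character itself.
    """
    return [ch for ch in label
            if unicodedata.normalize("NFKC", ch) != ch]
```

Under the rule, a label for which this returns a non-empty list would
normally be rejected rather than silently mapped.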
>
> While reading the document, I've noticed some areas that could
> improve the document:
>
> The IDNA model flow in section 2 should be improved to make it
> clear that all-ASCII inputs, and some Unicode input strings,
> are converted to ASCII hostnames in the DNS.  In other words,
> at least with IDNA2003, not all inputs generate a punycode'd
> output string.  The section currently gives the impression
> that all strings are punycode encoded; I suspect this is just
> sloppy use of terminology.

Yep.  Will fix.

> Specifically, section 2.1.7 should permit that punycode is not
> used at all, and section 2.1.8 should not say that the
> string has to be punycode-encoded.
>
> The same problem is in section 2.2 -- not all IDNs are
> punycode encoded.
>
> The way the term "punycode string" is used in section 5.1
> indicates a misunderstanding of what punycode is.  (This may
> also explain the above flaw).  Punycode is an encoding of
> Unicode, comparable to, say, UTF-7.  Instead of "a Punycode
> string", I think you mean "ASCII-encoded IDN" or similar.

That would probably also help.
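
For the record, the distinction can be sketched in a few lines of
Python.  The helper name is hypothetical and the sketch deliberately
skips nameprep and all other IDNA checks; it only shows that Punycode
is the encoding step, that the ASCII-encoded form adds the "xn--"
prefix, and that an all-ASCII label is not Punycode-encoded at all.

```python
def to_ace_label(label):
    """Hypothetical helper: ASCII-encoded form of a single label.

    Assumption: the label has already been mapped and validated;
    no nameprep or prohibition checks are performed here.
    """
    if label.isascii():
        # All-ASCII labels go into the DNS unchanged.
        return label
    # Punycode encodes the Unicode string; the ACE prefix marks it.
    return "xn--" + label.encode("punycode").decode("ascii")
```

Here label.encode("punycode") is Python's built-in codec for the
bare encoding, which is why the prefix has to be added separately.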

> I really like section 5, it makes it clear what backwards
> compatible changes we can do, and which we cannot do.  It may
> need further tweaking, but it is a useful section.
>
> Concluding, while there are some useful generic discussions and
> concerns, it seems this document needs quite some work until
> it is close to becoming something that is implementable.  It's
> difficult to discuss IDNA2003 vs IDNA200x until the details
> are fleshed out.
>
> /Simon
>
> PS.  I posted this several weeks ago, but it didn't arrive in
> the archives, so it was probably filtered out.

I, at least, didn't get it.  Too bad, as most of the specific 
changes could have been dealt with in the posted -00.

Many thanks.
     john