feedback on overview document

Sun Dec 31 18:31:00 CET 2006

John,

Here is some feedback on your overview Internet Draft:

http://www.ietf.org/internet-drafts/draft-klensin-idnabis-issues-00.txt

2.2.3.  Pre-Nameprep Validation and Character List Testing

   Again in parallel to the above, the Unicode string is checked to
   verify that all characters that appear in it are valid for IDNA
   input.  As discussed in Section 4, this check should probably be more
   liberal than that of Section 2.1.4: characters that fall into
   "pending", "possibly later", or "unassigned codepoint" categories in
   the inclusion tables should probably not lead to label rejection at
   this point.  Instead, the resolver should (MUST?) rely on the
   presence or absence of labels containing such characters in the DNS
   to determine their validity.

What about characters like the slash-lookalike? Are user agents permitted
to reject (or specially treat) labels containing such characters before
lookup?
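
To make the question concrete, here is a rough Python sketch of the kind
of pre-lookup check I have in mind. It is only an illustration: the set
of code points and the function names are my own assumptions, not
anything taken from the draft, and the list is certainly not exhaustive.

```python
import unicodedata

# Assumed, non-exhaustive set of code points that can be mistaken for "/".
SLASH_LOOKALIKES = {
    "\u2044",  # FRACTION SLASH
    "\u2215",  # DIVISION SLASH
    "\uff0f",  # FULLWIDTH SOLIDUS
}


def find_slash_lookalikes(label):
    """Return (index, character name) for each suspicious character."""
    return [
        (i, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(label)
        if ch in SLASH_LOOKALIKES
    ]


def should_treat_specially(label):
    # Hypothetical user-agent policy hook, applied before any DNS lookup.
    return bool(find_slash_lookalikes(label))
```

A user agent could call should_treat_specially() on each label and then
reject the name, warn the user, or fall back to some other display form.
Whether IDNA200x permits that is exactly what I am asking.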

RFC 3987 (IRI) says that some schemes allow the use of IDNs. RFC 2616
(HTTP) was published before IDNA2003 and it does not mention IDNs.
Should we make sure that the new HTTP work includes an action item to
cover IDNs explicitly?

Is there a rough consensus regarding the removal of NFKC mappings from
IDNA, given that some URIs in existing HTML documents include such
characters? Maybe some of the NFKC mappings should be preserved
in IDNA200x?
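
As an illustration of what would be lost, the following Python sketch
shows NFKC folding a label written in fullwidth letters to plain ASCII.
The helper name is mine, and Python's unicodedata module follows a newer
Unicode version than the one IDNA2003 is tied to, so treat this as an
approximation of the Nameprep behavior rather than the real thing.

```python
import unicodedata


def nfkc_changes(label):
    """Return the NFKC form of label and whether normalization altered it."""
    normalized = unicodedata.normalize("NFKC", label)
    return normalized, normalized != label


# "example" written with fullwidth Latin letters (U+FF41..U+FF5A range).
fullwidth = "\uff45\uff58\uff41\uff4d\uff50\uff4c\uff45"
print(nfkc_changes(fullwidth))
```

Without the NFKC mapping, a link containing the fullwidth form would no
longer resolve to the same name as the ASCII form.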

What does "label rejection" mean? Should we mention the display of some
labels in Punycode form?
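
For instance, a user agent might do something along these lines. This is
only a sketch: the "trusted" flag stands in for whatever policy the
application uses, and it is purely hypothetical. The conversion relies
on Python's built-in "idna" codec, which implements IDNA2003 (RFC 3490).

```python
def display_form(label, trusted):
    """Return the label to show: Unicode if trusted, else its ACE form."""
    if trusted:
        return label
    # ToASCII as defined by IDNA2003; yields the "xn--" Punycode form.
    return label.encode("idna").decode("ascii")
```

Here an untrusted "b\u00fccher" would be shown as "xn--bcher-kva" rather
than being rejected outright.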

   There is now general
   consensus that this exclusion-based model was a mistake and should be
   replaced, in IDNA200x, by a system that lists only those characters
   that are permitted and does much less mapping.

It might be a good idea to include a rationale for this.

----------------------------------------

The rest of this email covers less controversial points:

   This work is being discussed on the mailing list
   idn-update at alvestrand.no

Should be idna-update at alvestrand.no

Might also be good to include the URL for subscription:

http://www.alvestrand.no/mailman/listinfo/idna-update

   only ASCII letters, digits, and hyphens

hyphens -> hyphen

   The Unicode string is examined to prohibit characters that IDNA does
   not permit in input.

Given that some of our email discussions have mentioned the input and
output of IDNA, I wonder if the above wording might suggest that there
is an inclusion list for input and another one for output. Would it be
better to remove the words "in input" above?

   For IDNA200x, the new Stable
   NFKC method is used as a base to facilitate migration to future
   versions of Unicode

Ken already mentioned this, but we need a reference for Stable NFKC.

2.1.4.2.  Case-folding

   The normalized string is then case-mapped for scripts that make case
   distinctions similar to those of Greek to permit approximating the
   ASCII-case matching applied on name resolution in the DNS.  Strictly
   speaking, case-folding starts with the normalization process above,
   then strings are case-folded, then they are normalized again.  The
   application of the "FC_NFKC_Closure" property above simplifies this
   process in practice.

RFC 3454 does case-folding before normalization and the case-folding
includes special provisions for the following NFKC step (if used).
Should this part of IDNA200x explicitly mention that the order of the
steps may appear to be different, but has been more carefully
specified? (If that is the case.)
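
A small Python sketch shows why the order matters. Note the assumptions:
str.casefold() applies Unicode full case folding, which is not the same
as the RFC 3454 mapping tables, and the function names are mine.

```python
import unicodedata


def fold_then_nfkc(s):
    # Case-fold first, then normalize, with no extra folding provisions.
    return unicodedata.normalize("NFKC", s.casefold())


def nfkc_then_fold(s):
    # Normalize first, then case-fold the result.
    return unicodedata.normalize("NFKC", s).casefold()


def nfkc_fold_nfkc(s):
    # The order the draft describes: normalize, case-fold, normalize again.
    return unicodedata.normalize("NFKC", nfkc_then_fold(s))


square_mhz = "\u3392"  # SQUARE MHZ, compatibility-decomposes to "MHz"
print(fold_then_nfkc(square_mhz), nfkc_then_fold(square_mhz))
```

SQUARE MHZ has no case folding of its own, so folding first leaves the
capitals that NFKC then introduces. That is the kind of case the special
provisions in RFC 3454 are there to catch.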

2.1.5.  Post-Stringprep Character String Checking and Processing

   All characters output from the step above are then verified for the
   permissibility for IDNA, i.e., presence in the table of included
   characters (see Section 4).

Stringprep actually includes a prohibition step after normalization,
so it may be a little confusing to say "Post-Stringprep". Or is this
IDNA step separate from Stringprep?

   o  This document, containing an overview and rationale.

Is this Internet Draft intended to eventually become an RFC? If so,
will it still be "Proposed Issues..." or will it become the new
version of IDNA (i.e. RFC3490bis)?

   o  A document describing the "BIDI problem" with Stringprep and
      proposing a solution [IDNA200X-BIDI].

   o  A list of initially permitted code points, based on Unicode 5.0
      code blocks.  See Section 4.

Will these two become Stringprepbis?

   For example, it
   may be desirable to more strongly distinguish between use of the
   protocols for "registration" (i.e., entering names in the DNS) and
   "lookup" (queries to the DNS), with most character inclusion rules
   applied at registration time only and clients generating queries
   relying on the lookup process to return "not found" errors if
   characters were invalid.

Again, what about the slash-lookalikes in deep labels? (Maybe the
word "most" is referring to this issue...)

   A prefix change would clearly be
   necessary if the algorithms were modified in a manner that would
   create serious ambiguities during subsquent transition in
   registrations.

subsquent -> subsequent

   2.  An input string that is valid under IDNA2003 and also valid under
       IDNA200x yields two different Punycode strings with the different
       versions .

Extra space before the period (.)

   2.  Adjustments in Stringprep tables or IDNA actions, including
       normalization definitions, that do not impact characters that
       have already been invalid under IDNA2003.

do not impact -> impact?

   The section above (Section 5)
   essentially applies in this context as well: the proposed new
   inclusion tables [IDNA200X-Blocks], the reduction in the number of
   characters permitted as input to Stringprep Section 4

Section 4 -> in Section 4?

   for which an "ae" cannot be substituted acording to current

acording -> according

   In addition, there were may specific

may -> many

Erik