feedback on overview document
Erik van der Poel
erikv at google.com
Sun Dec 31 18:31:00 CET 2006
John,
Here is some feedback on your overview Internet Draft:
http://www.ietf.org/internet-drafts/draft-klensin-idnabis-issues-00.txt
2.2.3. Pre-Nameprep Validation and Character List Testing
Again in parallel to the above, the Unicode string is checked to
verify that all characters that appear in it are valid for IDNA
input. As discussed in Section 4, this check should probably be more
liberal than that of Section 2.1.4: characters that fall into
"pending", "possibly later", or "unassigned codepoint" categories in
the inclusion tables should probably not lead to label rejection at
this point. Instead, the resolver should (MUST?) rely on the
presence or absence of labels containing such characters in the DNS
to determine their validity.
What about characters like the slash-lookalike? Are user agents permitted
to reject (or specially treat) labels containing such characters before
lookup?
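
To make the question concrete, here is a minimal sketch of the kind of pre-lookup check a user agent might apply. The set of code points and the function name are my own assumptions for illustration; they are not taken from the draft, and the list is certainly not complete:

```python
# Hypothetical set of slash lookalikes, chosen only for illustration.
SLASH_LOOKALIKES = {
    "\u2044",  # FRACTION SLASH
    "\u2215",  # DIVISION SLASH
    "\uff0f",  # FULLWIDTH SOLIDUS
}


def has_slash_lookalike(label: str) -> bool:
    """Return True if the label contains any character from the set above."""
    return any(ch in SLASH_LOOKALIKES for ch in label)
```

A user agent could call has_slash_lookalike() on each label before lookup and then either reject the label or treat it specially. Whether that is permitted is exactly the question above.
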
RFC 3987 (IRI) says that some schemes allow the use of IDNs. RFC 2616
(HTTP) was published before IDNA2003 and it does not mention IDNs.
Should we make sure that the new HTTP work includes an action item to
include IDNs explicitly?
Is there a rough consensus regarding the removal of NFKC mappings from
IDNA, given that some URIs in existing HTML documents include such
characters? Maybe some of the NFKC mappings should be preserved
in IDNA200x?
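
As an illustration of what would be lost, here is a small sketch using Python's unicodedata module. The helper names are mine, and fullwidth letters are only one example of characters that NFKC maps to ASCII:

```python
import unicodedata


def to_fullwidth(text: str) -> str:
    """Shift printable ASCII characters into the fullwidth block.

    Illustration helper only: '!' (U+0021) through '~' (U+007E) have
    fullwidth forms exactly 0xFEE0 code points higher.
    """
    return "".join(chr(ord(ch) + 0xFEE0) for ch in text)


def nfkc(label: str) -> str:
    """Apply NFKC normalization, one of the mappings IDNA2003 performs."""
    return unicodedata.normalize("NFKC", label)
```

With these, nfkc(to_fullwidth("example")) gives back "example". A label written with fullwidth letters in an existing HTML document would resolve under IDNA2003 but might not under a scheme without this mapping.
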
What does "label rejection" mean? Should we mention the display of some
labels in Punycode form?
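
For reference, the Punycode (ACE) form I have in mind can be produced as in the sketch below. It relies on Python's built-in "idna" codec, which implements IDNA2003, so it only approximates what IDNA200x would do. The function name is an assumption of mine:

```python
def to_ace(label: str) -> str:
    """Convert a single label to its ACE ("xn--") form.

    Uses Python's built-in "idna" codec, which follows IDNA2003
    (Nameprep followed by Punycode), not any IDNA200x proposal.
    """
    return label.encode("idna").decode("ascii")
```

For example, to_ace("b\u00fccher") returns "xn--bcher-kva", while an all-ASCII label such as "example" is returned unchanged.
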
There is now general
consensus that this exclusion-based model was a mistake and should be
replaced, in IDNA200x, by a system that lists only those characters
that are permitted and does much less mapping.
It might be a good idea to include a rationale for this.
----------------------------------------
The rest of this email is less controversial stuff:
This work is being discussed on the mailing list
idn-update at alvestrand.no
Should be idna-update at alvestrand.no
Might also be good to include the URL for subscription:
http://www.alvestrand.no/mailman/listinfo/idna-update
only ASCII letters, digits, and hyphens
hyphens -> hyphen
The Unicode string is examined to prohibit characters that IDNA does
not permit in input.
Given that some of our email discussions have mentioned the input and
output of IDNA, I wonder if the above wording might suggest that there
is an inclusion list for input and another one for output. Would it be
better to remove the words "in input" above?
For IDNA200x, the new Stable
NFKC method is used as a base to facilitate migration to future
versions of Unicode
Ken already mentioned this, but we need a reference for Stable NFKC.
2.1.4.2. Case-folding
The normalized string is then case-mapped for scripts that make case
distinctions similar to those of Greek to permit approximating the
ASCII-case matching applied on name resolution in the DNS. Strictly
speaking, case-folding starts with the normalization process above,
then strings are case-folded, then they are normalized again. The
application of the "FC_NFKC_Closure" property above simplifies this
process in practice.
RFC 3454 does case-folding before normalization and the case-folding
includes special provisions for the following NFKC step (if used).
Should this part of IDNA200x explicitly mention that the order of the
steps may appear to be different, but has been more carefully
specified? (If that is the case.)
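
A small sketch of why the order matters, assuming Python's str.casefold() (Unicode full case folding) as a stand-in for the RFC 3454 mapping tables, which it is not. The function names are mine:

```python
import unicodedata


def fold_then_nfkc(s: str) -> str:
    """Case-fold first, then normalize (the naive RFC 3454 order)."""
    return unicodedata.normalize("NFKC", s.casefold())


def nfkc_then_fold(s: str) -> str:
    """Normalize first, then case-fold."""
    return unicodedata.normalize("NFKC", s).casefold()


# U+2102 DOUBLE-STRUCK CAPITAL C has no case folding of its own, but
# NFKC maps it to "C", which does fold to "c".
SAMPLE = "\u2102"
```

Here fold_then_nfkc(SAMPLE) yields "C" while nfkc_then_fold(SAMPLE) yields "c". RFC 3454 avoids this discrepancy by listing such characters explicitly in its case-folding table for use with NFKC, which is the kind of special provision I mean above.
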
2.1.5. Post-Stringprep Character String Checking and Processing
All characters output from the step above are then verified for the
permissibility for IDNA, i.e., presence in the table of included
characters (see Section 4).
Stringprep actually includes a prohibition step after normalization,
so it may be a little confusing to say "Post-Stringprep". Or is this
IDNA step separate from Stringprep?
o This document, containing an overview and rationale.
Is this Internet Draft intended to eventually become an RFC? If so,
will it still be "Proposed Issues..." or will it become the new
version of IDNA (i.e. RFC3490bis)?
o A document describing the "BIDI problem" with Stringprep and
proposing a solution [IDNA200X-BIDI].
o A list of initially permitted code points, based on Unicode 5.0
code blocks. See Section 4.
Will these two become Stringprepbis?
For example, it
may be desirable to more strongly distinguish between use of the
protocols for "registration" (i.e., entering names in the DNS) and
"lookup" (queries to the DNS), with most character inclusion rules
applied at registration time only and clients generating queries
relying on the lookup process to return "not found" errors if
characters were invalid.
Again, what about the slash-lookalikes in deep labels? (Maybe the
word "most" is referring to this issue...)
A prefix change would clearly be
necessary if the algorithms were modified in a manner that would
create serious ambiguities during subsquent transition in
registrations.
subsquent -> subsequent
2. An input string that is valid under IDNA2003 and also valid under
IDNA200x yields two different Punycode strings with the different
versions .
Extra space before the period (.)
2. Adjustments in Stringprep tables or IDNA actions, including
normalization definitions, that do not impact characters that
have already been invalid under IDNA2003.
do not impact -> impact?
The section above (Section 5)
essentially applies in this context as well: the proposed new
inclusion tables [IDNA200X-Blocks], the reduction in the number of
characters permitted as input to Stringprep Section 4
Section 4 -> in Section 4?
for which an "ae" cannot be substituted acording to current
acording -> according
In addition, there were may specific
may -> many
Erik