Protocol-02 and issues list
John C Klensin
klensin at jck.com
Mon Jul 14 19:56:27 CEST 2008
draft-ietf-idnabis-protocol-02 has just been queued for
posting. The balance of this note consists of a first cut at an
issues and status list and discussion based on comments received
to date. As with the list on Rationale, many of the issues are
in need of active discussion and suggestions, not just
Issues list, IDNABIS Protocol (as of 20080714)
Section numbers referenced are the same in both
draft-ietf-idnabis-protocol-01 and -02 except as noted.
There are several issues that, because of text moving back and
forth and suggestions for additional moves, are in the issues
list and summary for Rationale. They are, in general, not
Note that, unlike Rationale and some independent issues, no
specific "outstanding issues" list was posted for Protocol in
May. That was due, in part, to the underwhelming response to
the other postings. So this document incorporates what would
have been that list.
** P.1 ** Location of the Contextual Rules table
This table is in the Appendices to Protocol at present; in
previous versions it was in Rationale. Based on list
discussions, it should probably be moved to Tables,
probably along with some additional material that is
still in Rationale.
Status: Planned for the post-IETF72 versions of Protocol and
Tables unless someone can get consensus behind a different
** P.2 ** Format of the Contextual Rules table.
At present (Protocol-02), that table consists of a condensed
format that more or less matches the format used for
standard entries in Tables. With it, important fields are
separated by semicolons, potentially followed by comments
that start with "#". As an example, the first entry in the
table looks like the following in Protocol-01:
002D; HYPHEN-MINUS; F;
Must not appear at the beginning or end of a label;
# Note that a prohibition on having two hyphens as
the third and fourth characters of anything but a
valid A-label appears in the specification.
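For discussion purposes, here is a minimal sketch of how an entry in the condensed format could be read mechanically. It assumes only what is described above: fields separated by semicolons, optionally followed by a comment that starts with "#". The function name, the dictionary keys, and the treatment of the third field as an opaque flag are illustrative assumptions, not part of Protocol or Tables.

```python
def parse_contextual_rule(entry: str) -> dict:
    """Split one condensed-format entry into its fields (illustrative only)."""
    # Join the physical lines of the entry into one logical line.
    joined = " ".join(line.strip() for line in entry.splitlines())
    # Everything after the first "#" is treated as a free-text comment.
    body, _, comment = joined.partition("#")
    fields = [field.strip() for field in body.split(";")]
    # A trailing ";" leaves one empty field at the end; discard it.
    if fields and fields[-1] == "":
        fields.pop()
    if len(fields) < 4:
        raise ValueError("expected code point, name, flag, and rule text")
    code_point, name, flag, rule = fields[:4]
    return {
        "code_point": int(code_point, 16),  # e.g. "002D" -> 0x2D
        "name": name,
        "flag": flag,  # meaning not assumed here; kept as an opaque string
        "rule": rule,
        "comment": comment.strip(),
    }


EXAMPLE = """002D; HYPHEN-MINUS; F;
Must not appear at the beginning or end of a label;
# Note that a prohibition on having two hyphens as
the third and fourth characters of anything but a
valid A-label appears in the specification."""

parsed = parse_contextual_rule(EXAMPLE)
```

Note that this sketch would mis-split a rule whose text itself contained ";" or "#", which is one reason the choice of format matters.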
Mark Davis suggested a different format. He wrote (I have
preserved his formatting):
I suggest that the table be formatted for clarity to not
depend on whitespace -- using names for each field -- and be
broken into a list of condition/result pairs.
Code point: 200C
Name: ZERO WIDTH NON-JOINER
# Allow ZWNJ for breaking cursive connection, as needed in
Comment: There is no dependency on whitespace, at least in
Protocol-01. Mark's comments may have reflected an earlier
version. I've illustrated a version of these changes with
the alternate appendix (see elsewhere), but I'm not sure
about it for two reasons. The first is whether people
prefer a more compact format or one that uses more vertical
space. We hope that the Contextual Rules registry will remain
small, but ending up with a significantly longer Tables
document (remember that this section is normative) as the
result of format may not be in our best interest. The
second point is related to the next issue.
Status: To facilitate discussion, a variation on Mark's
layout is reflected in the second (alternate) appendix
(see P.3, below).
** P.3 ** Definition of the Contextual Rules - Regex or
Based on my understanding of what was being asked for, the
definitions of the Contextual Rules were written to use a strict
regular expression syntax with one regular expression per rule.
That syntax, illustrated by the example above for U+002D,
is not easy to look at or understand, but would lend itself
to an automatic rule interpreter. Mark's example is not
one of a single rule, or even formal use of a regular
expression. If we are going to go that route, there may be
even simpler ways to express the rules, leaving
applications implementers on their own for the
formalizations to be used in their code. A second appendix
has been supplied in -02 as the beginning of a suggestion
Please examine the two examples above, the discussion in
the text, and the forms used in both appendices and advise
on what you would like to see and why.
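To make the trade-off concrete, the sketch below expresses the U+002D rule ("must not appear at the beginning or end of a label") in two ways: as a single regular expression that an automatic rule interpreter could evaluate, and as a plain predicate of the kind an implementer might write directly. Neither form is taken from Protocol or from either appendix; the pattern and the function names are assumptions made for illustration.

```python
import re

# Hypothetical one-regex-per-rule form: no hyphen at the start
# (negative lookahead) and no hyphen at the end (negative lookbehind).
HYPHEN_RULE = re.compile(r"(?!-).*(?<!-)", re.DOTALL)


def hyphen_ok_regex(label: str) -> bool:
    """Evaluate the rule by matching the whole label against the pattern."""
    return HYPHEN_RULE.fullmatch(label) is not None


def hyphen_ok_plain(label: str) -> bool:
    """Evaluate the same rule as an ordinary condition."""
    return not (label.startswith("-") or label.endswith("-"))
```

The two functions should agree on every input; the difference is only in how easy each form is to read and to process mechanically.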
Status: An alternate rule form is outlined in the temporary
** P.4 ** Protocol reference to Bidi Constraints
In 22.214.171.124: the bidi constraints apply to more than just
Comment: Noted. The question of those Bidi constraints is
probably one of the larger and more substantive open issues
Status: Discussion placeholder inserted, but we need to
resolve the Bidi question before this text is adjusted to
** P.5 ** Bidi-checking Requirement on Lookup
Should this be a SHOULD or a MUST?
Comment: See the discussion anchor and text in Section 5.5
(note that this anchor has been in the document for some
time and there have been no comments on-list).
Status: Discussion anchor in text.
** P.6 ** Requirement for Policy
Mark writes about Section 4.4: "While exact policies are not
specified as part of IDNA2008 and it is expected that
different registries may specify different policies, there
SHOULD be policies." This SHOULD is pointless, unless some
constraints or guidance are given. Otherwise my policy could
be "any valid IDNA label", which would be precisely the same
as no policy at all.
Comment: I fully expect that some zone administrators will
adopt exactly that sort of policy. We've said that policy
decisions, and consistent application of those decisions, by
zone administrators are an important part of the
registration model. We have said that specifying those
policies and determining their adequacy is not an IETF
matter, but rather a matter for governments, enterprise
management, ICANN, and so on. We have seen application
implementers evaluate per-zone policies and respond with
decisions about what to display. So, I don't think this is
pointless. The problem is whether different language would
better describe the handoff.
Of course, were we to decide that our audience is purely
protocol implementers, all of this would go away.
** P.7 ** Universality of Unicode
Section 5.2 says: "The local character set, character
coding conventions, and, as necessary, display and
presentation conventions, are converted to Unicode (without
surrogates), paralleling the process described above in
Mark writes: In the vast majority of cases in modern
software, the local charset IS Unicode, so this may be
confusing. Also, UTF-16 does and must use surrogate code
units, so this needs to be more precise. And excluding
surrogate code points isn't necessary since gc=Cs are
forbidden anyway. Suggest:
"The string is converted from the local character set into
Unicode, if it is not already Unicode. The exact nature of
this conversion is beyond the scope of this document, but
may involve normalization, as described in Section 4.2."
Comment: I don't know how to evaluate "vast majority...",
but I keep running across examples and discussions that
suggest that the presumed minority is fairly large. The
most recent example is a lengthy discussion about
interoperability problems with email text body parts;
problems that would presumably be infrequent or trivial if
most systems primarily supported Unicode.
Status: The specific textual change suggested has been
made. Anyone who objects should say so.
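As a rough illustration of the suggested wording, the sketch below converts a string from a local character set into Unicode only when it is not already Unicode, and then normalizes it. The choice of NFC as the normalization form, the function name, and the use of a caller-supplied charset name are assumptions for the example; the text deliberately leaves the exact nature of the conversion out of scope.

```python
import unicodedata
from typing import Union


def to_unicode(value: Union[bytes, str], local_charset: str = "utf-8") -> str:
    """Return a normalized Unicode string (illustrative sketch only)."""
    if isinstance(value, bytes):
        # Not yet Unicode: decode from the local character set.
        # Raises UnicodeDecodeError if the bytes are invalid for it.
        text = value.decode(local_charset)
    else:
        # Already Unicode: no charset conversion is needed.
        text = value
    # Normalization form assumed to be NFC for this example.
    return unicodedata.normalize("NFC", text)
```

For instance, `to_unicode(b"b\xfccher", "latin-1")` decodes an ISO 8859-1 byte string, while a value that is already a `str` skips the decoding step and is only normalized.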
** P.8 ** Validation of A-labels
Section 5.4 says: "In general, that conversion and testing
should be performed if the domain name will later be
presented to the user in native character form (this
requires that the lookup application be IDNA-aware)."
Mark writes: Suppose that program X creates an A-Label from
a U-Label, then sends that A-Label to program Y, which sends
it to program Z, which sends it to program W, which displays
it. It sounds like each of Y, Z, W need to validate. Is
that the intent of this text? If it is only W that needs to
validate, then it gets a bit murky in today's world, where
the boundaries between cooperating processes and programs
are very fuzzy.
Comment: That "murky" situation is exactly why the text
leaves a lot of judgment in the hands of the application.
Note that the sentence after the one you quoted now says, in
part, "others may treat the string as opaque to avoid the
additional processing at the expense of providing less
protection and information to users", which is intended to
be a very clear statement that there is a trade off
involved. I believe that, if you read the surrounding text,
you will find that the specific answer to your question is
that W should (note lower case) validate unless it has some
reason not to. Note that W has to do most of the work of
validation anyway in order to display the string, and that
one such reason would be knowing, as a consequence of system
design and of adequate checking and auditing, that it
wouldn't have received the string unless it was valid.
Validation is optional, and less likely, for Y and Z, but
their implementers may certainly do so if they are being
That text is consistent, I think, with on-list discussions
that seem to have concluded that we must be very careful
about not imposing IDNA requirements on programs that are
not IDNA-aware but that there are some serious spoofing,
abuse, and malware opportunities if programs simply assume
the validity of strings that appear to be A-labels.
Status: Suggestions for clearer text would be welcome; no
substantive change between -01 and -02.
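One way to picture the minimum checking that a displaying program such as W might do is a round-trip test: decode the apparent A-label, re-encode the result, and confirm that the original string comes back. The sketch below uses Python's built-in "punycode" codec for that purpose. It is an illustration under stated assumptions only: the function name is hypothetical, and the sketch does not apply the IDNA2008 code point tables, contextual rules, or Bidi tests, all of which a real validation would also need.

```python
ACE_PREFIX = "xn--"


def round_trips_as_a_label(label: str) -> bool:
    """Check that an apparent A-label decodes and re-encodes to itself."""
    candidate = label.lower()
    if not candidate.startswith(ACE_PREFIX):
        return False
    encoded = candidate[len(ACE_PREFIX):]
    try:
        decoded = encoded.encode("ascii").decode("punycode")
    except UnicodeError:
        # Not ASCII, or not well-formed Punycode.
        return False
    if decoded.isascii():
        # An all-ASCII (or empty) result has no business being an A-label.
        return False
    # Re-encode and compare to reject non-canonical encodings.
    return decoded.encode("punycode").decode("ascii") == encoded
```

Programs in the position of Y and Z, which treat the string as opaque, would simply pass it along without calling anything like this, which is the trade-off described in the text.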
** P.9 ** Use of "in parallel"
Section 5.5 and elsewhere use the term "in parallel" to
describe the relationship between two (or more) sets of
steps or procedures. Mark expresses concern that this will
create confusion with concurrent operations, which is not
intended. He suggests other wording.
Comment: Specific suggestions for alternate text would be
** P.10 ** Similarity of registration and resolution procedures
Mark points out that the steps in 5.5 are all the same as
in 4.3 -- except for bidi. This fact should be very clear
in the text.
Comment: Of course, they were more different until the MAYBE
categories were removed. There still seems to be merit in
describing them separately because implementers of
registration procedures and actions and implementers of
lookup ones tend to be different even if there is some
shared code. A comment could be inserted noting the
parallelism (sic), but I'm not quite sure how or where to do
that without encouraging people to get sloppy about the
differences that do exist.
Specific suggestions and discussion would be welcome.
** P.11 ** Placeholder: Description of steps in both lookup and
These two sections still need work.
Status: Discussion anchor in text (anchor2 in -02).
Not worth doing much with until some related issues are
sorted out (e.g., whether these should be recombined, see
above). Specific suggestions welcome.
** P.12 ** Placeholder: Preprocessing
There is a lengthy placeholder in the text about
preprocessing issues. Even if the WG doesn't take on the
task of standardizing preprocessing, there is a great deal
of controversy as to whether it is necessary, should be
required, and should be standardized down to the last
mapping or if our goal is actually to change the processing
model, not just the descriptive one.
Comment: this needs to be resolved in the WG before the
text in Protocol can be made internally consistent and
consistent with Rationale. Aspects of this topic have been
discussed extensively in postings in recent days, including
the Issues/Status report for Rationale-01.
** P.13 ** Labels starting in combining marks
In Section 5.5 (lookup validation and testing), the text
contains a prohibition on labels starting with combining
marks. I think we have consensus on the prohibition. Is
the statement of it adequate?
Status: There is a discussion anchor in the text. It will
be removed if there is no discussion on this in the near
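To help judge whether the statement of the prohibition is adequate, here is a minimal sketch of one possible reading of it: a label is rejected if its first code point has a Unicode General_Category in the Mark group (Mn, Mc, or Me). Equating "combining mark" with that category group is an assumption of this example, as is the function name.

```python
import unicodedata


def starts_with_combining_mark(label: str) -> bool:
    """Return True if the label's first code point is in category M*."""
    if not label:
        return False
    # Categories Mn, Mc, and Me all begin with "M".
    return unicodedata.category(label[0]).startswith("M")
```

Under this reading, a label beginning with U+0301 COMBINING ACUTE ACCENT is rejected, while the same mark appearing after a base character is not affected by this particular test.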