Protocol-03 and updated issues list
John C Klensin
klensin at jck.com
Mon Jul 28 04:33:43 CEST 2008
Hi.
I'm attaching the updated copy of the Protocol issues list,
reflecting my response to Marcos (posted earlier) and some
additional notes. The doc itself is now in the posting queue;
given the cutoff and some ambiguity (in my mind, probably not
the Secretariat's) about when it ends, I don't know when it will
go up.
I had hoped to get the parallel material on Rationale out at
much the same time, but am holding it until I can get into a bit
better shape with the text discussed in the note after next (a
response to Mark on "Interoperability").
john
-------------- next part --------------
Issues list, IDNABIS Protocol (as of 20080727)
Section numbers referenced are the same in both
draft-ietf-idnabis-protocol-01 and -02 except as noted.
There are several issues that, because of text moving back and
forth and suggestions for additional moves, are in the issues
list and summary for Rationale. They are, in general, not
repeated here.
Note that, unlike Rationale and some independent issues, no
specific "outstanding issues" list was posted for Protocol in
May. That was due, in part, to the underwhelming response to
the other postings. So this document incorporates what would
have been that list.
** P.1 ** Location of the Contextual Rules table
This table is in the Appendices to Protocol at present; in
previous versions it was in Rationale. Based on list
discussions, it should probably be moved to Tables,
probably along with some additional material that is
still in Rationale.
Status: Planned for the post-IETF72 versions of Protocol and
Tables unless someone can get consensus behind a different
model.
** P.2 ** Format of the Contextual Rules table.
At present (Protocol-02), that table consists of a condensed
format that more or less matches the format used for
standard entries in Tables. With it, important fields are
separated by semicolons, potentially followed by comments
that start with "#". As an example, the first entry in the
table looks like the following in Protocol-01:
002D; HYPHEN-MINUS; F;
Must not appear at the beginning or end of a label;
Regular expression:
[^^]\u002D|\u002D[^$] ;
# Note that a prohibition on having two hyphens as
the third and fourth characters of anything but a
valid A-label appears in the specification.
Mark Davis suggested a different format. He wrote (I have not
preserved his formatting):
I suggest that the table be formatted for clarity to not
depend on whitespace -- using names for each field -- and be
broken into a list of condition/result pairs.
Code point: 200C
Name: ZERO WIDTH NON-JOINER
Lookup: True
# Allow ZWNJ for breaking cursive connection, as needed in
Farsi.
Before: [[:Joining_Type=Dual_Joining:]
[:Joining_Type=Left_Joining:]]
[:Joining_Type=Transparent:]*
After: [:Joining_Type=Transparent:]*
[[:Joining_Type=Dual_Joining:]
[:Joining_Type=Right_Joining:]]
Value: PVALID
Comment: There is no dependency on whitespace, at least in
Protocol-01. Mark's comments may have reflected an earlier
version. I've illustrated a version of these changes with
the alternate appendix (see elsewhere), but I'm not sure
about it for two reasons. The first is whether people
prefer a more compact format or one that uses more vertical white
space. We hope that the Contextual Rules registry will remain
small, but ending up with a significantly longer Tables
document (remember that this section is normative) as the
result of format may not be in our best interest. The
second point is related to the next issue.
Status: To facilitate discussion, a variation on Mark's
layout is reflected in the second (alternate) appendix
(see P.3, below).
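
For readers comparing the formats, the Before/After condition in
Mark's ZWNJ example can be sketched as executable code. This is an
illustration only, not text from either appendix. Python's standard
library does not expose Joining_Type, so the JOINING_TYPE table
below is a hypothetical three-entry stand-in for the real Unicode
data, and the function names are invented for this sketch.

```python
import re

ZWNJ = "\u200C"

# Hypothetical, deliberately tiny Joining_Type table (assumption:
# a real implementation would load the full Unicode property data).
JOINING_TYPE = {
    "\u0628": "D",  # ARABIC LETTER BEH, Dual_Joining
    "\u0627": "R",  # ARABIC LETTER ALEF, Right_Joining
    "\u064E": "T",  # ARABIC FATHA, Transparent
}

# Before: (Dual_Joining | Left_Joining) Transparent*
BEFORE = re.compile(r"[DL]T*\Z")
# After: Transparent* (Dual_Joining | Right_Joining)
AFTER = re.compile(r"T*[DR]")


def joining_types(text):
    """Map each character to a one-letter joining type; 'U' if unknown."""
    return "".join(JOINING_TYPE.get(ch, "U") for ch in text)


def zwnj_allowed(label, index):
    """Check the Before/After condition for the ZWNJ at label[index]."""
    if label[index] != ZWNJ:
        raise ValueError("no ZWNJ at the given index")
    before = joining_types(label[:index])
    after = joining_types(label[index + 1:])
    return bool(BEFORE.search(before)) and bool(AFTER.match(after))
```

Translating the label to a string of joining-type letters first keeps
the two patterns close to the Before/After lines as Mark wrote them.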
** P.3 ** Definition of the Contextual Rules - Regex or otherwise
Based on my understanding of what was being asked for, the formal
definitions of the Contextual Rules were written to use a strict
regular expression syntax with one regular expression per rule.
That syntax, illustrated by the example above for U+002D,
is not easy to look at or understand, but would lend itself
to an automatic rule interpreter. Mark's example is not
one of a single rule, or even formal use of a regular
expression. If we are going to go that route, there may be
even simpler ways to express the rules, leaving
applications implementers on their own for the
formalizations to be used in their code. A second appendix
has been supplied in -02 as the beginning of a suggestion
for discussion.
Please examine the two examples above, the discussion in
the text, and the forms used in both appendices and advise
on what you would like to see and why.
Note that they are provided in these versions of the
document for comparison purposes only. Assuming we can
agree on which one we want, only one will survive into the
post-IETF versions of the documents.
Status: An alternate rule form is outlined in the temporary
Appendix 2.
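
To make the trade-off concrete, here is a minimal sketch of the
U+002D rule ("must not appear at the beginning or end of a label")
written two ways: as a single regular expression and as a plain
predicate. The pattern below is one possible formulation chosen for
this illustration; it is not the expression from either appendix,
and the function names are hypothetical.

```python
import re

HYPHEN = "\u002D"

# Matches a label that VIOLATES the rule: hyphen first or hyphen last.
HYPHEN_VIOLATION = re.compile(r"\A-|-\Z")


def hyphen_ok_regex(label):
    """Regex form: the label is acceptable if no violation is found."""
    return HYPHEN_VIOLATION.search(label) is None


def hyphen_ok_plain(label):
    """Plain form: the same rule stated without a regular expression."""
    return not (label.startswith(HYPHEN) or label.endswith(HYPHEN))
```

Both functions accept and reject the same labels. The regex form
suits an automatic rule interpreter, while the plain form is closer
to the prose statement of the rule.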
** P.4 ** Protocol reference to Bidi Constraints
In 4.3.2.4: the bidi constraints apply to more than just
single labels.
Comment: Noted. The question of those Bidi constraints is
probably one of the larger and more substantive open issues
we face.
Status: Discussion placeholder inserted, but we need to
resolve the Bidi question before this text is adjusted to
match.
** P.5 ** Bidi-checking Requirement on Lookup
Should this be a SHOULD or a MUST?
Comment: See the discussion anchor and text in Section 5.5
(note that this anchor has been in the document for some
time and there have been no comments on-list).
Status: Discussion anchor in text.
** P.6 ** Requirement for Policy
Mark writes about Section 4.4: "While exact policies are not
specified as part of IDNA2008 and it is expected that
different registries may specify different policies, there
SHOULD be policies." This SHOULD is pointless, unless some
constraints or guidance are given. Otherwise my policy could
be "any valid IDNA label", which would be precisely the same
as no policy at all.
Comment: I fully expect that some zone administrators will
adopt exactly that sort of policy. We've said that policy
decisions, and consistent application of those decisions, by
zone administrators are an important part of the
registration model. We have said that specifying those
policies and determining their adequacy is not an IETF
matter, but rather a matter for governments, enterprise
management, ICANN, and so on. We have seen application
implementers evaluate per-zone policies and respond with
decisions about what to display. So, I don't think this is
pointless. The problem is whether different language would
better describe the handoff.
Of course, were we to decide that our audience is purely
protocol implementers, all of this would go away.
** P.7 ** Universality of Unicode
Section 5.2 says: "The local character set, character
coding conventions, and, as necessary, display and
presentation conventions, are converted to Unicode (without
surrogates), paralleling the process described above in
Section 4.2."
Mark writes: In the vast majority of cases in modern
software, the local charset IS Unicode, so this may be
confusing. Also, UTF-16 does and must use surrogate code
units, so this needs to be more precise. And excluding
surrogate code points isn't necessary since gc=Cs are
forbidden anyway. Suggest:
"The string is converted from the local character set into
Unicode, if it is not already Unicode. The exact nature of
this conversion is beyond the scope of this document, but
may involve normalization, as described in Section 4.2."
Comment: I don't know how to evaluate "vast majority...",
but I keep running across examples and discussions that
suggest that the presumed minority is fairly large. The
most recent example is a lengthy discussion about
interoperability problems with email text body parts;
problems that would presumably be infrequent or trivial if
most systems primarily supported Unicode.
Status: The specific textual change suggested has been
made. Anyone who objects should say so.
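
As an illustration of the suggested wording, the conversion step
might look like the sketch below. The function name and its
parameters are hypothetical, and the use of NFC is an assumption
made here for concreteness; the suggested text says only that the
conversion "may involve normalization".

```python
import unicodedata


def to_unicode(raw, local_encoding):
    """Decode bytes from a local character set, then normalize.

    Assumption: NFC is the normalization form wanted; the quoted
    text does not say which form applies.
    """
    text = raw.decode(local_encoding)
    return unicodedata.normalize("NFC", text)
```

For input that is already Unicode, the decode step is whatever
encoding the string arrived in (UTF-8, for example), and only the
normalization has any visible effect.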
** P.8 ** Validation of A-labels
Section 5.4 says: "In general, that conversion and testing
should be performed if the domain name will later be
presented to the user in native character form (this
requires that the lookup application be IDNA-aware)."
Mark writes: Suppose that program X creates an A-Label from
a U-Label, then sends that A-Label to program Y, which sends
it to program Z, which sends it to program W, which displays
it. It sounds like each of Y, Z, W need to validate. Is
that the intent of this text? If it is only W that needs to
validate, then it gets a bit murky in today's world, where
the boundaries between cooperating processes and programs
are very fuzzy.
Comment: That "murky" situation is exactly why the text
leaves a lot of judgment in the hands of the application.
Note that the sentence after the one you quoted now says, in
part, "others may treat the string as opaque to avoid the
additional processing at the expense of providing less
protection and information to users", which is intended to
be a very clear statement that there is a trade off
involved. I believe that, if you read the surrounding text,
you will find that the specific answer to your question is
that W should (note lower case) validate unless it has some
reason not to. Note that W has to do most of the work of
validation anyway in order to display the string. One reason
not to validate would be knowing, as a consequence of system
design and of adequate checking and auditing, that it
wouldn't have received the string unless it was valid.
Validation is optional, and less likely, for Y and Z, but
their implementers may certainly do so if they are being
cautious.
That text is consistent, I think, with on-list discussions
that seem to have concluded that we must be very careful
about not imposing IDNA requirements on programs that are
not IDNA-aware but that there are some serious spoofing,
abuse, and malware opportunities if programs simply assume
the validity of strings that appear to be A-labels.
Status: Suggestions for clearer text would be welcome; no
substantive change between -01 and -02.
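
One inexpensive check that a program such as W could apply is a
round trip: decode the apparent A-label, re-encode the result, and
compare. The sketch below uses Python's built-in "punycode" codec.
The function name is hypothetical, and this covers only a small
part of the validation discussed in Section 5.4 (it performs no
code point, contextual rule, or bidi tests).

```python
ACE_PREFIX = "xn--"


def round_trips_as_a_label(label):
    """Return True if label decodes and re-encodes to the same string.

    This is a partial check only, not full IDNA validation.
    """
    lowered = label.lower()
    if not lowered.startswith(ACE_PREFIX):
        return False
    encoded = lowered[len(ACE_PREFIX):]
    try:
        decoded = encoded.encode("ascii").decode("punycode")
    except ValueError:
        # UnicodeError is a subclass of ValueError; this covers both
        # non-ASCII input and malformed Punycode.
        return False
    if decoded.isascii():
        # An all-ASCII result has no reason to be in A-label form.
        return False
    return decoded.encode("punycode") == encoded.encode("ascii")
```

A program that treats the string as opaque skips all of this, which
is the trade-off the quoted sentence describes.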
** P.9 ** Use of "in parallel"
Section 5.5 and elsewhere use the term "in parallel" to
describe the relationship between two (or more) sets of
steps or procedures. Mark expresses concern that this will
create confusion with concurrent operations, which is not
intended. He suggests other wording.
Comment: Specific suggestions for alternate text would be
welcome.
** P.10 ** Similarity of registration and resolution procedures
Mark points out that the steps in 5.5 are all the same as
in 4.3 -- except for bidi. This fact should be very clear
in the text.
See also P.15
Comment: Of course, they were more different until the MAYBE
categories were removed. There still seems to be merit in
describing them separately because implementers of
registration procedures and actions and implementers of
lookup ones tend to be different even if there is some
shared code. A comment could be inserted noting the
parallelism (sic), but I'm not quite sure how or where to do
that without encouraging people to get sloppy about the
differences that do exist.
Specific suggestions and discussion would be welcome.
** P.11 ** Placeholder: Description of steps in both lookup and
registration
These two sections still need work.
Status: Discussion anchor in text (anchor2 in -02).
Not worth doing much with until some related issues are
sorted out (e.g., whether these should be recombined, see
above). Specific suggestions welcome.
** P.12 ** Placeholder: Preprocessing
There is a lengthy placeholder in the text about
preprocessing issues. Even if the WG doesn't take on the
task of standardizing preprocessing, there is a great deal
of controversy as to whether it is necessary, should be
required, and should be standardized down to the last
mapping or if our goal is actually to change the processing
model, not just the descriptive one.
Comment: this needs to be resolved in the WG before the
text in Protocol can be made internally consistent and
consistent with Rationale. Aspects of this topic have been
discussed extensively in postings in recent days, including
the Issues/Status report for Rationale-01.
** P.13 ** Labels starting in combining marks
In Section 5.5 (lookup validation and testing), the text
contains a prohibition on labels starting with combining
marks. I think we have consensus on the prohibition. Is
the statement of it adequate?
Status: There is a discussion anchor in the text. It will
be removed if there is no discussion on this in the near
future.
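
For discussion purposes, the prohibition can be stated as a short
test. This sketch assumes that "combining mark" means a character
whose General_Category is Mn, Mc, or Me; the draft text is
authoritative on the exact definition, and the function name is
hypothetical.

```python
import unicodedata


def starts_with_combining_mark(label):
    """True if the first character has General_Category Mn, Mc, or Me."""
    if not label:
        return False
    return unicodedata.category(label[0]).startswith("M")
```

A label for which this returns True would be rejected at lookup
under the prohibition described above.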
** P.14 ** (Editorial) Text about A-labels on registration
Should this text be moved from 4.1 to 4.3?
Status: See placeholder in draft.
** P.15 ** Reducing duplication in "registration" and "lookup"
The descriptions of steps in Sections 4 and 5 are very
similar. There are suggestions that the two be recombined
(see Mark's notes from some time ago) and another one that
we create a new section with the common material and point
to it from the two existing (but shortened) sections (see
Marcos's notes on Protocol).
Comment: Keeping the two sections separate contributes,
IMO, to the goal of making it easier for people to find the
information they need to implement things correctly, so I'm
nervous about recombining the sections. Marcos's
suggestion seems like a middle ground, although it will
result in more page-flipping.
Status: We need some discussion and at least rough
consensus on this if it is going to be changed. Otherwise,
the default is more or less status quo.
** P.16 ** Versions and the Contextual Rules
Marcos suggested that the Contextual Rules Registry and the
Derived Properties one have a formal version structure. I
may or may not understand the suggestion the way he
intends, but, if I do, this is an Issue.
Comment: The IETF has had poor success with version numbers
in tables and the like, especially those that are intended
for future compatibility, with the notorious "MIME-Version:
1.0" header standing out as an example. So I'm reluctant
to do this unless we have a clear understanding of how we (and
implementations) would use those versions, what error
conditions we would expect and to whom they would be
reported, etc. Much more discussion and/or text needed.