Protocol-05

Mon Sep 29 15:53:15 CEST 2008

I had a chance to review the documents again, and here are my comments.

1. First, and most importantly, the normative definitions really have to be
moved out of the rationale document and into the protocol document. One
could argue that disentangling them is difficult for the editor, but as it
stands the documents are simply too difficult to understand in terms of the
normative implications. And if it is difficult for the editor to
disentangle, it will be far, far more difficult for users of the
specifications to disentangle.

Concretely, I suggest doing this by moving the following sections into the
protocol document.

1.5.2 - 1.5.4
4
5, 5.1, 5.2
9.1

Most of the above moves into the terminology section in protocol; 9.1
(describing differences from IDNA2003) could come either near the start or
at the end.

With these changes, I think protocol would be in pretty good shape.
Rationale would need a lot more work, but if the normative sections are
moved to protocol, then it is much less of a problem for a timely release.

Other comments on protocol.

> IDNA. Note that IDNs occupying domain name slots in those older
> protocols MUST be in A-label form until and unless those protocols
> and implementations of them are upgraded.

Upgraded to what? It is unclear what the referent is.

>    be Unicode).  The registry MAY permit submission of labels in A-label
>    form.  If it does so, it SHOULD perform a conversion to a U-label,
>    perform the steps and tests described below, and verify that the
>    A-label produced by the step in Section 4.5 matches the one provided
>    as input.  If, for some reason, it does not, the registration MUST be
>    rejected.

The first SHOULD has to be a MUST; otherwise the registry can register an
invalid name.
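
To make the round trip concrete, here is a minimal sketch in Python. It is
an illustration under assumptions, not the draft's procedure: the function
names are made up, Python's built-in "punycode" codec stands in for the
conversion steps, and a single NFC check stands in for "the steps and tests
described below".

```python
import unicodedata

ACE_PREFIX = "xn--"


def to_a_label(u_label: str) -> str:
    """Encode a Unicode label as an A-label using the Punycode codec."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def check_submitted_a_label(a_label: str) -> bool:
    """Return True only if the submitted A-label survives the round trip."""
    if not a_label.startswith(ACE_PREFIX):
        return False
    try:
        # Convert the submitted A-label to a putative U-label.
        u_label = a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except UnicodeError:
        return False
    # Placeholder for the full set of registration tests (assumption:
    # only the NFC requirement is checked here).
    if unicodedata.normalize("NFC", u_label) != u_label:
        return False
    # The regenerated A-label has to match the one provided as input.
    return to_a_label(u_label) == a_label
```

If the conversion and comparison are skipped (which a SHOULD allows), nothing
in this flow stops a label that fails the tests from being registered.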

> 4.2.  Conversion to Unicode and Normalization
>
>    ... That string MUST be in Unicode Normalization
>    Form C (NFC [Unicode-UAX15]).

This constraint needs to be moved to Permitted Character and Label
Validation, for clarity, since it can otherwise be misleading (we've seen
too many people confuse these issues). I'd suggest 4.3.1 and then
renumbering the ones after. (I think this should be uncontroversial after
you look at this section, but if not let me know and I'll supply some more
reasoning)
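
As a small illustration of the distinction people tend to confuse, the
sketch below separates "the string is in NFC" (a validation test) from
"convert the string to NFC" (a conversion). It uses Python's standard
unicodedata module; the helper name is_nfc is made up for this example.

```python
import unicodedata


def is_nfc(label: str) -> bool:
    """Validation: test whether the label is already in NFC; change nothing."""
    return unicodedata.normalize("NFC", label) == label


precomposed = "caf\u00e9"   # e with acute as a single code point
decomposed = "cafe\u0301"   # e followed by a combining acute accent

# A validation step rejects the decomposed form ...
assert is_nfc(precomposed)
assert not is_nfc(decomposed)

# ... whereas a conversion step would silently turn it into the other label.
assert unicodedata.normalize("NFC", decomposed) == precomposed
```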

>    As a local implementation choice, the implementation MAY choose to
>    map some forbidden characters to permitted characters (for instance
>    mapping uppercase characters to lowercase ones), displaying the
>    result to the user, and allowing processing to continue.  However, it
>    is strongly recommended that, to avoid any possible ambiguity,
>    entities responsible for zone files ("registries") accept
>    registrations only for A-labels (to be converted to U-labels by the
>    registry) or U-labels actually produced from A-labels, not forms
>    expected to be converted by some other process.

I really think this is a bad idea. One of the reasons for a clear separation
of A-Label and U-Label is to make it very clear what is being registered.
The above muddies it again. So I would drop the first sentence, and make the
second two be MUSTs, and make it clear that even if the registration process
takes a U-Label, what is being registered is the A-Label.

>    The Unicode string is checked to verify that no characters that IDNA
...
(later section)
...

>    The proposed label (in the form of a Unicode string, i.e., a putative
>    U-label) is then examined, performing tests that require examination

For clarity, the terminology needs to be a bit more consistent. I would
suggest using "putative U-Label" (or something similar) uniformly throughout
the different steps, both here and in the lookup sections.

>    there SHOULD be policies.  Even a trivial policy (e.g., "anything can
>    be registered in this zone that can be represented as an A-label -
>    U-label pair") has value because it provides notice to users and
>    applications implementers that the registry cannot be relied upon to
>    provide even minimal user-protection restrictions.  These per-

It is really an open issue whether the best place for restrictions on, or
notifications of, possible spoofing is the registry or the client software.
Given that, I think this section is best left out, or moved to rationale and
made non-normative. This is especially so because it applies to registries
at *all* levels, and because of the text about notice ("it provides notice
to users and applications implementers"): it imposes a SHOULD on, say,
Google to provide notice to users about policies for xxx.google.com and
yyy.xxx.google.com, and so on.

>    The string produced by the above steps is checked and processed as
>    appropriate to local registry restrictions.  Application of those
>    registry restrictions may result in the rejection of some labels or
>    the application of special restrictions to others.

This paragraph is not very clear, and this would be a good point for a
brief statement about the common techniques of "bundling" vs "blocking".
Suggested replacement:

The string produced by the above steps may then be examined, as appropriate,
for consistency with local registry policies. Where characters are
disallowed according to those local policies, the string would be rejected.
The string may also be overly similar to an existing registration, such as a
Chinese string with a traditional character versus a simplified one, or a
Romanian word with S with cedilla versus S with comma below. In such cases,
there are two common techniques for local registries. With blocking, the
string is rejected. With bundling, the original registrant automatically
gets the variants of a registered name.
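
A toy sketch of the two techniques, in Python, may help readers who have not
met the terms. Everything in it is hypothetical: the Registry class, the
two-entry variant table (Romanian cedilla forms mapped to comma-below forms),
and the owner names are made up for illustration and do not describe any real
registry's policy.

```python
# Hypothetical variant table: cedilla forms are treated as variants of the
# comma-below forms.
VARIANTS = {
    "\u015f": "\u0219",  # s with cedilla -> s with comma below
    "\u0163": "\u021b",  # t with cedilla -> t with comma below
}


def variant_key(label: str) -> str:
    """Collapse a label to a key shared by all of its variants."""
    return "".join(VARIANTS.get(ch, ch) for ch in label)


class Registry:
    def __init__(self, policy: str):
        if policy not in ("blocking", "bundling"):
            raise ValueError("policy must be 'blocking' or 'bundling'")
        self.policy = policy
        self.by_key = {}  # variant key -> (owner, label as registered)

    def register(self, label: str, owner: str) -> bool:
        """Reject a label that is a variant of an existing registration."""
        key = variant_key(label)
        if key in self.by_key:
            return False
        self.by_key[key] = (owner, label)
        return True

    def owner_of(self, label: str):
        """Under bundling the registrant also holds every variant."""
        entry = self.by_key.get(variant_key(label))
        if entry is None:
            return None
        owner, registered = entry
        if self.policy == "bundling" or label == registered:
            return owner
        return None
```

In this sketch a later applicant for a variant is turned away under either
policy; the difference is that under bundling the variant belongs to the
original registrant, while under blocking it belongs to nobody.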

> ... the lookup application MAY attempt to convert it
>    to a U-label and apply the tests of Section 5.5 and, of course, the
>    conversion of Section 5.6 to that form.  If the label is converted to
>    Unicode (i.e., to U-label form) using the Punycode decoding
>    algorithm, then the processing specified in those two sections MUST
>    be performed, and the label MUST be rejected if the resulting label
>    is not identical to the original.  See also Section 6.1.

I'm not sure how this would play out in practice. Suppose I have a program
that accepts an A-Label and routes it to two internal, independent modules.
One module may process the A-Label into a U-Label, while the other just
sends the A-Label on to another program. This conformance clause would seem
to require that if module #1 happens to find a problem, the program as a
whole MUST reject the label. That would require that we MUST rearchitect the
program so that module 1 can communicate with module 2 AND module 2 can't
send the name on without waiting on module 1. I think we have to change this
to ...MAY... SHOULD, and not ... MAY ... MUST.

>    the lookup-side tests are more permissive and rely
>    heavily on the assumption that names that are present in the DNS are
>    valid.

As far as I can tell, the only difference in the processing is that BIDI is
a SHOULD not a MUST. If that is the case, then "more permissive and rely
heavily" is overstating the differences. And either way, we really need to
have a much crisper description of the differences.

That is, pedagogically, the structure now is something like

Registry
    (registry specific stuff)
    Do A
    Do B
    Do C
    Do D
    Do E
    ...
    Do H
    (registry specific stuff)

Lookup
    (lookup specific stuff)
    Do A
    Do B
    Do C
    Do D
    Do E
    ...
    SHOULD do H
    (lookup specific stuff)

That forces the user who needs to determine the differences to do his own
DIFF between them, and gives implementers of libraries who will supply
parameterized software that does both plenty of chances to screw it up. It
would be less error prone, if possible, to structure the text so as to
distinguish the core processing (A-H above), and say that Lookup follows
exactly the same process for that core except that H is optional.
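
The same idea can be sketched for a library that serves both sides. The
Python below is only an illustration under assumptions: the check functions
are simplified stand-ins for steps A-H (the Bidi check in particular is a
rough approximation, not the draft's test), and the names validate,
CORE_CHECKS and lookup_runs_bidi are made up.

```python
import unicodedata
from typing import Callable, List, Optional

# A check returns an error message, or None if the label passes.
Check = Callable[[str], Optional[str]]


def check_nfc(label: str) -> Optional[str]:
    if unicodedata.normalize("NFC", label) != label:
        return "label is not in NFC"
    return None


def check_hyphens(label: str) -> Optional[str]:
    if label.startswith("-") or label.endswith("-"):
        return "label begins or ends with a hyphen"
    return None


def check_bidi(label: str) -> Optional[str]:
    """Simplified stand-in for step H: only looks at the first character."""
    classes = [unicodedata.bidirectional(ch) for ch in label]
    has_rtl = any(c in ("R", "AL", "AN") for c in classes)
    if has_rtl and classes[0] not in ("L", "R", "AL"):
        return "right-to-left label starts with a non-letter"
    return None


# Steps shared verbatim by registration and lookup (stand-ins for A-G).
CORE_CHECKS: List[Check] = [check_nfc, check_hyphens]


def validate(label: str, mode: str, lookup_runs_bidi: bool = True) -> List[str]:
    """Run the shared core; step H is the single point of difference."""
    if mode not in ("registration", "lookup"):
        raise ValueError("mode must be 'registration' or 'lookup'")
    checks = list(CORE_CHECKS)
    if mode == "registration" or lookup_runs_bidi:
        checks.append(check_bidi)
    return [msg for msg in (check(label) for check in checks) if msg is not None]
```

With the difference confined to one conditional, there is no second copy of
the step list to drift out of sync.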

>    The Unicode string MAY then be processed to prevent confounding of
>    user expectations.
...

>    Preprocessing MUST NOT map a character
>    that is valid in a label (i.e., one that is PROTOCOL-VALID or
>    permitted in any context) into another character.

These two statements must be very closely associated - the second is a
strong restriction on the first, but they are separated by gobs of text. The
second needs to be moved up to be directly after the first, e.g., "Such
preprocessing MUST NOT ..."
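
To show how tightly the restriction binds the permission, here is a minimal
Python sketch of a preprocessor that refuses any mapping table touching a
valid character. The names make_preprocessor and toy_is_valid are made up,
and toy_is_valid is a deliberately tiny assumption (lowercase ASCII letters,
digits and hyphen), not the real PROTOCOL-VALID set.

```python
from typing import Callable, Dict


def toy_is_valid(ch: str) -> bool:
    """Toy stand-in for "valid in a label"; NOT the real character tables."""
    return ch in "abcdefghijklmnopqrstuvwxyz0123456789-"


def make_preprocessor(
    mapping: Dict[str, str], is_valid: Callable[[str], bool]
) -> Callable[[str], str]:
    """Build a preprocessing function, enforcing the MUST NOT up front."""
    offending = sorted(ch for ch in mapping if is_valid(ch))
    if offending:
        raise ValueError(f"mapping table remaps valid characters: {offending!r}")

    def preprocess(text: str) -> str:
        # Only characters that are not valid in a label are ever changed.
        return "".join(mapping.get(ch, ch) for ch in text)

    return preprocess
```

A case-folding table such as {"A": "a"} is accepted here because "A" is not
valid in this toy set, while a table such as {"a": "b"} is rejected when the
preprocessor is built.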

> 6.3.  Root and other DNS Server Considerations
>
>    IDNs in A-label form will generally be somewhat longer than current
>    domain names, so the bandwidth needed by the root servers is likely
>    to go up by a small amount.  Also, queries and responses for IDNs
>    will probably be somewhat longer than typical queries historically,
>    so EDNS0 [RFC2671] support may be more important (otherwise, queries
>    and responses may be forced to go to TCP instead of UDP).

The first sentence looks like a hold-over. It is untrue, since IDNA2003 has
been deployed for some time now, and so IDNs in A-Label form are already
"current". The wording also looks like it is going to contrast "IDNs in
A-Label form" with some other IDNs (U-Label). Suggested wording follows.

IDNs (the A-label form) are generally somewhat longer than other domain
names, so as they have increasing deployment the bandwidth needed by the
root servers is likely ...
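
For a rough feel of the size effect, the Python sketch below compares the
number of characters in a U-label with the number of octets its A-label
occupies in a query. It relies on Python's built-in "punycode" codec; the
helper name a_label and the sample labels are just for illustration.

```python
def a_label(u_label: str) -> str:
    """Form the A-label: ACE prefix plus the Punycode encoding."""
    return "xn--" + u_label.encode("punycode").decode("ascii")


for u_label in ("b\u00fccher", "\u4f8b\u3048"):
    encoded = a_label(u_label)
    # A-labels are pure ASCII, so len() is also the octet count on the wire.
    print(f"{len(u_label)} characters -> {len(encoded)} octets ({encoded})")
```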

>    This memo describes procedures for registering and looking up labels
>    that are not compatible with the preferred syntax described in the
>    base DNS specifications (STD13 [RFC1034] [RFC1035] and Host
>    Requirements [RFC1123]) because they contain non-ASCII characters.

The wording here also seems odd, given that IDNA2003 has been out for some 5
years. Maybe just add wording referencing the IDNA2003 specs at an
appropriate point?