Protocol - security issues and others

Fri Nov 28 20:05:20 CET 2008

As with my other recent notes, comments appear below only about
things I have not changed, or have changed in a significantly
different way.  The same remarks about harmlessness and lack of
comments by others apply.

--On Wednesday, 19 November, 2008 18:14 -0800 Mark Davis
<mark at macchiato.com> wrote:

> Protocol
> 
> *Other than A-Label and U-Label issue already covered:*
>...

> ------------------------------
> 
> *Also: *In the definitions, we should make clear what "A-Label
> form" and "U-Label form" mean. According to what I read of the
> text, they are something like "strings that appear to be
> A-Labels (resp U-Labels), but have not been verified to be
> so". I called that "putative A-Label" or "putative U-Label" in
> my message on these definitions, because that term is used in
> some places. But we need to have a single term, use it
> consistently, and provide a definition so that the meaning is
> precise.
> 
> I would also suggest having the terms "invalid A-Label" =
> putative A-Label that is not an A-Label, and the same for
> "invalid U-Label".

While I think Mark is right and that this needs to be clarified
(see notes on Rationale), the use of "invalid A-label", or
something very close to it, appeared in early versions of
Rationale.  I was told to remove it on the grounds that, if
A-labels were required to be valid, the term was an oxymoron.  I
don't intend to get into another cycle of putting in text,
removing it, and putting it in again, so the WG will need to
reach some sort of conclusion on this subject.

> ------------------------------
> 
> As a local implementation choice, the implementation MAY choose
> to map some forbidden characters to permitted characters (for
> instance mapping uppercase characters to lowercase ones),
> displaying the result to the user, and allowing processing to
> continue. However, it is strongly recommended that, to avoid
> any possible ambiguity, entities responsible for zone files
> ("registries") accept registrations only for A-labels (to be
> converted to U-labels by the registry as discussed above) or
> U-labels actually produced from A-labels, not forms expected
> to be converted by some other process.
> 
> =>
> 
> As a local implementation choice, the implementation MAY
> choose to map some forbidden characters to permitted
> characters (for instance mapping uppercase characters to
> lowercase ones), displaying the result to the user, and
> allowing processing to continue. However, if this is done, the
> mapping SHOULD be in accord with the general rules for
> IDNA2003 mappings (NFKC mapping plus case folding), and the
> implementation SHOULD request the user to confirm that the
> resulting U-Label is in fact the requested string.
> 
> 
> *Rationale. *The text left a hole open for some of the very
> unpleasantness that removing the mapping was supposed to
> prevent: people thinking that they were registering one string
> when they were in fact registering a different one.

Even though I favor this change, see comments on Rationale.
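
A minimal sketch of the kind of mapping the proposed text describes
(NFKC plus case folding, with the user asked to confirm the result).
It only approximates IDNA2003: Nameprep is defined against fixed
Unicode 3.2 tables, while Python's unicodedata follows whatever
Unicode version the interpreter ships. The function names are
hypothetical.

```python
import unicodedata


def idna2003_style_map(label: str) -> str:
    """Approximate the IDNA2003 mapping: NFKC normalization plus case folding.

    Assumption: this is an illustration only. It does not use the
    Nameprep tables and may differ from IDNA2003 on edge cases.
    """
    normalized = unicodedata.normalize("NFKC", label)
    folded = normalized.casefold()
    # Normalize again, since case folding can produce non-normalized text.
    return unicodedata.normalize("NFKC", folded)


def needs_confirmation(typed: str) -> bool:
    """True when mapping changed the string, so the user should confirm it."""
    return idna2003_style_map(typed) != typed
```

An implementation following the proposed SHOULD would show the mapped
string to the user whenever needs_confirmation() returns True.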

> ------------------------------
> 
> 4.3.2.1. Rejection of Confusing or Hostile Sequences in
> U-labels
> 
> =>
> 
> 4.3.2.1. Rejection of Hyphen Sequences in U-labels
> 
> 
> *Rationale. *At this point in the text, the application of
> 4.3.2.1 is to -- in 3rd and 4th position. That is neither
> Confusing nor Hostile -- it is restricted because we want to
> allow for future signatures. (And what the heck is a Hostile
> Sequence: ":-("?)

Changed, but rather more because this subsection no longer
describes any sequences other than the hyphen one.
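
The hyphen test that remains in this subsection can be sketched as
follows. This is an illustration under the assumption that the check
is applied to a putative U-label; the function name is hypothetical.

```python
def has_reserved_hyphens(label: str) -> bool:
    """True if the label has "--" in the third and fourth positions.

    That position is reserved for prefixes such as "xn--", so a
    putative U-label with this pattern is rejected.
    """
    return len(label) >= 4 and label[2:4] == "--"
```

Note that every A-label matches this pattern by construction, which is
why the test applies to the Unicode form and not to the ASCII form.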

>...
> ------------------------------
> 
> Strings that have been produced by the steps above, and whose
> contents pass the above tests, are U-labels.
> 
> =>
> 
> Move to after 4.5.
> 
> 
> *Rationale. *Otherwise this is false, since an A-Label
> generated from a string that passed the "above" tests could be
> too long.

Yes, but that would make strings prohibited by a particular
registry invalid as U-labels.  That is unknowable.  It also
makes the last paragraph of 4.5 circular.   Fixed in a different
way.

> ------------------------------
> 
> The failure conditions identified in the Punycode encoding
> procedure cannot occur if the input is a U-label as determined
> by the steps above.
> 
> =>
> 
> [REMOVE]
> 
> 
> *Rationale.* It is false. The A-Label could be too long.

Actually, the Punycode encoding procedure can produce an
over-long label _and_ I was explicitly told to put something
like this in.   See above.
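
The over-long case is easy to demonstrate. The sketch below uses
Python's built-in "punycode" codec and the 63-octet DNS label limit;
the helper names are hypothetical, and nothing here checks that the
input is otherwise a valid U-label.

```python
MAX_LABEL_OCTETS = 63  # DNS limit on the length of a single label


def to_a_label(u_label: str) -> str:
    """Prefix the Punycode encoding of the label with "xn--".

    Assumption: the input contains at least one non-ASCII character
    and has already passed whatever validity tests apply.
    """
    return "xn--" + u_label.encode("punycode").decode("ascii")


def a_label_too_long(u_label: str) -> bool:
    """True if the encoded form exceeds the DNS label length limit."""
    return len(to_a_label(u_label)) > MAX_LABEL_OCTETS
```

A string of 60 non-ASCII characters is under 63 characters long, yet
its encoded form needs at least one ASCII digit per character plus the
four-character prefix, so a_label_too_long() returns True for it.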

>...
> ------------------------------
> 
> The Unicode string MAY then be processed to prevent confounding
> of user expectations. For instance, it might be reasonable, at
> this step, to convert all upper case characters to lower case,
> if this makes sense in the user's environment, but even this
> should be approached with caution due to some edge cases: in
> the long term, it is probably better for users to understand
> IDNs strictly in lower-case, U-label, form. More generally,
> preprocessing may be useful to smooth the transition from
> IDNA2003, especially for direct user input, but with similar
> cautions. In general, IDNs appearing in files and those
> transmitted across the network as part of protocols are
> expected to be in either ASCII form (including A-labels) or to
> contain U-labels, rather than being in forms requiring mapping
> or other conversions. Other examples of processing for
> localization might be applied, especially to direct user
> input, at this point. They include interpreting various
> characters as separating domain name components from each
> other (label separators) because they either look like periods
> or are used to separate sentences, mapping halfwidth or
> fullwidth East Asian characters to the common form permitted in
>...
> 
> =>
> 
> The Unicode string MAY then be processed for
> interoperability and backwards compatibility with IDNA2003.
> Such preprocessing MUST be in accordance with the mappings
> used in IDNA2003: that is, normalization with NFKC and
> case-folding.
> 
> *Rationale.* Otherwise, by allowing any and all mappings, we
> would be opening the door to massive security issues.

See discussion in note about Rationale.

> ------------------------------
> 
> o Labels containing other code points that are shown in the
> permitted character table as requiring a contextual rule
> ("CONTEXTO" in the tables), but for which no such rule appears
> in the table of rules. With the exception in the rule
> immediately above, applications resolving DNS names or
> carrying out equivalent operations are not required to test
> contextual rules, only to verify that a rule exists.
> 
> =>
> 
> [Delete]
> 
> 
> *Rationale.* It is pretty pointless to require checking that
> rules exist if we aren't going to require that people actually
> verify that the rules apply.

Actually quite the contrary... and I believe that the WG reached
consensus about this long ago (although I could be wrong, of
course).  The point is that, if we know a character is going to
require contextual rules to be permitted, "no rule" causes it to
be treated as DISALLOWED, but does not permanently bind it to
that category.  Instead, installation of a new set of contextual
rule tables may change its status by supplying an appropriate
rule.  The whole difference between CONTEXTJ and CONTEXTO is
between the rules that are required to be checked at lookup time
and those for which enforcement can be left to the registry, but
even that cannot be known for certain until the rules are
determined.  Note that this may require some additional text
about conditions for moving things between the two CONTEXT
categories, but that is a separate problem.
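
The lookup-time behavior described above can be sketched roughly as
follows. The category names come from the tables; everything else (the
function name, the dictionary layout, the rule signature) is a
hypothetical illustration, and any table entries used with it are
placeholders and not the actual tables.

```python
from enum import Enum
from typing import Callable, Dict


class Category(Enum):
    PVALID = "PVALID"
    CONTEXTJ = "CONTEXTJ"
    CONTEXTO = "CONTEXTO"
    DISALLOWED = "DISALLOWED"


# A contextual rule takes the label and the index of the code point.
Rule = Callable[[str, int], bool]


def lookup_permits(label: str, index: int,
                   categories: Dict[int, Category],
                   rules: Dict[int, Rule]) -> bool:
    """Decide at lookup time whether the code point at `index` is acceptable."""
    cp = ord(label[index])
    category = categories.get(cp, Category.DISALLOWED)
    if category is Category.PVALID:
        return True
    if category is Category.CONTEXTJ:
        # The rule must exist and must be evaluated at lookup time.
        rule = rules.get(cp)
        return rule is not None and rule(label, index)
    if category is Category.CONTEXTO:
        # Only the existence of a rule is verified; applying it is
        # left to the registry. "No rule" acts like DISALLOWED.
        return cp in rules
    return False
```

Installing a new rule table changes the outcome for a CONTEXTO code
point without any change to its category, which is the property the
"verify that a rule exists" requirement is meant to preserve.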

> ------------------------------
> 
> an attempt to look up and resolve such strings will almost
> certainly lead to a DNS lookup failure except when wildcards
> are present in the zone.
> 
> =>
> [clarify. Why are we saying "almost certainly"? What are the
> other failure cases besides wildcards?]

If the WG wants a complete explanation here, we can do it, but
my recommendation is to leave the text unchanged.  The
difficulty arises because queries are made for specific types
and "no record present for label" is different from "no record
present for label and type".
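
A toy model of that distinction, assuming a zone represented as a
mapping from owner names to the record types present at each name. It
deliberately ignores wildcards, CNAMEs, and delegation, and the names
in it are hypothetical.

```python
from typing import Dict, Set

Zone = Dict[str, Set[str]]


def query(zone: Zone, name: str, rtype: str) -> str:
    """Classify a query as ANSWER, NODATA, or NXDOMAIN."""
    types_at_name = zone.get(name)
    if types_at_name is None:
        return "NXDOMAIN"  # no record present for the label at all
    if rtype not in types_at_name:
        return "NODATA"    # the label exists, but not with this type
    return "ANSWER"
```

With a zone such as {"example.test": {"A"}}, a query for type AAAA at
example.test falls into the second case, while any query for a name
absent from the zone falls into the first.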

>...

Again, thanks for the careful reading.

   john