Rationale problems

Mark Davis mark at macchiato.com
Thu Nov 20 22:12:50 CET 2008


------------------------------

to reduce the opportunities for attacks via the encoding system.

=>[Reword]

*Rationale.* I don't know what "the encoding system" means here, so I can't
even suggest any replacement text.


------------------------------

The information given in this section is provided to make the rules,
tables, and protocol easier to understand. It is not normative.

=>

This document is not normative. The information given in this document
is provided to make IDNA2008 easier to understand and to provide background
information on why changes were made from IDNA2003.

[AND move to the end of Section 1.1]

*Rationale.* Make it clear that the whole document, not just this section,
is non-normative.

------------------------------

Characters that are placed in the "PROTOCOL-VALID" category are never
removed from it unless the code points themselves are removed from
Unicode (such removal would be inconsistent with the Unicode stability
principles (see [Unicode51], Appendix F) and hence should never
occur).

=>

Characters that are placed in the "PROTOCOL-VALID" category are expected
to never be removed from it or reclassified. While characters could
theoretically be removed from Unicode, such removal would be inconsistent
with the Unicode stability principles (see [Unicode51], Appendix F) and
hence should never occur.

*Rationale.* Use the same language for PROTOCOL-VALID as for DISALLOWED
("expected to never"), since "never" alone is a promise that can't be kept.
Reword the Unicode issue for clarity.

------------------------------

Only the former are fully tested at lookup time.

=>

Only the former require full testing at lookup time.


*Rationale.* Use the same wording as the spec (see other note).

------------------------------

Some characters are sufficiently problematic for use in IDNs that they
should be excluded for both registration and lookup (i.e., IDNA-conforming
applications performing name lookup should verify that these
characters are absent; if they are present, the label strings should
be rejected rather than converted to A-labels and looked up.

=>

Some characters are inappropriate for use in IDNs and are thus
excluded for both registration and lookup (i.e., IDNA-conforming
applications performing name lookup should verify that these
characters are absent; if they are present, the label strings should
be rejected rather than converted to A-labels and looked up).
Some of these characters are problematic for use in IDNs (such as the
FRACTION SLASH character), while others (such as the HEART symbol)
simply fall outside the conventions for typical identifiers (basically
letters and numbers).


*Rationale.* Only a minuscule fraction of the characters that were
unmapped in IDNA2003 and are illegal in IDNA2008 show any evidence of
being "problematic". Also, the "should be" language is appropriate for a
proposal, not for a description of the current spec. The above wording
makes clear the main reason for this break in compatibility, while noting
the problematic nature of some characters. It also supplies some concrete
examples (we could use more of that!).

Note that this is in line with text further down, which does break these
characters into two categories: "some because they are actively dangerous
in URI, IRI, or similar contexts and others because there is no evidence
that they are important enough to..."
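
As a rough illustration of the lookup-side check in the proposed text, the
Python sketch below rejects a label containing a DISALLOWED code point before
any A-label conversion is attempted. The two-entry SAMPLE_DISALLOWED set
(FRACTION SLASH, and U+2665 BLACK HEART SUIT standing in for "the HEART
symbol") and the helper name to_a_label are hypothetical; a real
implementation would use the full IDNA2008 tables and the other protocol
checks (NFC, contextual rules, and so on) omitted here. Python's built-in
"punycode" codec supplies only the RFC 3492 encoding step.

```python
# Hypothetical sample only: the real DISALLOWED set is derived from the
# IDNA2008 tables, not hard-coded like this.
SAMPLE_DISALLOWED = {
    "\u2044",  # FRACTION SLASH
    "\u2665",  # BLACK HEART SUIT (stand-in for "the HEART symbol")
}


def to_a_label(label: str) -> str:
    """Reject DISALLOWED characters first; only then convert to an A-label."""
    for ch in label:
        if ch in SAMPLE_DISALLOWED:
            raise ValueError(f"DISALLOWED code point U+{ord(ch):04X} in label")
    if label.isascii():
        return label  # pure-ASCII labels need no Punycode conversion
    # Python's built-in "punycode" codec implements RFC 3492 encoding.
    return "xn--" + label.encode("punycode").decode("ascii")


print(to_a_label("bücher"))  # xn--bcher-kva
```

Because the check runs before conversion, a label such as "1\u20442" raises
an error and is never turned into an A-label or looked up.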

------------------------------

If a character is classified as "DISALLOWED" in error and the error is
sufficiently problematic, the only recourse would be either to
introduce a new code point into Unicode and classify it as
"PROTOCOL-VALID" or for the IETF to accept the considerable costs of
an incompatible change and replace the relevant RFC with one
containing appropriate exceptions.

=>

If a character is classified as "DISALLOWED" when it should not have
been, there are two possible recourses:
(a) Replace the relevant RFC with one containing appropriate exceptions,
accepting a change that many people believe would have considerable costs.
(b) Propose a new Unicode character identical to the existing one except
for its behavior in IDNA. Such a proposal would be extremely unlikely to
be accepted, since it would violate Unicode and ISO policies on duplicate
encoding.


*Rationale.* Reflect reality. There is no particular consensus that
changing DISALLOWED to PVALID would in fact have such costs. However,
because some people do feel that this is the case, we can certainly
reflect that opinion here. We also don't want to hold out false hope that
Unicode/ISO would make a duplicate encoding just for the purpose of IDNA.


------------------------------

For example, it is generally believed that labels containing characters
from more than one script are a bad practice although there may be
some important exceptions to that principle.

=>

For example, labels containing characters from more than one script are
problematic where those characters can be visually confused, such as using
a Cyrillic character for "a" (which looks exactly like a Latin "a") in the
midst of an otherwise Latin label. In other cases, mixing scripts may be
perfectly acceptable, such as using Latin letters in the midst of Chinese
characters.

*Rationale.* Use concrete examples of problems.

------------------------------

Other issues in domain name identification and processing arise because
IDNA2003 specified that several other characters be treated
...
[[anchor16:
Above text is a substitute for an earlier (pre -01) version and is
hoped to be more clear. Comments and improvements welcome.]]

=>

[Remove]


*Rationale.* Unless at least one example of a concrete problem can be
provided, this needs to be removed. What is wrong with changing the regex
for recognizing a URL from using [.] as the label delimiter to using
[.．﹒。｡] (that is, [\x{002E}\x{FF0E}\x{FE52}\x{3002}\x{FF61}])?
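
To show how small the suggested change is in practice, here is a minimal
Python sketch that splits a host name on any of the five separator
characters listed above. The names LABEL_SEPARATORS and split_labels are
illustrative only, and a real URL parser would of course do more than split
on dots.

```python
import re

# The ASCII full stop plus the other "dot" characters listed above:
# U+FF0E FULLWIDTH FULL STOP, U+FE52 SMALL FULL STOP,
# U+3002 IDEOGRAPHIC FULL STOP, U+FF61 HALFWIDTH IDEOGRAPHIC FULL STOP.
LABEL_SEPARATORS = re.compile("[\u002E\uFF0E\uFE52\u3002\uFF61]")


def split_labels(host: str) -> list[str]:
    """Split a host name into labels on any of the separators above."""
    return LABEL_SEPARATORS.split(host)


print(split_labels("www\uFF0Eexample\u3002com"))  # ['www', 'example', 'com']
```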


------------------------------

Highly Localized Preprocessing.

=>
[Remove this section and reword neighboring text.]


*Rationale.* Major security issue (see notes on protocol).


------------------------------

Anyone looking up a label in a DNS zone is required to
...
o Avoid validating other contextual rules about characters, including
mixed-script label prohibitions, although such rules may be used to
influence presentation decisions in the user interface.

=>
[Remove last clause. Add editorial note that this needs to be reviewed
against final text in protocol.]


*Rationale.* The protocol does not, and should not, *require* someone
not to validate that a purported U-Label is actually a U-Label!!
Second, this and any other place in the document that restates what the
protocol requires needs to be marked for verification against the
protocol before publication, so that errors like this one don't slip
through.


------------------------------

characters are permanently excluded

=>

characters are excluded

*Rationale.* For consistency with the "expected" language elsewhere, we
need only say "excluded".

------------------------------

For example, an essential element of the ASCII case-mapping functions
is that uppercase(character) must be equal to
uppercase(lowercase(character)). That requirement may not be satisfied
with IDNs. For example, there are some characters in scripts that use
case distinction that do not have counterparts in one case or the
other.

=>
[Delete]

*Rationale.* While round-tripping of ASCII characters under case
operations is a feature, it is not an "essential" feature of ASCII.
Moreover, even in ASCII, strings do not round-trip: "McGowan" doesn't, for
example. And neither of these points is relevant to the argument at hand;
they just weaken it.
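
The distinction between the per-character property and string round-tripping
can be checked directly in Python. The sketch below uses the built-in
str.upper and str.lower as stand-ins for the draft's uppercase() and
lowercase() functions.

```python
import string

# Per-character property cited in the draft: for every ASCII letter c,
# uppercase(c) == uppercase(lowercase(c)).
ascii_ok = all(c.upper() == c.lower().upper() for c in string.ascii_letters)
print(ascii_ok)  # True

# Yet whole strings do not round-trip through case mapping, even in ASCII.
name = "McGowan"
print(name.upper().lower())          # mcgowan
print(name.upper().lower() == name)  # False
```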

------------------------------

For example, putative labels that contain unassigned code points will now
be rejected, while IDNA2003 permitted them (something that is now
recognized as a considerable source of risk)

=>

For example, putative labels that contain unassigned code points will now
be rejected, while IDNA2003 permitted them (something many feel to be a
considerable source of risk)


*Rationale.* It isn't so recognized. There is no example of any case where
it is a risk, since a label containing unassigned characters was always
rejected at registration under IDNA2003, and therefore couldn't be
matched. Note that IDNA2008 also does not require lookup to completely
verify that putative U-Labels are actual U-Labels.
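
For reference, detecting unassigned code points is a simple table lookup,
though its answer is tied to a particular Unicode version. A minimal Python
sketch, assuming the interpreter's bundled unicodedata tables stand in for
whatever Unicode version an implementation targets; the helper name
has_unassigned is hypothetical.

```python
import unicodedata


def has_unassigned(label: str) -> bool:
    """True if any code point in the label has General_Category Cn.

    The answer depends on the Unicode version compiled into this Python
    build (see unicodedata.unidata_version), so it can change on upgrade.
    Noncharacters such as U+FFFF are also reported as Cn.
    """
    return any(unicodedata.category(ch) == "Cn" for ch in label)


print(unicodedata.unidata_version)
print(has_unassigned("abc"))       # False
print(has_unassigned("ab\u0378"))  # True: U+0378 is reserved (Greek block)
```

The version dependence is the crux of the disagreement above: a label that
is rejected today may become valid once its code points are assigned.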



Mark