IDNA 2008 security

Dick Sites dsites at google.com
Tue Dec 2 01:21:35 CET 2008


The http://tools.ietf.org/html/draft-ietf-idnabis-tables-04 draft
looks good to me, with a few exceptions. Please pass these comments on
as appropriate.

First, I very much support the inclusion-based approach. It should be
clearly stated on page 4 that Unassigned code points are DISALLOWED
until they are explicitly assigned and included.
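The inclusion-based default can be illustrated with a minimal Python sketch. The function name `derived_property` and the inclusion ranges are hypothetical, chosen only for illustration. The point it shows is that an unassigned code point (general category Cn) stays DISALLOWED even when it falls inside a range that is otherwise included.

```python
import unicodedata

# Hypothetical inclusion list, for illustration only.
INCLUDED_RANGES = [
    (0x0030, 0x0039),  # ASCII digits
    (0x0061, 0x007A),  # ASCII lower-case letters
    (0x0377, 0x037D),  # small Greek range spanning unassigned U+0378..U+0379
]


def derived_property(cp: int) -> str:
    """Inclusion-based sketch: DISALLOWED unless assigned AND included."""
    # Unassigned (Cn) code points are DISALLOWED even inside an included
    # range; they become eligible only once a Unicode version assigns
    # them and the tables explicitly include them.
    if unicodedata.category(chr(cp)) == "Cn":
        return "DISALLOWED"
    if any(lo <= cp <= hi for lo, hi in INCLUDED_RANGES):
        return "PVALID"
    # Anything not explicitly included defaults to DISALLOWED.
    return "DISALLOWED"
```

For example, `derived_property(0x0377)` is PVALID but `derived_property(0x0378)` is DISALLOWED, although both lie in the same listed range.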

It would be helpful to state explicitly on page 4 or 5 that upper-case
letters are disallowed. For emphasis, and to avoid confusing the
casual reader, Lu could be removed from LetterDigits on page 5.
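The suggested change to LetterDigits amounts to dropping one general category from a set. In the minimal sketch below, the draft's LetterDigits is assumed to be Ll, Lu, Lo, Nd, Lm, Mn, Mc, and `in_letter_digits` is a hypothetical helper name.

```python
import unicodedata

# LetterDigits as assumed here (Ll, Lu, Lo, Nd, Lm, Mn, Mc), with Lu
# removed so that upper-case letters visibly fall outside the set.
LETTER_DIGITS = {"Ll", "Lo", "Nd", "Lm", "Mn", "Mc"}


def in_letter_digits(ch: str) -> bool:
    # True only for the general categories listed above.
    return unicodedata.category(ch) in LETTER_DIGITS
```

With Lu gone, `in_letter_digits("A")` and `in_letter_digits("\u0416")` (Cyrillic capital Zhe) are both false, so no reader has to infer that upper case is excluded elsewhere.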

On page 9, rule 11 should be restated as "The value is DISALLOWED." This
avoids the need for a rule 12 that specifies an unconditional
DISALLOWED, and it makes this section match the recasting in
Appendix A.1 exactly.
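Why an unconditional final rule removes the need for a rule 12 can be seen in a first-match rule cascade. This is a minimal Python sketch: the earlier rules are hypothetical stand-ins, not the draft's actual rules 1 to 10. Only the shape matters. Rules are tried in order, and because the last one always applies, no code point can fall through.

```python
from typing import Callable, List, Optional

# A rule maps a code point to a derived value, or None if it does not apply.
Rule = Callable[[int], Optional[str]]


def exceptions_rule(cp: int) -> Optional[str]:
    # Hypothetical stand-in for an early exceptions rule.
    return "PVALID" if cp == 0x002D else None


def ascii_lower_rule(cp: int) -> Optional[str]:
    # Hypothetical stand-in for an inclusion rule.
    return "PVALID" if 0x61 <= cp <= 0x7A else None


def final_rule(cp: int) -> Optional[str]:
    # Rule 11 restated as "The value is DISALLOWED": it always applies.
    return "DISALLOWED"


RULES: List[Rule] = [exceptions_rule, ascii_lower_rule, final_rule]


def derive(cp: int, rules: List[Rule] = RULES) -> str:
    # The first rule that applies determines the value.
    for rule in rules:
        value = rule(cp)
        if value is not None:
            return value
    # Reachable only when the last rule is conditional, which is exactly
    # the gap a separate "rule 12" would otherwise have to close.
    raise ValueError(f"no rule applies to U+{cp:04X}")
```

Dropping `final_rule` from the list makes `derive(ord("Q"), RULES[:-1])` raise, which is the situation the extra rule 12 exists to prevent.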

Appendix A is unclear in several places. Before(cp) is not properly
defined when cp is the first character of a label. After(cp) is not
defined at all; when it is added, its definition should address the
case where cp is the last character of a label.
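One way to make Before and After total is to define them on a position within the label and have them return an explicit "no character" at the edges. This is a minimal Python sketch with hypothetical names `before` and `after`. It also guards against a Python-specific trap: `label[i - 1]` with `i == 0` silently wraps to the last character instead of failing.

```python
from typing import Optional


def before(label: str, i: int) -> Optional[str]:
    """Character preceding position i, or None if i is the first position."""
    # Explicit guard: label[-1] would otherwise wrap to the last character.
    return label[i - 1] if i > 0 else None


def after(label: str, i: int) -> Optional[str]:
    """Character following position i, or None if i is the last position."""
    return label[i + 1] if i + 1 < len(label) else None
```

A context rule can then treat `None` as "condition not satisfied" rather than leaving the edge case undefined.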

The current definition of FirstChar appears flawed. Does cp .eq.
FirstChar return true if cp is the third character but is the same
code point as the first character? The same question applies to
LastChar.
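The two possible readings of "cp .eq. FirstChar" give different answers on a label such as "aba". The Python sketch below contrasts them. All four function names are hypothetical. The positional reading is presumably what is intended.

```python
def is_first_char_by_value(label: str, i: int) -> bool:
    # Value reading: true at ANY position holding the same code point
    # as the label's first character.
    return label[i] == label[0]


def is_first_char_by_position(label: str, i: int) -> bool:
    # Positional reading: true only at index 0.
    return i == 0


def is_last_char_by_value(label: str, i: int) -> bool:
    # Value reading for LastChar, with the same ambiguity.
    return label[i] == label[-1]


def is_last_char_by_position(label: str, i: int) -> bool:
    # Positional reading: true only at the final index.
    return i == len(label) - 1
```

For "aba" at index 2, the value reading says the character is FirstChar and the positional reading says it is not.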

The meaning of "Lookup: true" or "Lookup: false" escapes me.

The expression in A.2 that uses the constant U+002D should be rewritten
in the style of A.11, using the variable cp.

A.3 is unclear about what is intended by "Script(cp) .eq. Arabic" when
cp is U+200C. On its face, the first term is always false, since
Script(cp) for cp = U+200C is Inherited. I suspect a "For all
characters" quantifier, or something similar, is missing. The
Script(Before(cp)) term is undefined if cp is the first character, but
no term states the constraint that cp must not be the first character.
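A context check for U+200C that avoids both problems tests the script of the preceding character, never of ZWNJ itself, and states the not-first-character guard explicitly. This is a minimal Python sketch of one possible reading of A.3, not the draft's rule. Python's standard library has no Script property, so `script_of` is a crude hypothetical stand-in that recognizes only the Arabic block (U+0600..U+06FF) and ZWNJ.

```python
def script_of(ch: str) -> str:
    """Crude stand-in for the Unicode Script property (assumption)."""
    cp = ord(ch)
    if cp == 0x200C:
        return "Inherited"  # ZWNJ's Script value
    if 0x0600 <= cp <= 0x06FF:
        return "Arabic"
    return "Other"


def zwnj_context_ok(label: str, i: int) -> bool:
    """Hypothetical reading of A.3 for the ZWNJ at position i."""
    if label[i] != "\u200c":
        return False
    # Testing script_of(label[i]) == "Arabic" would always fail, since
    # ZWNJ itself is Inherited; the preceding character is what matters.
    if i == 0:
        return False  # Before(cp) does not exist for the first character
    return script_of(label[i - 1]) == "Arabic"
```

Under this sketch, a ZWNJ that follows an Arabic letter passes, while one at the start of a label or after a Latin letter is rejected.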

It would be helpful to combine the identical rules A.6 and A.7, and
likewise A.9 with A.10 and A.11 with A.12.

There is nothing in this draft that addresses known real-world
phishing exploits and disallows them. That seems a truly unfortunate
oversight. Specifically, "paypal" spelled with one or more Cyrillic
look-alike "a" characters is allowed, yet all the mechanism is in
place to require that U+0430, etc., be used only in a Cyrillic-script
label.
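A label-level rule of the kind described can be sketched in a few lines of Python. Everything here is hypothetical: the look-alike set is a small illustrative sample, and `script_of` is a crude block-based stand-in for the Unicode Script property. The rule is that a Cyrillic look-alike may appear only if every character in the label is Cyrillic or script-neutral (digits, hyphen).

```python
# Illustrative sample of Cyrillic letters that resemble Latin ones:
# а е о р с х у (U+0430, U+0435, U+043E, U+0440, U+0441, U+0445, U+0443).
CYRILLIC_LOOKALIKES = {"\u0430", "\u0435", "\u043e", "\u0440",
                       "\u0441", "\u0445", "\u0443"}


def script_of(ch: str) -> str:
    """Crude stand-in for the Unicode Script property (assumption)."""
    if "a" <= ch <= "z" or "A" <= ch <= "Z":
        return "Latin"
    if 0x0400 <= ord(ch) <= 0x04FF:
        return "Cyrillic"
    if ch.isdigit() or ch == "-":
        return "Common"
    return "Other"


def lookalike_rule_ok(label: str) -> bool:
    # Labels without any look-alike are unaffected by this rule.
    if not any(ch in CYRILLIC_LOOKALIKES for ch in label):
        return True
    # Otherwise every character must be Cyrillic or script-neutral.
    return all(script_of(ch) in ("Cyrillic", "Common") for ch in label)
```

Under this rule, "paypal" spelled with a Cyrillic "а" is rejected, while an all-Cyrillic label such as "пайпал" passes.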

Even better would be an inclusion-based approach that allows a change
of script only at a hyphen. Legitimate domain owners could then
prevent an entire class of phishing by not using hyphens in their
actual labels, while domain owners who want foo-бар or Фу-bar could
still have them. A hyphen would give some users enough of a clue that
something unusual might be going on, and the rule would allow only
  p-а-yp-а-l
for a label that intermixes two Cyrillic letters with Latin letters.
This simple, enforced rule could perhaps replace several of the
current, more specialized context rules.
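The hyphen rule is simple to check mechanically. The minimal Python sketch below splits a label on HYPHEN-MINUS and requires each segment to contain at most one script, ignoring script-neutral characters. As before, `script_of` is a hypothetical block-based stand-in for the Unicode Script property, and `script_changes_only_at_hyphen` is an illustrative name.

```python
def script_of(ch: str) -> str:
    """Crude stand-in for the Unicode Script property (assumption)."""
    if "a" <= ch <= "z" or "A" <= ch <= "Z":
        return "Latin"
    if 0x0400 <= ord(ch) <= 0x04FF:
        return "Cyrillic"
    if ch.isdigit():
        return "Common"
    return "Other"


def script_changes_only_at_hyphen(label: str) -> bool:
    # Within each hyphen-delimited segment, all characters that carry a
    # specific script must share that script.
    for segment in label.split("-"):
        scripts = {script_of(ch) for ch in segment} - {"Common"}
        if len(scripts) > 1:
            return False
    return True
```

With this check, the mixed-script "paypal" is rejected, "p-а-yp-а-l" passes, and foo-бар and Фу-bar remain available to owners who want them.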

The oversight suggests that this draft is just a collection of
rules and not a serious effort to improve security on the web.

/dick

