Browser IDN display policy: opinions sought

Tue Dec 13 10:37:14 CET 2011

--On Tuesday, December 13, 2011 14:12 +0900 "Martin J. Dürst"
<duerst at it.aoyama.ac.jp> wrote:

> On 2011/12/13 8:03, Tina Dam wrote:
>> One more thing. Perhaps we need to treat the IDN Guidelines
>> the same way the protocol revision did - i.e. separate
>> guidelines for registration and resolution/display? Or is
>> that re-opening the discussion that Gerv tried to avoid?

> The reason this was separated in the protocol was that
> registering as yet unassigned codepoints is total nonsense,
> whereas accepting as yet unassigned codepoints for
> resolution/display makes sense because that avoids the need
> for software updates.

Actually, that breaks down too if one is using a system (like
IDNA2008) that is dependent on Unicode properties that cannot be
known until a character is bound to the code point.  If you
don't care about properties (except insofar as they are
reflected in a highly reified table), then "don't register
strings containing unknown code points but it is ok to resolve
them" is a reasonable strategy (and the IDNA2003 strategy).
Unfortunately, defining the standard in terms of the required
table creates significant version dependencies.  By contrast, if
one gets rid of the version dependencies (modulo the presumably
infrequent need to deal with exception cases) by going to a
property model, the properties have to be known in principle at
both registration and lookup time.  That, in turn, prevents
looking up unknown code points because one cannot know if they
are valid... at least without putting a lot of trust in the
registrars and registries who, from Gerv's point of view (and
that of many others) are Part Of The Problem.
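
To make the lookup-time problem concrete, here is a minimal sketch in
Python.  It is not the IDNA2008 derivation itself (that is specified in
terms of many more properties); it only shows the one step that fails:
if a code point is unassigned in the Unicode version the client knows
about, there are no properties to test.  The function name
`lookup_status` is hypothetical, and the result depends on the Unicode
tables built into the local Python (see `unicodedata.unidata_version`).

```python
import unicodedata

def lookup_status(label: str) -> str:
    """Say whether a property-based client could even evaluate a label.

    Returns "undecidable" if any code point has general category Cn
    (unassigned or noncharacter) in this Python build's Unicode tables,
    because no properties exist yet for such a code point.
    """
    for ch in label:
        if unicodedata.category(ch) == "Cn":
            return "undecidable"
    # Every code point is assigned, so property rules could be applied.
    # Whether the label is actually valid is a separate question.
    return "decidable"

if __name__ == "__main__":
    print("Unicode tables:", unicodedata.unidata_version)
    print(lookup_status("example"))
    print(lookup_status("ex\u0378mple"))  # U+0378 is currently unassigned
```

A table-based client (the IDNA2003 style) could simply pass the second
label through; a property-based one has nothing to base a decision on.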

> For the protocol, it made sense to be a bit looser on the
> receiving side. But for the security protections we are
> talking about now, and on the level of general guidelines, I
> don't see that making sense. If something "looks dangerous",
> then it shouldn't be displayed. If something "looks
> dangerous", then it shouldn't be registered.

Exactly.  And, seen in that light, what we are looking at with
Types A, B, and C are different lookup-side surrogates for
"looks dangerous".  As you and others have (IMO correctly)
pointed out, none of them is very good for that purpose.

> I may be wrong, but it looks like the main problem isn't that
> the two sides might be different. The main problem is that
> both sides would be the same, and therefore every side tries
> to blame the other, and get away with it.

I think that is right, but things are a little more complex.
Gerv's "Type B" model looks at registry policy and then, of
necessity, treats perfectly reasonable names as potentially bad
because they are registered in a TLD whose policies would allow
less reasonable names to be registered.  The language-based
approaches of Type A treat some perfectly reasonable names as
potentially bad because (I think, with some supporting evidence)
they are written in a script that isn't associated with one of
the languages the user more or less claims to read and write.
The slightly more script-based Type C treats some perfectly
reasonable names as bad because they are written in scripts that
the user hasn't certified she uses, even though the characters
of that script might be perfectly differentiable to that
particular user.  From a whitelisting perspective, there are
lots and lots of false negatives at the name/label level because
no one can really do much with labels at lookup time so they are
using these language/registry/script surrogates instead.
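
As a rough illustration of why a script-level surrogate rejects
perfectly reasonable names, here is a sketch of a Type C style check.
Everything in it is an assumption made for illustration: Python's
`unicodedata` does not expose the Unicode Script property, so
`script_of` approximates it with the first word of the character name,
and `type_c_accepts` and `COMMON` are hypothetical names, not anything
a browser actually implements.

```python
import unicodedata

# Characters treated as script-neutral in this sketch (an assumption).
COMMON = set("0123456789-")

def script_of(ch: str) -> str:
    """Approximate a character's script by the first word of its name."""
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"

def type_c_accepts(label: str, user_scripts: set) -> bool:
    """Accept a label only if every script in it is one the user listed."""
    scripts = {script_of(ch) for ch in label if ch not in COMMON}
    return scripts <= set(user_scripts)

if __name__ == "__main__":
    # An ordinary Russian word is refused for a Latin-only user,
    # although nothing about the label itself is suspicious.
    print(type_c_accepts("\u043f\u0440\u0438\u043c\u0435\u0440", {"LATIN"}))
    print(type_c_accepts("example-1", {"LATIN"}))
```

The check never looks at what the label says, only at which scripts it
uses, which is exactly why it misfires at the name/label level.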

Remember that the fact that two strings can be confused
shouldn't prohibit both from being delegated and used; it
should only either prevent delegation of one of them or apply
restrictions to its ownership and/or use.  At least in general,
per-label tests can be reasonably carried out only at
registration time because only the registries and registration
processes can examine exclusion lists, do (non-DNS) fuzzy-match
searches over whatever is registered, and usefully apply the
results.  That may
not eliminate the need for any or all of Types A, B, and C, but
it would certainly contain a lot of the problem.
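
A registration-time per-label test of the kind described above might
look roughly like the following sketch.  The confusable map is a tiny
hand-picked sample (a few Cyrillic letters that resemble Latin ones),
and `CONFUSABLE`, `skeleton`, and `find_conflicts` are hypothetical
names; a real registry would presumably work from a much fuller data
set and from its own exclusion lists.  The point is only that the
check needs the list of what is already registered, which a lookup
client does not have.

```python
# Small illustrative sample of look-alike mappings (an assumption,
# not a complete or authoritative table).
CONFUSABLE = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}

def skeleton(label: str) -> str:
    """Fold each character to its look-alike so similar labels compare equal."""
    return "".join(CONFUSABLE.get(ch, ch) for ch in label.lower())

def find_conflicts(candidate: str, registered: set) -> list:
    """Return registered labels that differ from the candidate but look like it."""
    target = skeleton(candidate)
    return sorted(
        name for name in registered
        if name != candidate and skeleton(name) == target
    )

if __name__ == "__main__":
    zone = {"paypal", "example"}
    # "p", Cyrillic a, "yp", Cyrillic a, "l"
    print(find_conflicts("p\u0430yp\u0430l", zone))
```

A non-empty result need not mean refusal; per the paragraph above, the
registry could instead restrict who may own or use the second label.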

best,
    john