Disallowing code points

Sat Jul 18 17:30:11 CEST 2009

--On Friday, July 17, 2009 09:48 +1000 Chris Wright
<chris at ausregistry.com.au> wrote:

> Vint,
> 
> I fully support this approach, what I want to point out
> though, is that barring the joiner context rules, no other
> context rules are applied at lookup (and I am not saying they
> should be). Any 'registry' at any level of the DNS hierarchy
> who, either deliberately or through lack of acting diligently,
> does not apply a context rule(s) will still be manifesting the
> problem the context rule was designed to address, as clients
> will still look up the names!
> 
> So I still don't fully understand the point of context rules,
> unless they are just going to act as a guide?

Chris,

As with any standard that isn't given the force of law
somewhere, all we can do is provide a definition of appropriate
and interoperable behavior.  One can hypothesize about
registries (zone administrations) that are evil or sloppy and
one can guarantee that there will be some out there.  One can
equally hypothesize about lookup implementations that, in order
to save time or code space, because _they_ are evil, or,
conversely, because they want to provide extra protections to
users, apply a set of tests that differs from what the standard
requires.  That is
less certain to occur, but certainly might (and Gerv's list of
characters that Mozilla has chosen to ban is indicative of the
situation).

The reason for the distinction between CONTEXTJ and CONTEXTO is
that, the last time the topic was opened, the WG concluded that
there was a difference between a character that required a
specific context but would be reasonably visible as an extra
clue if it was used inappropriately and a character that, in the
wrong context, was invisible or otherwise seriously hostile.  I
haven't analyzed how recent discussions and suggested changes
might affect that, but at least originally the distinction also
preserved a higher level of IDNA2003 compatibility -- if strings
were validly registered under IDNA2003 but required context
under IDNA2008, they would still be looked up if they were
assigned to CONTEXTO.
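
To make the lookup/registration difference concrete, here is a
minimal sketch in Python.  It is an illustration under stated
assumptions, not the protocol's tables or rules: the category
test, the single joiner rule (previous character is a virama),
and the single CONTEXTO example (MIDDLE DOT between two "l"s)
are simplified stand-ins, and every function name is
hypothetical.

```python
import unicodedata

ZWNJ, ZWJ, MIDDLE_DOT = "\u200c", "\u200d", "\u00b7"

def toy_category(ch):
    """Toy stand-in for the real derived-property tables (assumption)."""
    if ch in (ZWNJ, ZWJ):
        return "CONTEXTJ"
    if ch == MIDDLE_DOT:
        return "CONTEXTO"
    # Letters, digits, hyphen and combining marks count as PVALID here.
    if ch.isalnum() or ch == "-" or unicodedata.category(ch).startswith("M"):
        return "PVALID"
    return "DISALLOWED"

def joiner_rule_ok(label, i):
    """Simplified joiner rule: previous character must be a virama (ccc 9)."""
    return i > 0 and unicodedata.combining(label[i - 1]) == 9

def middle_dot_rule_ok(label, i):
    """Simplified CONTEXTO rule: MIDDLE DOT only between two 'l' characters."""
    return 0 < i < len(label) - 1 and label[i - 1] == "l" and label[i + 1] == "l"

def label_ok(label, at_registration):
    """Check a label; CONTEXTO rules are enforced only at registration."""
    for i, ch in enumerate(label):
        cat = toy_category(ch)
        if cat == "DISALLOWED":
            return False
        # CONTEXTJ: tested at both registration and lookup.
        if cat == "CONTEXTJ" and not joiner_rule_ok(label, i):
            return False
        # CONTEXTO: tested at registration; lookup lets it through.
        if cat == "CONTEXTO" and at_registration and not middle_dot_rule_ok(label, i):
            return False
    return True
```

In this sketch a label such as "a" + MIDDLE DOT + "b" is refused
at registration but still passes lookup, which is exactly the
gap Chris describes when a registry does not apply the rule.  A
misplaced ZWJ, by contrast, is refused in both places.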

If you think it would be useful to add some of the above to
Rationale, please suggest text and where to put it.

Finally, while the protocol police are obviously not out there,
violating a standard in a way that is perceived to cause harm
often does have negative consequences.  We've seen browser
vendors blacklist particular characters that were permitted by
IDNA2003.  We've seen whole subtrees of the DNS treated in an
irregular way, typically by refusing to display the U-label or
other native character form, because of judgments that relevant
registries did not have acceptable policies.  And, while I'm
not aware of its happening with IDNs, there are certainly many
other areas in which violation of established standards and
protocols has been cited in court cases as evidence of grossly
negligent behavior that has led to damage to various parties.

Is it a perfect solution?  No.  But the WG did discuss the
tradeoffs -- including the possibility of applying contextual
rules to ZWJ and ZWNJ only (not even to a category of which they
are, at present, the only members) and simply making all of the
CONTEXTO characters PVALID or DISALLOWED, with no guidance about
where caution was appropriate -- and concluded that the CONTEXTJ
/ CONTEXTO split was appropriate.

   john