I-D Action:draft-faltstrom-5892bis-01.txt

Tue Dec 14 04:15:55 CET 2010

--On Tuesday, December 14, 2010 10:51 +1100 James Mitchell
<james.mitchell at ausregistry.com.au> wrote:

> It is not clear to me whether consideration has been made
> regarding contextual rules for characters introduced in
> Unicode 6.0. Classifying such characters as disallowed may
> have adverse consequences down the track. It was my
> understanding that such characters would be classified as
> CONTEXTO without having a rule, and were thus invalid for
> registration until a rule was proposed.
> 
> I think it would be valuable to add a paragraph to the draft
> to clarify that we have not considered contextually valid
> characters, should this be the case.

James,

(personal opinion, without having given this a lot of thought
lately)

Yes, but...

The provision in the IDNA2008 specs that you are referring to
is, like the backward-compatibility one to which the draft is
addressed, a provision that I think we hope to never use.  That
doesn't make the provisions less important, but it does put a
discussion about using them in a different light.  Indeed, some
of us were influenced in our conclusion that nothing need be
done in the 6.0 case (as reflected in this version of the draft)
by a sort of "well, we may need to use that mechanism sometime,
but this situation doesn't justify opening the gates".  I have
to believe that we would do it if the situation were significant
enough, but it just doesn't appear to be... and there are costs
associated with divergence from the property set for a given
Unicode version that should not be underestimated.

The contextual machinery is like that, only more so.  We know
that we would need to expand CONTEXTJ if new invisible
("zero-width") joiners are added to Unicode.  We've been told
the odds of that happening are vanishingly small and, indeed,
there do not appear to be any new joiners in Unicode 6.0.  CONTEXTO
was designed to deal with characters which were required from
the point of view of people trying to construct DNS labels as
mnemonics based on a particular language, but which caused
special problems -- problems that would normally exclude them
(either by Unicode properties and other rules or by exception).
I don't expect to see many such characters introduced in the
future.  A superficial review of the characters added in 6.0
didn't cause any examples to leap out at me.  It is also worth
noting that, to get a character onto the CONTEXTO list, someone
speaking for a script community is going to need to stand up and
say "this character is needed _and_ it is sufficiently
problematic that CONTEXTO treatment is necessary" (otherwise, it
would be sufficient to simply add it as PVALID to the exception
list if the properties would otherwise identify it as
DISALLOWED).  To the extent to which most of the characters
being added to Unicode are associated with scripts that are
either no longer in use or very obscure, the odds of someone
standing up and making that representation are quite low.   
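
As a rough illustration only (this is a sketch of the reasoning in
the paragraph above, not anything taken from the IDNA2008 specs;
the function, parameter, and enum names are made up for this note),
the decision could be written as:

```python
from enum import Enum


class Category(Enum):
    PVALID = "PVALID"
    CONTEXTO = "CONTEXTO"
    DISALLOWED = "DISALLOWED"


def classify_new_character(derived, community_needs_it, is_problematic):
    """Hypothetical decision sketch for a newly added character.

    derived            -- category the Unicode properties would give it
    community_needs_it -- someone speaking for a script community has
                          said the character is needed
    is_problematic     -- the character is troublesome enough that a
                          contextual rule is necessary
    """
    if not community_needs_it:
        # Nobody has asked for special treatment: keep what the
        # properties say.
        return derived
    if is_problematic:
        # Needed _and_ problematic: CONTEXTO (and, per the quoted
        # message, not usable for registration until a rule exists).
        return Category.CONTEXTO
    if derived is Category.DISALLOWED:
        # Needed but not problematic: an exception-list entry as
        # PVALID is enough.
        return Category.PVALID
    return derived
```

For an obscure or historical script with nobody making the case,
the first branch applies and the derived value simply stands.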

So, while I won't claim I have done a complete analysis (I have
looked at the complete list of new 6.0 characters), I think we
are pretty safe as far as new contextual characters are
concerned.  And I have looked in enough detail, and think others
have too, to get us past the threshold of "not considered".

regards,
   john