Unicode 5.2 -> 6.0

Nicolas Williams Nicolas.Williams at oracle.com
Thu Oct 14 23:14:49 CEST 2010

On Thu, Oct 14, 2010 at 04:51:26PM -0400, Andrew Sullivan wrote:
> On Thu, Oct 14, 2010 at 04:41:18PM -0400, John C Klensin wrote:
> > extent of the tree.  The meaning of "registry" in IDNA2008 (and
> > similar terminology in IDNA2003) extends to every zone
> > administrator of a zone in the DNS.  In principle, the
> > administrator of a subdomain somewhere deep in the DNS tree
> > could have chosen to utilize one or more of these characters
> > without having to discuss that decision with anyone else.
> [...]
> If we were much further along and had any evidence at all of
> widespread use, I'd be pretty concerned.  As it is, it seems to me the
> best we can do for this particular case is ask everyone we know, and
> hope hard we get it right.  But at least we're in quite early days.
> We won't have this luxury when we go from Unicode 6.x to 7.x, I
> suppose.

And if we get it wrong?

Note: I can't find a glyph for U+19CA.  That may be dispositive.  If it
can't be rendered in any font, then it can't be used in a domain name.

> I dimly recall taking minutes in one of the IDNABIS meetings in which
> I formed the impression that people thought it unlikely stuff would
> move from PVALID to DISALLOWED.  I guess my impression was wrong?  For
> if this is going to be a regular problem in future, it seems like one
> would be better to have some new class like PROBABLY-PVALID where
> characters we're not sure about live for a couple releases of the
> Unicode tables.  That feels like second-guessing Unicode, however, and
> we were trying to get out of that game.

We have to accept that Unicode will make this sort of backwards-incompatible
change from time to time.  I'm not sure what we
can do about it.  I doubt that we can reliably judge probability of such
events on a per-codepoint/character basis -- if we could, then the
Unicode Consortium could too, and then they could warn us ahead of time.

So we have to deal with the question of what to do with characters that
move from PVALID to DISALLOWED.

Knowing that this character is not in use in any TLDs' zones would be
nice: presumably it will be much easier for admins below to fix their
zones than to get existing commercial (i.e., someone paid a registrar)
registrations changed.  But we won't always have that luxury.

Now, this is what the UC has to say about this character:

"A general category change to one New Tai Lue numeric character
(U+19CA), which would have the effect of disqualifying it from inclusion
in identifiers unless grandfathering measures are in place for the
defining identifier syntax"

IOW: we're allowed to grandfather U+19CA.
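
To make the effect of a category change concrete, here is a rough sketch
in Python.  It assumes a deliberately simplified reading of RFC 5892: a
code point is a candidate for PVALID only if its General_Category is in
the LetterDigits set (Ll, Lu, Lo, Nd, Lm, Mn, Mc), and every other rule
and exception in that RFC is ignored.  The function names are made up for
illustration, and the Nd/No pair in the usage note is just an example of
a numeric character leaving that set.

```python
import unicodedata

# LetterDigits categories from RFC 5892, section 2.1.
# Simplification: none of the RFC's other rules or exceptions are applied.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def in_letter_digits(category: str) -> bool:
    """True if a General_Category value belongs to the LetterDigits set."""
    return category in LETTER_DIGITS


def category_change_disqualifies(old_category: str, new_category: str) -> bool:
    """True if the change moves a character out of the LetterDigits set."""
    return in_letter_digits(old_category) and not in_letter_digits(new_category)


def current_category(codepoint: int) -> str:
    """General_Category as seen by this interpreter's Unicode tables.

    The answer depends on which Unicode version the interpreter bundles
    (see unicodedata.unidata_version), so it can differ between releases.
    """
    return unicodedata.category(chr(codepoint))
```

For instance, category_change_disqualifies("Nd", "No") is true, while
category_change_disqualifies("Nd", "Nd") is false.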

The test I propose then is: if we can find a font that can render
once-PVALID-now-DISALLOWED characters, then grandfather (or consider
other factors), else don't.
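
As a minimal sketch of that test, assuming we have some way to ask
whether a font covers a code point: the has_glyph callback, the font
names, and the coverage table below are all hypothetical stand-ins, not
a real font API.

```python
from typing import Callable, Iterable


def grandfathering_verdict(
    codepoint: int,
    old_status: str,
    new_status: str,
    fonts: Iterable[str],
    has_glyph: Callable[[str, int], bool],
) -> str:
    """Apply the proposed rule to a single code point.

    The rule only concerns characters that were PVALID and are now
    DISALLOWED; anything else is reported as not applicable.
    """
    if old_status != "PVALID" or new_status != "DISALLOWED":
        return "not-applicable"
    # If any known font can render it, grandfather (or weigh other factors).
    if any(has_glyph(font, codepoint) for font in fonts):
        return "grandfather-or-consider-other-factors"
    return "do-not-grandfather"


# Hypothetical coverage table standing in for a real font lookup.
COVERAGE = {"ExampleFont": {0x0061, 0x0062}}


def table_has_glyph(font: str, codepoint: int) -> bool:
    """Look the code point up in the made-up coverage table above."""
    return codepoint in COVERAGE.get(font, set())
```

With that made-up table, a code point no font covers comes out as
"do-not-grandfather", and one that some font covers comes out as
"grandfather-or-consider-other-factors".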


More information about the Idna-update mailing list