Unicode 5.2 -> 6.0

Nicolas Williams Nicolas.Williams at oracle.com
Thu Oct 14 23:14:49 CEST 2010


On Thu, Oct 14, 2010 at 04:51:26PM -0400, Andrew Sullivan wrote:
> On Thu, Oct 14, 2010 at 04:41:18PM -0400, John C Klensin wrote:
> > extent of the tree.  The meaning of "registry" in IDNA2008 (and
> > similar terminology in IDNA2003) extends to every zone
> > administrator of a zone in the DNS.  In principle, the
> > administrator of a subdomain somewhere deep in the DNS tree
> > could have chosen to utilize one or more of these characters
> > without having to discuss that decision with anyone else.
> 
> [...]
> 
> If we were much further along and had any evidence at all of
> widespread use, I'd be pretty concerned.  As it is, it seems to me the
> best we can do for this particular case is ask everyone we know, and
> hope hard we get it right.  But at least we're in quite early days.
> We won't have this luxury when we go from Unicode 6.x to 7.x, I
> suppose.

And if we get it wrong?

Note: I can't find a glyph for U+19CA.  That may be dispositive.  If it
can't be rendered in any font, then it can't be used in a domain name
label.

> I dimly recall taking minutes in one of the IDNABIS meetings in which
> I formed the impression that people thought it unlikely stuff would
> move from PVALID to DISALLOWED.  I guess my impression was wrong?  For
> if this is going to be a regular problem in future, it seems like one
> would be better to have some new class like PROBABLY-PVALID where
> characters we're not sure about live for a couple releases of the
> Unicode tables.  That feels like second-guessing Unicode, however, and
> we were trying to get out of that game.

We have to accept that Unicode will make this sort of
backwards-incompatible change from time to time.  I'm not sure what we
can do about it.  I doubt that we can reliably judge probability of such
events on a per-codepoint/character basis -- if we could, then the
Unicode Consortium could too, and then they could warn us ahead of time.

So we have to deal with the question of what to do with
PVALID->DISALLOWED transitions.
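
As a rough sketch of how such transitions could be spotted, assume the
derived property values for two Unicode versions are available as plain
mappings from code point to status.  The table excerpts and the name
find_regressions below are hypothetical and are not taken from any
existing IDNA tooling:

```python
def find_regressions(old_table, new_table):
    """Return code points that were PVALID in old_table but are
    DISALLOWED in new_table, in ascending order."""
    return sorted(
        cp
        for cp, status in old_table.items()
        if status == "PVALID" and new_table.get(cp) == "DISALLOWED"
    )


# Hypothetical excerpts of derived property tables (illustration only,
# not real IDNA data).  U+19CA is used because it is the code point
# named in this thread.
old = {0x0061: "PVALID", 0x19CA: "PVALID", 0x0041: "DISALLOWED"}
new = {0x0061: "PVALID", 0x19CA: "DISALLOWED", 0x0041: "DISALLOWED"}

regressions = find_regressions(old, new)
print([f"U+{cp:04X}" for cp in regressions])
```

Code points missing from the newer table are not reported here; whether
they should be treated as regressions is a separate policy question.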

Knowing that this character is not in use in any TLDs' zones would be
nice: presumably it will be much easier for admins below to fix their
zones than to get existing commercial (i.e., someone paid a registrar)
registrations changed.  But we won't always have that luxury.

Now, this is what the UC has to say about this character:

"A general category change to one New Tai Lue numeric character
(U+19CA), which would have the effect of disqualifying it from inclusion
in identifiers unless grandfathering measures are in place for the
defining identifier syntax"

IOW: we're allowed to grandfather U+19CA.

The test I propose then is: if we can find a font that can render
once-PVALID-now-DISALLOWED characters, then grandfather (or consider
other factors), else don't.
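
The proposed test can be written down as a small decision rule.  This is
only a sketch: the function name, the status strings, and the returned
labels are assumptions made for illustration, and whether a font can
render the character is taken as an input rather than computed, since
the message does not say how that would be checked.

```python
def proposed_disposition(old_status, new_status, renderable):
    """Apply the font-based test to one code point.

    old_status / new_status: derived property values before and after
    the Unicode update (for example "PVALID", "DISALLOWED").
    renderable: True if some font is known to render the character
    (hypothetical input, determined by the caller).
    """
    if not (old_status == "PVALID" and new_status == "DISALLOWED"):
        # The test only concerns once-PVALID-now-DISALLOWED characters.
        return "not-applicable"
    if renderable:
        # A glyph exists, so the character could plausibly be in use.
        return "grandfather-or-consider-other-factors"
    # No font renders it, so it cannot appear in a usable label.
    return "do-not-grandfather"


print(proposed_disposition("PVALID", "DISALLOWED", renderable=False))
```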

Nico