Unicode 5.2 -> 6.0
Nicolas Williams
Nicolas.Williams at oracle.com
Thu Oct 14 23:14:49 CEST 2010
On Thu, Oct 14, 2010 at 04:51:26PM -0400, Andrew Sullivan wrote:
> On Thu, Oct 14, 2010 at 04:41:18PM -0400, John C Klensin wrote:
> > extent of the tree. The meaning of "registry" in IDNA2008 (and
> > similar terminology in IDNA2003) extends to every zone
> > administrator of a zone in the DNS. In principle, the
> > administrator of a subdomain somewhere deep in the DNS tree
> > could have chosen to utilize one or more of these characters
> > without having to discuss that decision with anyone else.
>
> [...]
>
> If we were much further along and had any evidence at all of
> widespread use, I'd be pretty concerned. As it is, it seems to me the
> best we can do for this particular case is ask everyone we know, and
> hope hard we get it right. But at least we're in quite early days.
> We won't have this luxury when we go from Unicode 6.x to 7.x, I
> suppose.
And if we get it wrong?
Note: I can't find a glyph for U+19CA. That may be dispositive. If it
can't be rendered in any font, then it can't be used in a domain name
label.
> I dimly recall taking minutes in one of the IDNABIS meetings in which
> I formed the impression that people thought it unlikely stuff would
> move from PVALID to DISALLOWED. I guess my impression was wrong? For
> if this is going to be a regular problem in future, it seems like one
> would be better to have some new class like PROBABLY-PVALID where
> characters we're not sure about live for a couple releases of the
> Unicode tables. That feels like second-guessing Unicode, however, and
> we were trying to get out of that game.
We have to accept that Unicode will make this sort of backwards-
incompatible change from time to time. I'm not sure what we can do
about it. I doubt that we can reliably judge the probability of such
events on a per-codepoint/character basis -- if we could, then the
Unicode Consortium could too, and then they could warn us ahead of time.
So we have to deal with the question of what to do with
PVALID->DISALLOWED transitions.
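
As a rough illustration of spotting such transitions, the sketch below
compares two derived-property tables and lists the code points that went
from PVALID to DISALLOWED. The function name and the two small tables
are hypothetical, made up for this example; real tables would have to be
derived from the Unicode data files for each version.

```python
from typing import Dict, List


def pvalid_to_disallowed(old: Dict[int, str], new: Dict[int, str]) -> List[int]:
    """Return code points that are PVALID in `old` and DISALLOWED in `new`."""
    return sorted(
        cp
        for cp, status in old.items()
        if status == "PVALID" and new.get(cp) == "DISALLOWED"
    )


# Hypothetical table fragments, for illustration only (not real derived data).
old_table = {0x0061: "PVALID", 0x19CA: "PVALID", 0x0041: "DISALLOWED"}
new_table = {0x0061: "PVALID", 0x19CA: "DISALLOWED", 0x0041: "DISALLOWED"}

for cp in pvalid_to_disallowed(old_table, new_table):
    print(f"U+{cp:04X} moved from PVALID to DISALLOWED")
```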
Knowing that this character is not in use in any TLD zone would be
nice: presumably it will be much easier for admins further down the
tree to fix their zones than to get existing commercial (i.e., someone
paid a registrar) registrations changed. But we won't always have that
luxury.
Now, this is what the UC has to say about this character:
"A general category change to one New Tai Lue numeric character
(U+19CA), which would have the effect of disqualifying it from inclusion
in identifiers unless grandfathering measures are in place for the
defining identifier syntax"
IOW: we're allowed to grandfather U+19CA.
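
To see why a general category change alone can disqualify a character,
here is a minimal sketch of the LetterDigits category test from RFC
5892, section 2.1. It assumes the change in question is from decimal
digit (Nd) to other number (No); the quoted text does not name the
categories. The helper names are hypothetical, and LetterDigits is only
one of several rules in the full IDNA2008 derivation.

```python
import unicodedata

# General categories that make up LetterDigits in RFC 5892, section 2.1.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def in_letter_digits(category: str) -> bool:
    """True if a general category falls inside the LetterDigits class."""
    return category in LETTER_DIGITS


def current_category(cp: int) -> str:
    """General category of `cp` in the Unicode version bundled with Python."""
    return unicodedata.category(chr(cp))


print(in_letter_digits("Nd"))  # decimal digits are inside LetterDigits
print(in_letter_digits("No"))  # other numbers are outside LetterDigits

# What this shows depends on the Unicode version of the running Python.
print(unicodedata.unidata_version, current_category(0x19CA))
```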
The test I propose then is: if we can find a font that can render a
once-PVALID-now-DISALLOWED character, then grandfather it (or consider
other factors); otherwise don't.
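
The proposed test could be sketched roughly as follows. Font coverage
is modelled here as plain sets of code points; how those sets would be
obtained from real fonts is left open, and every name and value below
is a hypothetical placeholder.

```python
from typing import Iterable, Set


def grandfather_candidate(
    cp: int,
    transitioned: Set[int],
    font_coverage: Iterable[Set[int]],
) -> bool:
    """Apply the proposed test to one code point.

    `transitioned` holds code points that went from PVALID to DISALLOWED.
    `font_coverage` holds, per font, the set of code points it can render.
    """
    if cp not in transitioned:
        return False  # the test only concerns PVALID->DISALLOWED cases
    # Grandfather (or look at other factors) only if some font has a glyph.
    return any(cp in coverage for coverage in font_coverage)


# Hypothetical inputs, for illustration only.
transitioned = {0x19CA}
fonts_without_glyph = [{0x0061, 0x0062}]
fonts_with_glyph = [{0x0061}, {0x19CA}]

print(grandfather_candidate(0x19CA, transitioned, fonts_without_glyph))
print(grandfather_candidate(0x19CA, transitioned, fonts_with_glyph))
```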
Nico