prohibiting previously mapped and unmapped characters

Greg Aaron gaaron at
Wed Nov 29 22:19:55 CET 2006

Quantifying the scope of the issue is a good idea.  However, the number of
live IDN Web sites is not the most pertinent metric, and it will not tell
you which characters are never used.  More important is how many (and which)
IDNs have been registered and are currently in domain name registries.

A significant percentage of domain names do not resolve.  In the various
gTLDs, that percentage is 24% and up.  An even higher percentage of IDNs do
not resolve, because IDNs are still catching on, and because Internet
Explorer has not supported IDNs until very recently.  And even if a domain
does not resolve, the registrant can activate it at any time.

Also, the amount or type of content on a resolving domain is irrelevant.
All domain names are equal in that they have been paid for, they may be used
at the registrant's pleasure, and service has been promised by the registry
and registrar during that domain's registered lifetime.  (To use an example
from another industry with numerical identifiers: the phone company will not
take away your phone number just because you haven't called anyone.)

Here is another way to approach the problem.  The number of published IDN
tables is finite  < >.
Which of them collide with the possible areas of backwards-incompatibility?
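
A rough sketch of that comparison, assuming each published table has
already been parsed into a set of code points.  The table names, the
code points, and the function find_collisions are placeholders for
illustration only; they are not taken from any real registry table.

```python
def find_collisions(idn_tables, affected):
    """Map each table name to the sorted code points it shares with `affected`.

    idn_tables: dict of table name -> iterable of permitted code points (ints)
    affected:   set of code points whose handling might change
    """
    collisions = {}
    for name, code_points in idn_tables.items():
        overlap = set(code_points) & affected
        if overlap:
            collisions[name] = sorted(overlap)
    return collisions


# Placeholder data, not a statement about any actual IDN table.
tables = {
    "table-a": {0x00E4, 0x00F6, 0x00FC, 0x00DF},
    "table-b": {0x00E9, 0x00E8},
}
affected = {0x00DF, 0x200D}

result = find_collisions(tables, affected)
```

Tables that come back with no overlap need no further study; the rest
give a short list of registries to ask about actual registrations.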

All best,
--Greg Aaron

-----Original Message-----
From: idna-update-bounces at
[mailto:idna-update-bounces at]On Behalf Of Harald Alvestrand
Sent: Wednesday, November 29, 2006 2:22 PM
To: Erik van der Poel; idna-update at
Subject: Re: prohibiting previously mapped and unmapped characters

--On 29. november 2006 09:42 -0800 Erik van der Poel <erikv at>

> If it would help, I can take a look at Google's copies of web
> documents to see which characters are actually used there and how many
> occurrences there are of each. Of course, such a sample would omit
> domain names used in email, but the web is quite an important part of
> the Internet too.

I think such a listing (frequency count of characters actually used in
Punycoded domains that actually serve web pages) would be very interesting.
For the characters that *never* occur, it seems hard to argue that a large
community of present users would be hurt by their omission.

While you're at it, perhaps you could get a count of how many xn-- domains
there are out there, as a percentage of the total number of domains for
which Google fetches web pages?
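
A minimal sketch of such a count, assuming the input is simply a list
of hostnames seen by a crawler.  The names decode_label and idn_stats
are made up for this example, and Python's built-in "punycode" codec
stands in for a full IDNA implementation, so no Nameprep checks are
applied.

```python
from collections import Counter

ACE_PREFIX = "xn--"


def decode_label(label):
    """Return the Unicode form of an ACE label, or None if it is not one."""
    label = label.lower()
    if not label.startswith(ACE_PREFIX):
        return None
    try:
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except UnicodeError:
        # Malformed Punycode or non-ASCII input: skip rather than guess.
        return None


def idn_stats(hostnames):
    """Return (character frequency in decoded labels, share of hosts with an xn-- label)."""
    char_counts = Counter()
    total = 0
    idn_hosts = 0
    for host in hostnames:
        total += 1
        decoded = [decode_label(part) for part in host.rstrip(".").split(".")]
        decoded = [d for d in decoded if d is not None]
        if decoded:
            idn_hosts += 1
        for d in decoded:
            char_counts.update(d)
    share = idn_hosts / total if total else 0.0
    return char_counts, share


hosts = ["xn--mnchen-3ya.de", "example.com", "xn--bcher-kva.example", "www.example.org"]
counts, share = idn_stats(hosts)
```

Characters that never show up in counts over a large sample are the
candidates for which omission would be hard to argue against; share
gives the xn-- proportion asked about above.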

I *love* statistics :-)


