prohibiting previously mapped and unmapped characters

Harald Alvestrand harald at alvestrand.no
Wed Nov 29 20:21:56 CET 2006



--On 29. november 2006 09:42 -0800 Erik van der Poel <erikv at google.com> 
wrote:

> If it would help, I can take a look at Google's copies of web
> documents to see which characters are actually used there and how many
> occurrences there are of each. Of course, such a sample would omit
> domain names used in email, but the web is quite an important part of
> the Internet too.

I think such a listing (a frequency count of the characters actually used in
Punycoded domains that serve web pages) would be very interesting.
For the characters that *never* occur, it seems hard to argue that a large 
community of present users would be hurt by their omission.

While you're at it, perhaps you could get a count of how many xn-- domains 
there are out there, as a percentage of the total number of domains for 
which Google fetches web pages?
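For illustration only, here is a minimal sketch of the two statistics described above. It assumes the input is a plain list of hostnames; `SAMPLE_HOSTS`, `decode_label` and `idn_stats` are hypothetical names, not part of any crawler. It decodes the part of each label after the `xn--` ACE prefix with Python's built-in `punycode` codec and does no further IDNA validation. Characters that never occur simply do not appear in the resulting counter.

```python
from collections import Counter

# Hypothetical sample; in practice this would be the list of hosts
# for which web pages were actually fetched.
SAMPLE_HOSTS = [
    "www.example.com",
    "xn--bcher-kva.example",
    "xn--mnchen-3ya.de",
    "google.com",
]

ACE_PREFIX = "xn--"  # the ACE prefix is matched case-insensitively


def decode_label(label: str) -> str:
    """Decode a single xn-- label to Unicode; other labels pass through."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


def idn_stats(hosts):
    """Return (share of hosts with an xn-- label, Counter of non-ASCII chars)."""
    idn_hosts = 0
    char_counts = Counter()
    for host in hosts:
        labels = host.split(".")
        if any(lab.lower().startswith(ACE_PREFIX) for lab in labels):
            idn_hosts += 1
        for lab in labels:
            for ch in decode_label(lab):
                if ord(ch) > 0x7F:  # count only characters outside ASCII
                    char_counts[ch] += 1
    share = idn_hosts / len(hosts) if hosts else 0.0
    return share, char_counts
```

Here `xn--bcher-kva` decodes to "bücher" and `xn--mnchen-3ya` to "münchen", so the sample gives a share of 0.5 and a count of 2 for "ü".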

I *love* statistics :-)

                   Harald