prohibiting previously mapped and unmapped characters
Harald Alvestrand
harald at alvestrand.no
Wed Nov 29 20:21:56 CET 2006
--On 29. november 2006 09:42 -0800 Erik van der Poel <erikv at google.com>
wrote:
> If it would help, I can take a look at Google's copies of web
> documents to see which characters are actually used there and how many
> occurrences there are of each. Of course, such a sample would omit
> domain names used in email, but the web is quite an important part of
> the Internet too.
I think such a listing (frequency count of characters actually used in
Punycoded domains that actually serve web pages) would be very interesting.
For the characters that *never* occur, it seems hard to argue that a large
community of present users would be hurt by their omission.
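A frequency count like the one requested above could be approximated with a sketch along these lines. It assumes a plain list of hostnames is already available; the crawl data is not shown, and `count_idn_characters` is a hypothetical helper. The sketch uses Python's built-in `punycode` codec on the part of each label after the `xn--` prefix. That codec only reverses the Punycode encoding and does not perform the full IDNA ToUnicode validation.

```python
from collections import Counter


def count_idn_characters(hostnames):
    """Hypothetical helper: count code points in the decoded form of every xn-- label.

    Only labels carrying the ACE prefix "xn--" are counted; plain ASCII
    labels are ignored. Malformed labels are skipped rather than reported.
    """
    counts = Counter()
    for host in hostnames:
        for label in host.lower().split("."):
            if not label.startswith("xn--"):
                continue
            try:
                # Raw Punycode decoding only; no IDNA mapping or prohibition checks.
                decoded = label[4:].encode("ascii").decode("punycode")
            except UnicodeError:
                continue  # skip labels that are not valid Punycode
            counts.update(decoded)
    return counts


if __name__ == "__main__":
    sample = ["xn--bcher-kva.example", "www.xn--mnchen-3ya.de", "example.com"]
    for char, n in count_idn_characters(sample).most_common():
        print(f"U+{ord(char):04X} {char!r}: {n}")
```

Characters that appear in a proposed prohibition list but never show up in such a count are the ones the message argues could be dropped with little harm.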
While you're at it, perhaps you could get a count of how many xn-- domains
there are out there, as a percentage of the total number of domains for
which Google fetches web pages?
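The percentage asked about here is simple arithmetic once the domain list exists. The sketch below assumes the input is a collection of domain names for which pages were fetched. It counts a domain as "xn--" if any of its labels carries the ACE prefix, and it deduplicates case-insensitively. `idn_share` is a hypothetical name, not an existing API.

```python
def idn_share(domains):
    """Hypothetical helper: percentage of distinct domains with at least one xn-- label."""
    unique = {d.lower().rstrip(".") for d in domains}
    if not unique:
        return 0.0
    idn = sum(
        1
        for d in unique
        if any(label.startswith("xn--") for label in d.split("."))
    )
    return 100.0 * idn / len(unique)


if __name__ == "__main__":
    sample = ["xn--bcher-kva.example", "example.com", "a.b.c", "xn--mnchen-3ya.de"]
    print(f"{idn_share(sample):.1f}% of domains use an xn-- label")
```

Whether "domain" should mean a full hostname or only the registered name is a choice the caller makes before passing the list in.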
I *love* statistics :-)
Harald