The real issue: interoperability, and a proposal (Was: Consensus Call on Latin Sharp S and Greek Final Sigma)

Mark Davis ☕ mark at macchiato.com
Tue Dec 1 20:49:00 CET 2009


The number of links per document is approximately 60, as you computed. The
trillion figure was in a public posting from July 2008, which is why we can
quote it.

Mark


2009/12/1 Harald Alvestrand <harald at alvestrand.no>

> Mark Davis ☕ wrote:
>
>>
>> As far as Harald's back-of-the-envelope calculations go, they present a
>> very inaccurate picture of the scale. Here are some more exact figures for
>> that data.
>>
>>   1. 819,600,672    = sample size of documents
>>   2. 5,000    = links with eszed in the sample
>>   3. 1,000,000,000,000    = total documents in index (2008)
>>   4. 1,220    = scaling factor (= total docs / sample size)
>>   5. 6,100,532    = estimated total links with eszed (= scaling *
>>      sample eszed links)
>>
>> Even this has to be taken with a grain of salt, since (a) it assumes that
>> the sample is representative (although we have reasonable confidence in
>> that), (b) it doesn't weight the "importance" of the links (in terms of the
>> number of times they are followed), and (c) this data was collected back in
>> Nov 2008, so we've had another year of growth since then.
>>
> I obviously need a bigger envelope :-) - I didn't think we had one trillion
> documents in the 2008 index.
>
> One missing number: how many links per document?
>
> Obviously #eszed links / #documents can't be the basis of the 0.00001%
> figure that Erik quoted, because 5000/819600672 = 0.00061005%, which is a
> factor of 60 larger than 0.00001%. But if we estimate 60 links per document,
> the 0.00001% fits nicely as the percentage of links that contain eszed.
>
>              Harald
>
>
>
>
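
The arithmetic in the quoted messages can be re-run with a short script. This
is only a sketch: the variable names are made up for illustration, and the
inputs are just the figures quoted above (sample size, sharp-s links in the
sample, the 2008 index size, and the 0.00001% figure Erik quoted).

```python
# Figures taken from the quoted messages; names are illustrative only.
sample_docs = 819_600_672            # sample size of documents
sample_sharp_s_links = 5_000         # links with sharp s in the sample
total_docs = 1_000_000_000_000       # total documents in index (2008)
quoted_link_percent = 0.00001        # percentage of links, as quoted by Erik

# Scale the sample count up to the whole index.
scaling = total_docs / sample_docs
estimated_total_links = scaling * sample_sharp_s_links

# Sharp-s links per document, expressed as a percentage.
per_document_percent = sample_sharp_s_links / sample_docs * 100

# If the quoted percentage is per link rather than per document, the ratio
# of the two percentages is the implied average number of links per document.
implied_links_per_doc = per_document_percent / quoted_link_percent

print(f"scaling factor:         {scaling:,.0f}")
print(f"estimated total links:  {estimated_total_links:,.0f}")
print(f"per-document percent:   {per_document_percent:.8f}%")
print(f"implied links per doc:  {implied_links_per_doc:.0f}")
```

The estimate assumes, as the quoted message does, that the sample is
representative of the full index.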