The real issue: interoperability, and a proposal (Was: Consensus Call on Latin Sharp S and Greek Final Sigma)

Harald Alvestrand harald at alvestrand.no
Tue Dec 1 20:00:43 CET 2009


Mark Davis ☕ wrote:
>
> As far as Harald's back-of-the-envelope calculations go, they present 
> a very inaccurate picture of the scale. Here are some more exact 
> figures for that data.
>
>    1. 819,600,672    = sample size of documents
>    2. 5,000    = links with eszed in the sample
>    3. 1,000,000,000,000    = total documents in index (2008)
>    4. 1,220    = scaling factor (= total docs / sample size)
>    5. 6,100,532    = estimated total links with eszed (= scaling *
>       sample eszed links)
>
> Even this has to be taken with a certain grain of salt, since (a) it 
> is assuming that the sample is representative (although we have 
> reasonable confidence in that), and (b) it doesn't weight the 
> "importance" of the links (in terms of the number of times they are 
> followed), and (c) this data was collected back in Nov 2008, so we've 
> had another year of growth since then.
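As a check on the quoted figures, here is a minimal Python sketch of the extrapolation: the sample count of eszett (ß) links multiplied by total documents divided by sample size. The constant names and the scale_up helper are illustrative and do not come from the original message. The calculation inherits caveat (a), the assumption that the sample is representative.

```python
# Figures as quoted in the message above (data collected Nov 2008).
SAMPLE_DOCS = 819_600_672          # documents in the sample
SAMPLE_ESZETT_LINKS = 5_000        # links containing eszett in the sample
TOTAL_DOCS = 1_000_000_000_000     # documents in the 2008 index


def scale_up(sample_count: int, sample_size: int, population: int) -> float:
    """Extrapolate a count observed in a sample to the whole population.

    Assumes the sample is representative; it does not weight links by
    how often they are followed (caveat (b) in the message).
    """
    scaling_factor = population / sample_size
    return sample_count * scaling_factor


scaling = TOTAL_DOCS / SAMPLE_DOCS
estimate = scale_up(SAMPLE_ESZETT_LINKS, SAMPLE_DOCS, TOTAL_DOCS)
print(f"scaling factor ~ {scaling:,.0f}")           # scaling factor ~ 1,220
print(f"estimated eszett links ~ {estimate:,.0f}")  # estimated eszett links ~ 6,100,532
```

Multiplying by the rounded factor 1,220 gives 6,100,000. The 6,100,532 figure comes from the unrounded factor, which is about 1,220.106.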
I obviously need a bigger envelope :-) - I didn't think we had one 
trillion documents in the 2008 index.

One missing number: how many links per document?

Obviously #eszed links / #documents can't be the basis of the 0.00001% 
figure that Erik quoted: 5000/819600672 = 0.00061005%, not 0.00001%, 
roughly a factor of 60 larger. But if we estimate 60 links per document, 
the 0.00001% figure fits nicely as the percentage of links that contain 
eszed.
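The consistency check above can be sketched the same way in Python. The per-link percentage depends entirely on the links-per-document value, which is an estimate rather than a measurement, so it is an explicit parameter here. The constant and function names are illustrative.

```python
SAMPLE_DOCS = 819_600_672       # documents in the sample (from the quoted data)
SAMPLE_ESZETT_LINKS = 5_000     # eszett links in the sample (from the quoted data)
ERIK_PERCENT = 0.00001          # the percentage figure Erik quoted
ASSUMED_LINKS_PER_DOC = 60      # estimate, not a measured value


def percent(part: float, whole: float) -> float:
    """Return part / whole expressed as a percentage."""
    return 100.0 * part / whole


# Eszett links divided by documents: too large to be Erik's figure.
per_doc = percent(SAMPLE_ESZETT_LINKS, SAMPLE_DOCS)
ratio = per_doc / ERIK_PERCENT

# Eszett links divided by all links, under the links-per-document estimate.
per_link = percent(SAMPLE_ESZETT_LINKS, SAMPLE_DOCS * ASSUMED_LINKS_PER_DOC)

print(f"eszett links / documents: {per_doc:.8f}%")
print(f"ratio to Erik's figure: {ratio:.0f}x")
print(f"eszett links / all links: {per_link:.8f}%")
```

Equivalently, the ratio itself (about 61) is the links-per-document value that would make the two figures agree exactly.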

               Harald





More information about the Idna-update mailing list