The real issue: interoperability, and a proposal (Was: Consensus Call on Latin Sharp S and Greek Final Sigma)
Harald Alvestrand
harald at alvestrand.no
Tue Dec 1 20:00:43 CET 2009
Mark Davis ☕ wrote:
>
> As far as Harald's back-of-the-envelope calculations go, they present
> a very inaccurate picture of the scale. Here are some more exact
> figures for that data.
>
> 1. 819,600,672 = sample size of documents
> 2. 5,000 = links with eszed in the sample
> 3. 1,000,000,000,000 = total documents in index (2008)
> 4. 1,220 = scaling factor (= total docs / sample size)
> 5. 6,100,532 = estimated total links with eszed (= scaling *
> sample eszed links)
>
> Even this has to be taken with a certain grain of salt, since (a) it
> is assuming that the sample is representative (although we have
> reasonable confidence in that), and (b) it doesn't weight the
> "importance" of the links (in terms of the number of times they are
> followed), and (c) this data was collected back in Nov 2008, so we've
> had another year of growth since then.
I obviously need a bigger envelope :-) - I didn't think we had one
trillion documents in the 2008 index.
One missing number: how many links per document?
Obviously #eszed links / #documents can't be the basis of the 0.00001%
figure that Erik quoted, because 5000/819600672 = 0.00061005%, which is
about 60 times larger than 0.00001%. But if we estimate 60 links per
document, 0.00001% fits nicely as the percentage of links that contain
eszed.
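For anyone who wants to check the arithmetic, here is a rough sketch in Python. The three input figures are taken from Mark's list above; the 60 links per document is only my guess, not a measured value, and the variable names are made up for illustration.

```python
# Figures quoted from Mark's data (Nov 2008 sample).
sample_docs = 819_600_672            # documents in the sample
sample_eszed_links = 5_000           # links with eszed in the sample
total_docs = 1_000_000_000_000       # total documents in the index (2008)

# ASSUMPTION: average links per document. This is a guess chosen to make
# the numbers line up, not something taken from the data.
assumed_links_per_doc = 60


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Scale the sample up to the whole index.
scaling = total_docs / sample_docs
estimated_total_eszed_links = scaling * sample_eszed_links

# Eszed links per document, as a percentage.
per_doc_pct = percent(sample_eszed_links, sample_docs)

# Eszed links as a percentage of all links, under the 60-links guess.
per_link_pct = percent(sample_eszed_links, sample_docs * assumed_links_per_doc)

print(f"scaling factor:           {scaling:.2f}")
print(f"estimated eszed links:    {estimated_total_eszed_links:,.0f}")
print(f"eszed links per document: {per_doc_pct:.8f}%")
print(f"eszed links per link:     {per_link_pct:.8f}%")
```

Changing assumed_links_per_doc shows how sensitive the per-link percentage is to that one unknown, which is why the real links-per-document number matters.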
Harald