The number of links per document is approximately 60, as you computed. The trillion figure was in a public posting from July 2008, which is why we can quote it.<br><br clear="all">Mark<br>
<br><br><div class="gmail_quote">2009/12/1 Harald Alvestrand <span dir="ltr"><<a href="mailto:harald@alvestrand.no">harald@alvestrand.no</a>></span><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Mark Davis ☕ wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="im">
<br>
Harald's back-of-the-envelope calculations give a very inaccurate picture of the scale. Here are more exact figures for that data.<br>
<br></div>
1. 819,600,672 = sample size of documents<br>
2. 5,000 = links with eszett in the sample<br>
3. 1,000,000,000,000 = total documents in index (2008)<br>
4. 1,220 = scaling factor (= total docs / sample size)<br>
5. 6,100,532 = estimated total links with eszett (= scaling factor × sample eszett links)<br><div class="im">
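A minimal sketch, in Python, that reproduces the extrapolation in the list above. The constant and function names are illustrative only, and like the figures themselves it assumes the sample is representative of the whole index.<br>

```python
# Figures quoted in the list above (sample taken Nov 2008, index size from 2008).
SAMPLE_DOCS = 819_600_672               # documents in the sample
SAMPLE_ESZETT_LINKS = 5_000             # links containing eszett (ß) in the sample
TOTAL_DOCS_2008 = 1_000_000_000_000     # documents in the 2008 index


def scale_estimate(sample_docs, sample_hits, total_docs):
    """Return (scaling factor, estimated total hits).

    Assumes the sample is representative, i.e. hits scale linearly
    with the number of documents.
    """
    factor = total_docs / sample_docs
    return factor, sample_hits * factor


factor, total_links = scale_estimate(SAMPLE_DOCS, SAMPLE_ESZETT_LINKS, TOTAL_DOCS_2008)
print(f"scaling factor ~ {factor:,.1f}")
print(f"estimated links ~ {round(total_links):,}")
```

The unrounded scaling factor is about 1,220.1, which is why the estimate comes out at 6,100,532 rather than exactly 1,220 × 5,000.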
<br>
Even this should be taken with a grain of salt, since (a) it assumes that the sample is representative (although we have reasonable confidence in that), (b) it doesn't weight the "importance" of the links (in terms of the number of times they are followed), and (c) this data was collected back in Nov 2008, so we've had another year of growth since then.<br>
</div></blockquote>
I obviously need a bigger envelope :-) - I didn't think we had one trillion documents in the 2008 index.<br>
<br>
One missing number: how many links per document?<br>
<br>
Obviously #eszett links / #documents can't be the basis of the 0.00001% figure that Erik quoted: 5000/819600672 = 0.00061005%, not 0.00001%, which is about 60 times larger. But if we estimate 60 links per document, the 0.00001% fits nicely as the percentage of links that contain eszett.<br>
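The inference above can be checked with a few lines of Python. This is a sketch under one stated assumption: that Erik's 0.00001% is the share of all links (not documents) containing eszett, measured over the same sample. Under that assumption, dividing the per-document percentage by the per-link percentage gives the implied average number of links per document.<br>

```python
SAMPLE_DOCS = 819_600_672      # documents in the sample
SAMPLE_ESZETT_LINKS = 5_000    # links containing eszett (ß) in the sample
QUOTED_PERCENT = 0.00001       # Erik's figure, assumed to be % of links

# Eszett links as a percentage of documents (the "wrong denominator").
per_doc_percent = SAMPLE_ESZETT_LINKS / SAMPLE_DOCS * 100

# If QUOTED_PERCENT is per link, then
#   per_doc_percent = QUOTED_PERCENT * (links per document),
# so the ratio of the two is the implied average links per document.
implied_links_per_doc = per_doc_percent / QUOTED_PERCENT

print(f"eszett links as % of documents: {per_doc_percent:.8f}%")
print(f"implied links per document: {implied_links_per_doc:.0f}")
```

This prints 0.00061005% and 61, matching the "approximately 60" links per document; since the quoted 0.00001% is itself rounded, the result should be read as an order of magnitude rather than a precise count.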
<font color="#888888">
<br>
Harald<br>
</font></blockquote></div><br>