Cornell University

The Web Laboratory: Crawl Sizes Statistics

This is a partial list of the web crawls in the collection of the Internet Archive and the progress in downloading them to Cornell.  All sizes are in terabytes (TB).

Web pages are stored in compressed files, known as ARC files, with associated metadata files, known as DAT files.  Many web pages are stored in a single ARC file.  The two "Total Files" columns give counts of the ARC and DAT files in the corresponding collections.

The final two columns give the order of priority for downloading the crawls to Cornell and the date on which each download was completed.


Crawl  Time Period        Total Files per   Total ARC  Total DAT  Total Crawl  Total Files per    Download  Completed
Name                      Internet Archive  Size       Size       Size         Crawl as Recorded  Order     Date
DD     Feb-April 2001     124,433           1.9        0.306      2.2          187,538            5th       unknown
DE     March-June 2001    55,809            4.3        0.419      4.7          95,328             n/a
DF     May-August 2001    97,472            6.8        0.442      7.2          148,402            n/a
DG     July-Oct 2001      97,304            6.2        0.324      6.5          134,872            n/a
DH     Sept-Dec 2001      60,485            3.9        0.251      4.1          84,286             n/a
DI     Nov 2001-Feb 2002  130,966           7.5        0.485      8            162,130            n/a
DJ     Jan-April 2002     161,734           9.2        0.602      9.8          204,560            2nd       06/11/2006
DK     March-June 2002    124,079           5.5        0.361      5.9          120,622            n/a
DL     Jan-July 2002      205,090           11.2       0.661      11.9         249,568            n/a
DM     July-Oct 2002      80,601            10.1       0.66       10.8         220,002            n/a
DN     Sept-Dec 2002      185,006           11.5       0.814      12.3         250,614            n/a
DO     Nov 2002-Feb 2003  232,666           13.3       0.793      14.1         289,636            n/a
DP     Jan-April 2003     187,130           12.7       0.714      13.4         276,112            3rd       07/03/2007
DQ     March-May 2003     87,353            24.1       1.3        25.4         522,466            n/a
DR     May-August 2003    28,131            20.7       1.2        21.9         448,582            n/a
DS     July-Oct 2003      38,029            24.7       1.4        26.1         535,780            n/a
DT     Sept-Dec 2003      49,246            21.4       1.3        22.7         464,550            n/a
DU     Nov 2003-Feb 2004  86,859            24.6       1.4        26.1         533,316            n/a
DV     Jan-April 2004     126,266           23.7       1.9        25.6         515,260            4th       unknown
DW     March-June 2004    72,050            26         2.2        28.2         563,766            n/a
DX     May-August 2004    662,525           39.5       3.6        43.1         856,164            n/a
DY     July-Oct 2004      714,041           35         3.8        38.7         761,410            n/a
DZ     Sept-Dec 2004      681,237           32.3       3.3        35.5         705,370            n/a
EA     Nov 2004-Feb 2005  766,220           36.2       3.8        40           791,934            n/a
EB     Jan-March 2005     392,875           18.1       2.2        20.3         393,760            1st       unknown
EC     March-April 2005   304,660           13.8       1.8        15.6         301,268            n/a
ED     March-August 2005  714,120           35.6       4.3        39.9         769,948            n/a
EE     n/a                0                 13.5       1.7        15.2         309,110            n/a
EF     n/a                0                 26.6       1.8        28.4         617,400            n/a
EG     n/a                0                 46         2.2        48.2         1,020,064          n/a
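
The size columns appear to be related by Total Crawl Size = Total ARC Size + Total DAT Size, up to rounding. The page does not state this rule; it is inferred from the figures. The Python sketch below checks it for four rows copied from the table (DD, DJ, DU and EB). The names ROWS, TOLERANCE_TB, size_gap and consistent_rows, and the tolerance value itself, are assumptions made for illustration only.

```python
# Rows copied from the table: (crawl, ARC size TB, DAT size TB, total crawl size TB).
ROWS = [
    ("DD", 1.9, 0.306, 2.2),
    ("DJ", 9.2, 0.602, 9.8),
    ("DU", 24.6, 1.4, 26.1),
    ("EB", 18.1, 2.2, 20.3),
]

# Assumed tolerance: the published figures are rounded, so allow a small gap.
TOLERANCE_TB = 0.15


def size_gap(arc_tb, dat_tb, total_tb):
    """Return |ARC + DAT - reported total| in TB."""
    return abs(arc_tb + dat_tb - total_tb)


def consistent_rows(rows, tolerance=TOLERANCE_TB):
    """Return the crawl names whose reported total matches ARC + DAT within tolerance."""
    return [
        name
        for name, arc, dat, total in rows
        if size_gap(arc, dat, total) <= tolerance
    ]


if __name__ == "__main__":
    for name, arc, dat, total in ROWS:
        print(f"{name}: ARC + DAT = {arc + dat:.3f} TB, reported total = {total} TB")
    print("consistent:", consistent_rows(ROWS))
```

Row DU is included because its reported total (26.1) differs from the sum of its rounded parts (26.0) by about 0.1 TB, which is why the tolerance is wider than a single rounding step.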

 

Legend: