The Web Laboratory: Database Size Statistics
The Web Lab database contains metadata about web collections. Some of the collections are crawls that have been downloaded from the Internet Archive; these are identified by two-letter codes, e.g., DJ. Other collections are special collections used by research projects.
This page gives summary data about several of the principal collections. See the Status page for more detailed profiles of some of these collections.
| Crawl Name | Database Size | # Pages | # Links | # URLs | # Hosts |
|---|---|---|---|---|---|
| Amazon | 0.56 TB | 39,017,248 | 2,954,146,695 | 34,884,739 | 356 |
| Cornell | 0.005 TB | 793,140 | 11,889,778 | 756,341 | 40,964 |
| DJ | 2.7 TB | 1,140,839,475 | 26,244,734,149 | 904,946,380 | 16,089,901 |
| DP | 6.5 TB | 1,785,298,634 | 45,740,376,329 | 1,390,553,968 | 39,884,497 |
| DV | 17.7 TB | 2,638,752,713 | 111,772,592,303 | 2,448,549,442 | 80,154,600 |
| EB | 20 TB | 2,851,741,704 | 129,591,958,950 | 20,147,845,829 | 380,188,095 |
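
The counts in the table can be combined into simple derived figures. As a rough illustration only, the sketch below computes the average number of links per page for each collection, using the page and link counts copied from the table. The names `ROWS` and `links_per_page` are hypothetical helpers introduced for this example and are not part of the Web Lab software.

```python
# Page and link counts copied from the table above: (collection, pages, links).
ROWS = [
    ("Amazon", 39_017_248, 2_954_146_695),
    ("Cornell", 793_140, 11_889_778),
    ("DJ", 1_140_839_475, 26_244_734_149),
    ("DP", 1_785_298_634, 45_740_376_329),
    ("DV", 2_638_752_713, 111_772_592_303),
    ("EB", 2_851_741_704, 129_591_958_950),
]


def links_per_page(pages: int, links: int) -> float:
    """Return the average number of links per page (hypothetical helper)."""
    return links / pages


if __name__ == "__main__":
    # Print one line per collection with its average links per page.
    for name, pages, links in ROWS:
        print(f"{name:8s} {links_per_page(pages, links):6.1f}")
```

Such ratios are only as reliable as the underlying counts, so they are best read as rough indicators rather than precise measurements.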
