Keeping the full Common Crawl uncompressed on disk, with some indexing and RAID-1, is becoming quite feasible (easily under $10k). For maybe 10 years now, the text part of the web has been growing a lot more slowly than the price of storage has been falling. At this rate anyone will be able to keep a full crawl of the web on a mid-range desktop computer in just a few years. Here's hoping for good weather in Thailand. :-)
How much do you need right now, 200TB tops? That includes index files, uncompressed data and replication. With some clever compression and data structures you can probably cut that to half or less. Financially speaking, this is already in local-club territory.
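For anyone who wants to check the back-of-envelope numbers, here is a rough sketch. It uses only the figures from this comment ($10k budget, 200TB total, compression halving the footprint); the function name `max_price_per_tb` is just something I made up for illustration, and no real drive prices are assumed.

```python
# Back-of-envelope check using only the figures quoted in the comment.
BUDGET_USD = 10_000        # "easily under $10k"
TOTAL_TB = 200             # "200TB tops", already counting indexes and replication
COMPRESSION_FACTOR = 0.5   # "cut that to half or less" (assumed to be exactly half here)


def max_price_per_tb(budget_usd: float, total_tb: float) -> float:
    """Highest storage price (USD per TB) at which total_tb still fits in budget_usd."""
    if total_tb <= 0:
        raise ValueError("total_tb must be positive")
    return budget_usd / total_tb


if __name__ == "__main__":
    plain = max_price_per_tb(BUDGET_USD, TOTAL_TB)
    compressed = max_price_per_tb(BUDGET_USD, TOTAL_TB * COMPRESSION_FACTOR)
    print(f"Uncompressed: storage must cost at most ${plain:.2f}/TB")
    print(f"With compression: storage must cost at most ${compressed:.2f}/TB")
```

In other words, the $10k claim holds whenever raw storage costs no more than the first figure per TB, and compression doubles that ceiling.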