
As of Aug 16, Common Crawl has 1.73bn pages. If the complementary set of URLs would benefit you, you can use their data dump as a seed.
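
If it helps, here's a minimal sketch of pulling seed URLs from Common Crawl's public CDX index instead of downloading the full dump (Python; the crawl label "CC-MAIN-2016-30", the domain, and the limit are placeholders):

    import json
    import requests

    # Query one crawl's CDX index; each response line is a JSON record.
    INDEX = "https://index.commoncrawl.org/CC-MAIN-2016-30-index"

    def seed_urls(domain, limit=100):
        resp = requests.get(INDEX, params={
            "url": domain,
            "matchType": "domain",  # every captured URL under the domain
            "output": "json",
            "limit": str(limit),
        })
        resp.raise_for_status()
        return [json.loads(line)["url"] for line in resp.text.splitlines()]

    for url in seed_urls("example.com"):
        print(url)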

If your index's metadata (such as Last-Modified) is small enough to upload to AWS, you can also reduce your re-crawl effort when they publish a fresh release.
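
Roughly, the skip-recrawl check could look like this (a sketch, assuming your index maps each URL to the naive UTC datetime of your last fetch; `my_index` and `needs_recrawl` are made-up names, but the 14-digit "timestamp" field is what the CDX records above carry):

    from datetime import datetime

    def needs_recrawl(my_index, cc_records):
        """Return URLs whose newest Common Crawl capture postdates our copy."""
        stale = []
        for rec in cc_records:
            captured = datetime.strptime(rec["timestamp"], "%Y%m%d%H%M%S")
            seen = my_index.get(rec["url"])
            if seen is None or captured > seen:
                stale.append(rec["url"])
        return stale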



It doesn't have to be small to donate to Common Crawl; they have a free S3 bucket.



