If there wasn't a gain, why would they process or index them in the first place? If you don't want to be responsible for the data you process, don't process it.
Because classifying huge amounts of content is highly non-trivial. It's not like there are a million employees sifting through every spidered resource and manually deciding whether it should be indexed.