"The search index"? Leaving aside any other problems that would cause, that sounds like a recipe for locking in specific formats and technologies forever, and preventing any further innovation.
A search index is not simply a function from search string to results. Treating it as one undersells both what a search engine is worth today and how much room it has to innovate. A search index includes the data and metadata of all the pages indexed, plus derived data in various forms ranging from data structures to trained neural networks, plus sufficiently fast access to the data to make it possible to generate new such data structures or train new networks. And all that while continuing to index new pages as fast as possible.
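To make that distinction concrete, here's a rough toy sketch. Every name in it (Page, ToyIndex, derive, build_inverted_index, search) is made up for illustration and has nothing to do with how any real search engine is built. The point is only that "query in, results out" is one consumer of the index, while the raw pages and the ability to derive new structures from them are the part that's hard to share.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Page:
    """A crawled page: raw data plus metadata (hypothetical shape)."""
    url: str
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


Builder = Callable[[Dict[str, Page]], Any]


class ToyIndex:
    """Toy index: raw pages plus any number of derived structures."""

    def __init__(self) -> None:
        self.pages: Dict[str, Page] = {}
        self.derived: Dict[str, Any] = {}
        self._builders: Dict[str, Builder] = {}

    def derive(self, name: str, build: Builder) -> Any:
        """Register a new derived structure and build it from the raw pages."""
        self._builders[name] = build
        self.derived[name] = build(self.pages)
        return self.derived[name]

    def add_page(self, page: Page) -> None:
        """Index a new page and refresh every derived structure.

        This toy rebuilds everything from scratch; a real system would
        have to update incrementally.
        """
        self.pages[page.url] = page
        for name, build in self._builders.items():
            self.derived[name] = build(self.pages)


def build_inverted_index(pages: Dict[str, Page]) -> Dict[str, List[str]]:
    """Map each lowercase term to the sorted URLs of pages containing it."""
    postings: Dict[str, List[str]] = {}
    for url, page in pages.items():
        for term in set(page.text.lower().split()):
            postings.setdefault(term, []).append(url)
    return {term: sorted(urls) for term, urls in postings.items()}


def search(index: ToyIndex, query: str) -> List[str]:
    """The 'function from search string to results' view: AND of all terms."""
    postings = index.derived["inverted"]
    result_sets = [set(postings.get(t, [])) for t in query.lower().split()]
    if not result_sets:
        return []
    return sorted(set.intersection(*result_sets))
```

Something like search() only ever sees one derived structure. Adding a second one later, say a per-page length table via index.derive("lengths", ...), needs access to the raw pages, which is exactly what a query-only interface wouldn't expose.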
A search index used internally by one company can change formats and technologies as long as the products using it are updated accordingly, and can provide many different facets of information. A search index used by multiple entities (likely competing entities) can't easily co-evolve with the search engines using it, can't easily change formats or technologies, can't easily discard older ones (and thus needs far more resources if it innovates), and has a much more difficult time providing previously unanticipated information.
Don't get me wrong, I'd love to see a good way to support innovation on the scale of a production search index without requiring billion-dollar infrastructure, but separating the search index from a search engine would substantially increase the complexity and expense of both, and seems quite likely to curtail innovation.