TBH if you're looking to do something serious and consumer-facing, you may want to look into paying for Google's Custom Search API, or, if your data set differs dramatically in shape from web pages, hiring a really good data scientist and rolling your own:
I'm an ex-Googler who's now working fairly extensively with Elasticsearch. While I've been impressed at how easy it is to get started, all the published academic ranking algorithms are utter crap compared to what Google has. Okapi BM25 is mentioned as the state of the art up-thread; it was developed in the 80s, long before Google was born, and most of the advances in search since the 90s have happened behind closed corporate doors inside Google, Yahoo, and Bing.
This is solid advice. At my last company we had Solr, and I think we would even have qualified for what most people would call a "fair" use case for a search stack: millions of items in the catalog, and millions of users to go along with them. For search at that scale, Lucene is light-years faster than a SQL engine.
What is it not better at, though? Pretty much everything else: GIS, cross joins, parent-child indexing, etc. Postgres can do all of that too with a little digging.
The thing neither of them has? Semantic search. If I ever hit that scale again at another company, the jump won't be from SQL to Lucene, but from SQL to Google.
You can do semantic search with Solr and ES; it just isn't built-in. Both have support for stemming and synonyms, which are the building blocks, but you have to create the synonyms file yourself with word2vec or a similar algorithm.
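If anyone wants a rough idea of what that looks like: here's a minimal sketch using gensim's word2vec. The toy corpus, the 0.7 similarity cutoff, and the comma-separated output format are illustrative choices, not a recommendation.

```python
# Minimal sketch: derive a candidate synonyms file for Solr/ES from word2vec neighbours.
# Corpus, similarity cutoff, and output format are placeholders, not a recommendation.
from gensim.models import Word2Vec

corpus = [
    ["cheap", "flights", "to", "paris"],
    ["inexpensive", "airfare", "to", "paris"],
]  # in practice: your own tokenized documents

model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, epochs=50)

with open("synonyms.txt", "w") as out:
    for word in model.wv.index_to_key:
        # keep only close neighbours as candidate synonyms
        neighbours = [w for w, sim in model.wv.most_similar(word, topn=5) if sim > 0.7]
        if neighbours:
            # comma-separated equivalence groups, the format Solr/ES synonym filters expect
            out.write(",".join([word] + neighbours) + "\n")
```

The raw output needs hand review, though; word2vec neighbours are "related" words, not strict synonyms.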
Have Google and/or Microsoft made significant advances in keyword relevance algorithms specifically? Google is a combination of keyword relevance + reputation (PageRank) + semantics (word2vec) + AI + other stuff, which combined makes it the best search engine in the world. Is the keyword relevance part really that much better than BM25?
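For context on what BM25 actually is: it's just term-frequency and document-length weighting over inverted-index statistics. A bare-bones sketch (k1 and b are the usual defaults; the toy corpus is made up):

```python
import math
from collections import Counter

def bm25(query, doc, corpus, k1=1.2, b=0.75):
    """Plain BM25 score of one tokenized doc against a tokenized query.
    corpus is the full list of tokenized docs, used for IDF and average length."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in corpus if term in d)           # document frequency
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))    # smoothed IDF
        f = tf[term]
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

docs = [["fast", "tag", "search"], ["semantic", "search", "engine"], ["search", "engine", "ranking"]]
print(sorted(docs, key=lambda d: bm25(["search", "engine"], d, docs), reverse=True))
```

Everything Google layers on top (link reputation, semantics, personalization) sits outside that formula, which is part of the question here.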
This is a great suggestion. If the site requires SEO, why bother rolling another search engine? Everyone is just going to complain that it isn't Google anyway.
Can you describe a use case where Postgres performs badly, or why you prefer XYZ?
In my experience on commercial projects, a Postgres 9.4+ GIN tag/keyword search index is very fast over data on the order of 30 to 100 million records, provided you have a host with SSD disks.
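For anyone curious, the setup is roughly this. Table and column names are made up, and I'm assuming a text[] tags column, driven through psycopg2 just so the snippet is self-contained:

```python
# Rough sketch of the GIN tag-search setup described above; names are hypothetical.
import psycopg2

conn = psycopg2.connect("dbname=app")  # assumes a local database called 'app'
cur = conn.cursor()

# GIN index over a text[] column; this is what makes the array operators fast
cur.execute("CREATE INDEX items_tags_gin ON items USING GIN (tags)")

# '@>' (contains) is one of the operators the GIN index accelerates;
# psycopg2 adapts the Python list to a Postgres array
cur.execute("SELECT id, title FROM items WHERE tags @> %s LIMIT 20",
            (["gis", "startup"],))
print(cur.fetchall())

conn.commit()
conn.close()
```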
I call data of this size "MID" data, not "BIG" data, and it seems applicable to a lot of startups before they get to a million customers, and to a lot of real-world customer data use cases.
Also, a lot of startups need to handle GPS coordinates. Having used them, I have to say that the primitives that come with PostGIS are pretty superb and easily integrated.
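A taste of those primitives for the nearby-x case; the table, column, and 5 km radius are placeholders:

```python
# Hypothetical "places within 5 km, nearest first" query using PostGIS primitives.
import psycopg2

conn = psycopg2.connect("dbname=app")
cur = conn.cursor()

lon, lat = -122.42, 37.77  # the point you're searching around
cur.execute("""
    SELECT id, name
    FROM places
    WHERE ST_DWithin(
        geog,                                  -- geography(Point, 4326) column
        ST_MakePoint(%s, %s)::geography,       -- lon, lat
        5000                                   -- radius in metres
    )
    ORDER BY geog <-> ST_MakePoint(%s, %s)::geography  -- nearest first
    LIMIT 10
""", (lon, lat, lon, lat))
print(cur.fetchall())
conn.close()
```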
Outside startup-land, there are a lot of companies using horrible data systems, dying to get at their data quickly, who could be much better served by Postgres behind a web front end. (Lowering the cost/time to build that is something I'm working on.)
So if:
* you want fast tag search
* your data has locations or nearby-x
* you'll have < 1bn rows for some time
* your data is worth money
then IMO Postgres + PostGIS + GIN + SSD is pretty compelling.
That covers a lot of startup and business surface area.
I don't think it's necessarily about the number of users; it's about expectations. People seem to expect Google-quality search behind every search box on the web. The search needs to find the relevant information the first time, every time.
So it's more about being able to find what people are looking for, not the number of users you have. And to be fair, almost all search solutions currently available will scale beyond what most of us need in terms of number of users or dataset size.
For basic utility stuff or simple backend apps it's pretty useful. For anything serious or consumer-facing, it's crap.