We had similarly terrible experiences with AWS ES, so we moved to self-hosted Elastic. It's better, but it's still pricey and requires more man-hours dedicated to tuning.
Their move away from open-source has been unfortunate. For that and some other reasons we've ended up more impressed with Logz.io and Splunk SaaS.
What kind of tuning do you do? We spend time, but typically on upgrades; very rarely have we spent much time on tuning. I'm curious what kind of tuning people are doing beyond mlock and having enough memory / nodes.
I mean, Logz.io is okay and all, but it's quite expensive, and has a loooot of outages. I don't remember a week where we didn't have issues, the most common being log ingestion lag, which takes hours to fix. That, and their API is bad: you can only query over a 2-day period, and their API keys allow full control over everything.
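For what it's worth, a 2-day query limit like that can usually be worked around client-side by splitting a longer range into windows and issuing one query per window. A minimal sketch, assuming the limit is exactly 2 days; `split_range` and `MAX_WINDOW` are hypothetical names, and nothing here calls the actual Logz.io API:

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple

# Assumed limit, taken from the comment above; adjust if the real limit differs.
MAX_WINDOW = timedelta(days=2)


def split_range(
    start: datetime,
    end: datetime,
    window: timedelta = MAX_WINDOW,
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (from, to) pairs covering [start, end).

    Each pair spans at most `window`; the last one may be shorter.
    """
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    cursor = start
    while cursor < end:
        upper = min(cursor + window, end)
        yield cursor, upper
        cursor = upper


if __name__ == "__main__":
    # Each pair would be passed to whatever search call your client exposes.
    for lo, hi in split_range(datetime(2021, 1, 1), datetime(2021, 1, 6)):
        print(lo.isoformat(), "->", hi.isoformat())
```

You still pay for one request per window, so this doesn't make the limit any less annoying, it just hides it.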
Edit: As you said, there may be reasons on the backend not to filter things out of the query. Though it seems likely that the web response could be trimmed down.
I've never met him, but apparently he repeats this kind of advice a lot, and someone from one of the first YC batches made an image of a fake pg action figure that repeats the advice when you press a button. I can't find the link :(
Yes, you are, if you're using the site he created (not to fault you for not knowing, but many HNers reference Paul Graham by his initials, so it's useful to know).