Can do. A lot of the magic happens in the es/indexer and search lambdas here: https://github.com/quiltdata/quilt/tree/master/lambdas.

The short of what we do: a Lambda listens for bucket notifications, reads the object's metadata, and sends it, along with a snippet of the file contents, to Elasticsearch for indexing. Elasticsearch mappings are a bit of a bear, and we had to lock those down to get them to behave well.
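Roughly, the handler looks something like this (just a sketch, not the actual lambda code; the endpoint, snippet size, and index naming here are made up):

    import boto3
    from elasticsearch import Elasticsearch

    s3 = boto3.client("s3")
    es = Elasticsearch("https://search-example.us-east-1.es.amazonaws.com")  # hypothetical endpoint

    def handler(event, context):
        # One S3 bucket notification can carry several records
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]

            head = s3.head_object(Bucket=bucket, Key=key)
            # Pull just the first few KB of the object as a content snippet
            obj = s3.get_object(Bucket=bucket, Key=key, Range="bytes=0-4095")
            snippet = obj["Body"].read().decode("utf-8", errors="replace")

            es.index(index=bucket, id=key, body={
                "key": key,
                "size": head["ContentLength"],
                "last_modified": head["LastModified"].isoformat(),
                "user_meta": head.get("Metadata", {}),
                "content": snippet,
            })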

What are the big barriers you're bumping into on the data management and portability side of things?




Seems like it'd be more elegant (and probably cost effective) if you stored the Lucene indexes inside the buckets themselves.


That is an interesting idea. What kind of performance could we expect, especially in the federated case of searching multiple buckets? Elastic has sub-second latency (at the cost of running dedicated containers).


That's a bit of an open question right now, unfortunately. Using S3 to store Lucene indexes is still a roll-your-own thing, last I checked, and the implementation I wrote currently deals with smaller indexes where files can be pulled fully to disk as needed. S3 does support range requests, though, which I'd think would mimic random access well enough.
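The range-request piece, at least, is simple enough; a per-read fetch would look roughly like this (sketch only; the bucket, key, and offsets are invented):

    import boto3

    s3 = boto3.client("s3")

    def read_range(bucket, key, offset, length):
        # The HTTP Range header fetches an arbitrary byte slice of the object,
        # which is more or less what Lucene's random-access reads need
        resp = s3.get_object(
            Bucket=bucket,
            Key=key,
            Range=f"bytes={offset}-{offset + length - 1}",
        )
        return resp["Body"].read()

    # e.g. read a 4 KB block out of the middle of a segment file
    block = read_range("my-bucket", "index/_0.cfs", 1_048_576, 4096)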

Assuming whatever Elasticsearch deployment you're using is backed by SSDs, there'd likely be more latency with S3, but I'd expect it to scale pretty well. Internally, a Lucene index is an array of immutable, self-contained segment files, each holding the complete index data for its subset of documents. Searching multiple indices is pretty much just searching through all their segments, which can be as parallel as you want it to be.
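In Python terms the fan-out is basically this (a sketch; search_segment is a hypothetical stand-in for whatever per-segment search actually runs):

    from concurrent.futures import ThreadPoolExecutor

    def search_segment(segment, query):
        # Hypothetical per-segment search; segments are immutable,
        # so these calls need no coordination between them
        return []  # placeholder for real hits: [{"doc": ..., "score": ...}, ...]

    def search_all(segments, query, top_k=10):
        # Query every segment in parallel, then merge by score
        with ThreadPoolExecutor() as pool:
            per_segment = pool.map(lambda s: search_segment(s, query), segments)
        hits = [hit for hits in per_segment for hit in hits]
        return sorted(hits, key=lambda h: h["score"], reverse=True)[:top_k]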

To be honest, I'm actually surprised the Elasticsearch company doesn't offer this as an option. Maybe because they sell hardware at markup?



