The ELK stack is unrelated; this is the analytical/OLAP use case, not the logging stack. Although you could say that logging is itself a specific use case of the analytical/OLAP one.
Search for the discussion around "facets".
Basically the way to think of this is that Elastic/Solr are MapReduce with weak database semantics layered on top. The ingest stage is the "map": you ingest JSON documents and optionally perform some transformation on them. The query is the "reduce": you perform some analytical summation or search on top of that.
The point of having the "map" step is that on ingest, you can augment the document with various things like tokenizations or transforms, and then write those into the stored representation, like a generated column. This happens outside your relational DB and thus doesn't incur RDBMS storage space for the transformed (and duplicated!) data, or the generation expense at query time. Pushing that stuff inside your relational store is wasteful.
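For concreteness, here's a minimal sketch of that "map" step against Elasticsearch's HTTP API. Everything here is made up for illustration (a local node on localhost:9200, a "sales" index, a "sales-augment" pipeline): tokenization comes from the field mappings/analyzers, and the extra transforms come from an ingest pipeline, so the derived data is computed once at write time rather than on every query.

    # Sketch only: assumes a local Elasticsearch on localhost:9200 and a
    # hypothetical "sales" index / "sales-augment" pipeline.
    import requests

    ES = "http://localhost:9200"

    # Field mappings: "notes" is tokenized for full-text search, the rest are
    # stored as exact keywords so they can be faceted/aggregated on cheaply.
    requests.put(f"{ES}/sales", json={
        "mappings": {
            "properties": {
                "notes":    {"type": "text", "analyzer": "english"},
                "company":  {"type": "keyword"},
                "product":  {"type": "keyword"},
                "employee": {"type": "keyword"},
                "amount":   {"type": "double"},
            }
        }
    })

    # An ingest pipeline: per-document transforms applied once, at write time.
    requests.put(f"{ES}/_ingest/pipeline/sales-augment", json={
        "description": "augment sales docs on ingest",
        "processors": [
            {"lowercase": {"field": "company"}},
            {"set": {"field": "ingested_at", "value": "{{_ingest.timestamp}}"}},
        ],
    })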
It's an OLAP store. You have your online/OLTP store, you extract your OLAP measurements as a JSON document, and you send it to the OLAP store. The OLAP store performs some additional augmentations, and then you run your searches against it.
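The OLTP-to-OLAP hop is then just "serialize a row as JSON and send it". A sketch, reusing the hypothetical index and pipeline from above:

    import requests

    ES = "http://localhost:9200"

    doc = {                      # imagine this was SELECTed out of the RDBMS
        "company":  "Dell",
        "employee": "alice",
        "product":  "EUV tool",
        "amount":   125000.0,
        "notes":    "follow-up quote for the fab expansion",
    }

    # The pipeline parameter routes the document through the ingest ("map") step.
    requests.post(f"{ES}/sales/_doc", params={"pipeline": "sales-augment"}, json=doc)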
What facets let you do is have "groups within groups". So if you have a company with multiple employees inside it, you could do something like "select * from sales where company = 'dell' group by product facet by employee" (pseudo-SQL; faceting isn't real SQL). But it's actually a whole separate layer that runs on top of grouping (and behaves slightly differently!), and because of the way it's implemented (inverted indexes) it's incredibly efficient even for many, many groups.
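As an Elasticsearch query, that pseudo-SQL would look roughly like the sketch below (field names are the made-up ones from earlier): a term filter for the company, a terms aggregation per product, and a nested terms aggregation per employee inside each product bucket.

    import requests

    ES = "http://localhost:9200"

    resp = requests.post(f"{ES}/sales/_search", json={
        "size": 0,                                  # only the facet counts, no hits
        "query": {"term": {"company": "dell"}},     # where company = 'dell'
        "aggs": {
            "by_product": {                         # group by product
                "terms": {"field": "product"},
                "aggs": {
                    "by_employee": {                # ... facet by employee
                        "terms": {"field": "employee"}
                    }
                }
            }
        }
    })
    print(resp.json()["aggregations"]["by_product"]["buckets"])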
It's built on a "full-scan every time" idiom rather than maintaining traditional indexes, but it makes those full scans extremely fast if you stay on the happy path. And because you're scanning everything anyway, you can build the information an index would have contained as you go: the "inverted index", a mapping from each value to the rows that contain it. You're collecting pointers to the rows that fit a particular filter, like an index would, and aggregating over them as the structure is built.
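A toy illustration of the idea in plain Python (nothing Elasticsearch-specific): while scanning every row once, build the value-to-row-pointer postings an index would contain, and keep running aggregates alongside them.

    from collections import defaultdict

    rows = [
        {"company": "dell", "product": "laptop",   "employee": "alice", "amount": 900},
        {"company": "dell", "product": "server",   "employee": "bob",   "amount": 2500},
        {"company": "asml", "product": "EUV tool", "employee": "carol", "amount": 150_000_000},
    ]

    postings = defaultdict(list)   # (field, value) -> [row ids]: the inverted index
    totals   = defaultdict(float)  # (field, value) -> running sum of amount

    for row_id, row in enumerate(rows):           # the "full scan"
        for field in ("company", "product", "employee"):
            key = (field, row[field])
            postings[key].append(row_id)          # collect pointers, like an index
            totals[key] += row["amount"]          # aggregate as the index is built

    print(postings[("company", "dell")])          # [0, 1]
    print(totals[("company", "dell")])            # 3400.0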
And the really clever bit is that it can build little trees of these counts, representing the different slices through OLAP hyperspace you could take when navigating your object's dimensions. So you can know that if you filter by a given employee there are 8 different types of products, or that if you filter by "EUV tool" there is only 1 company in the category. That's what powers the hinted sidebar product-navigation tree in web stores and the like.
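Continuing the toy sketch, those navigation hints are just facet counts computed under a filter: intersect the filter with each remaining dimension and count what's left.

    from collections import Counter

    rows = [
        {"company": "dell", "product": "laptop",   "employee": "alice"},
        {"company": "dell", "product": "server",   "employee": "bob"},
        {"company": "asml", "product": "EUV tool", "employee": "carol"},
    ]

    def facet_counts(rows, filter_field, filter_value, facet_field):
        """Counts of facet_field values among the rows matching the filter."""
        return Counter(r[facet_field] for r in rows if r[filter_field] == filter_value)

    # Narrowing to product = "EUV tool" leaves exactly one company, so the
    # sidebar can show e.g. "asml (1)" under that slice before you even click.
    print(facet_counts(rows, "product", "EUV tool", "company"))   # Counter({'asml': 1})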
A lot of the time, if you look carefully at the HTML/JS, you can see evidence of this: it will often explicitly mention facets, and sometimes expose Lucene syntax directly.
This is a very good explanation of what we've found keeps users on Elastic: the combination of FTS and fast facets/aggregates in the same query. At ParadeDB, we've implemented support for this in what we call aggregations: https://docs.paradedb.com/search/full-text/aggregations