Solr and Elasticsearch are not relational database systems and have none of the ...

vesak · on July 29, 2018

A rather small portion of software that requires search requires a relational database, though.

manigandham · on July 29, 2018

It's not just relational features that matter but the decades of work on reliability, performance, flexibility, security, and general usability of these databases that makes them so great.

ElasticSearch doesn't have ACID or very good OLTP manners. More importantly, it still continues to have data loss issues so it's not a good fit as a primary datastore.

smittywerben · on July 29, 2018

ElasticSearch looks like a REST api with http/json schema to pipe directly to browsers. Why would you need to store the results of ElasticSearch?

manigandham · on July 29, 2018

It's not about results, the data itself within ElasticSearch is not guaranteed safe as a primary datastore, so you should have a reliable database for the primary data that is then synced to ES.

aisofteng · on July 29, 2018

As someone who has used ElasticSearch in production for years, as far as I know this is simply not true.

manigandham · on July 29, 2018

That's good for you, but it sounds like you're extrapolating it on just your experience rather than research? Perhaps you've been lucky? Do you really have the info to objectively state it as "simply not true"?

Read the jepsen tests (old but show core problems that aren't fixed):

https://aphyr.com/posts/317-call-me-maybe-elasticsearch

https://aphyr.com/posts/323-call-me-maybe-elasticsearch-1-5-...

Or just search for ES data loss:

https://news.ycombinator.com/item?id=9475620

https://stackoverflow.com/questions/29841348/how-reliable-is...

And if that's not enough, here's the official page listing several "ongoing" fixes for major durability issues:

https://www.elastic.co/guide/en/elasticsearch/resiliency/cur...

aisofteng · on July 30, 2018

After reading the links you gave, I’d like to clarify that I meant that edge cases that can cause data loss are not common enough, in my experience, for one to be forced to treat ElasticSearch as an unreliable data store. Anecdotally, I’ve never seen it happen. You’re right, maybe I’m lucky; I would like to see a case study of some project where these issues were significant enough to treat ElasticSearch as an unreliable data store.

smittywerben · on July 29, 2018

I don't understand the syncing. Why not write-only akin to UDP protocol?

manigandham · on July 29, 2018

What are you talking about?

purerandomness · on July 29, 2018

Can you name an example?

I've never built something that did not need a relational database to model the domain, and search is such a common UX element that it's taken for granted in everything you use.

I can't even think of something that could use search that's not relational. Can you?

vesak · on July 30, 2018

> I've never built something that did not need a relational database to model the domain

Yeah, so perhaps you don't have much experience from outside the relational model. Makes perfect sense to see a single truth then.

> I can't even think of something that could use search that's not relational. Can you?

Not sure how wide a blanket you're throwing by "not relational", but here are a couple examples that might fit: Google's search engine, most filesystems and their files, hierarchical and flat data structures in general, twitter.

threeseed · on July 29, 2018

Yes plenty.

Elasticsearch stores data in a JSON document store style format. So any situation where you need O(1) lookup for all data related to a particular entity you can have nested structures within JSON that facilitates this. In a RDBMS it is O(n) since you have to do a costly join for each embedded structure.

360 customer views are good example of this.

manigandham · on July 30, 2018

A document store is not a replacement for a relational database, and there are few cases where it's truly a better fit.

More importantly, ES is not a document store. It is a search system that can be used as a similarity index between all kinds of data, most commonly json docs but can be anything like images or audio waveforms.

ES can return none of the source data or not even store it in the first place. In addition, it lacks ACID and transactions and generally has an overall poor reputation for data integrity.

If you just want a json blob, you can already use a relational database column for it and get the key/value performance semantics, while still retaining all the reliability and manageability properties.

purerandomness · on July 29, 2018

There's a lot of misconceptions there.

If you want to save JSON documents, you can pick Postgres [1] or MySQL and still have a RDBMS for free additionally.

Elasticsearch does that too, but its access certainly isn't O(1) - it has a complexity of O(log n), just like Postgres or MySQL would have, because all of them use tree-like index data structures. (Please note that ElasticSearch is not meant as a data storage layer)

Now, if you just want to have the JSON, you're done. O(log n) for ElasticSearch, Postgres or MySQL. That's it.

But normally, you want to do something with that data. You have a user and want all the products. Or you have a book and want all the authors, because you have a real app to write.

Most data in our world is relational: A student can partitipate in courses, a store sells products, and so on. Queriyng your database using JOINs will make use of indexes in each of the tables that you have previously set up.

These indexes will prevent the theoretical worst-case (the "Full Table Scan", where your complexity is O(n) for the table). The query planner will decide which index is better to use for the optimal response time.

You can, however, choose to not leverage this feature of RDBMS. What do you do now, if you need all the courses of a student? Do you get all the JSONs and compare their IDs? Do you parse the JSON keys in your own app? But having your app iterate over all the JSON keys already has O(n) complxity. So you end up recreating tree-like indexes anyway somewhere in your stack. But all of that is already provided for you, with decades of debugging and performance optimization - for free.

To sum up: Use Postgres JSONB type if you only need to write JSON documents. You might later want to query the nested JSON data structure and join it. I highly recommend "Mastering PostgreSQL in Application Development" [2]

[1] https://www.postgresql.org/docs/current/static/datatype-json... [2] https://masteringpostgresql.com/

threeseed · on July 29, 2018

You have missed the point here.

The comparison was between ElasticSearch and a relational store. PostgreSQL's JSON type is not relational. And if you are fetching any entity by a primary key then it should be O(1). The point was that document stores offer you the ability to nest data structures whilst relational forces you to join. And so if that is your query/data pattern then a document store can be orders of magnitude faster than a purely relational store e.g. 360 Customer View.

purerandomness · on July 30, 2018

> O(1)

Primary key fetching will always be O(log n) due to how indexes are implemented. Access time complexity will never be independent of the item count in your data store, relational or not.

> PostgreSQL's JSON type is not relational.

It is. In Postgres, you can join nested JSON structures' keys with normal, relational tables: https://dba.stackexchange.com/questions/83932/postgresql-joi...

My point is: Postgres offers you a document store and gives you the option to join, if you feel like it.

lowercased · on July 30, 2018

There's not many use cases I come across where the 'nested' data is the only instance of that data. A student record might have some nested class information, but the class information itself lives as a master someplace else. If you nest things, you now have data living in multiple places, no? Updates need to update multiple locations, which would potentially introduce lots of errors.

I've got a handful of situations where nested structures are making sense, but it doesn't seem to be the norm (at least in the data worlds I work in).

eska · on July 29, 2018

There was a good article about that on here which you may find interesting http://blog.memsql.com/nosql