I have a client that is exploring Datomic, so I'm wondering if some of you could chime in on why it's popular at the moment and what your experiences with it have been.
I'm a big Rich Hickey fan. If you don't know who he is, he's the guy behind Clojure and Datomic. I don't use those tools, but his views on simplicity are wonderful.
Here's a great quote of his on the subject:
"Simplicity is hard work. But, there's a huge payoff. The person who has a genuinely simpler system - a system made out of genuinely simple parts, is going to be able to affect the greatest change with the least work. He's going to kick your ass. He's gonna spend more time simplifying things up front and in the long haul he's gonna wipe the plate with you because he'll have that ability to change things when you're struggling to push elephants around."
I love Datomic. It's a relational, ACID, transactional, non-SQL database.
The upsides:
SQL is a horrible language, yet all the other NoSQL DBs also throw away the relational, transactional and ACID features that are great in Postgres. Postgres with Datalog syntax would basically be a win by itself. Datomic queries are data, not strings. Queries can be composed without string munging, and with a clear understanding of what that will do to the query planner.
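To make "queries are data" concrete, here is a minimal Python sketch. It assumes a hypothetical dict-and-tuple encoding that loosely mirrors Datomic's map-form query ({:find [...] :where [...]}); the attribute names and the helper functions are made up for illustration, and real Datomic queries are Clojure/EDN data, not Python. The point is that adding a filter is an ordinary data operation rather than string splicing.

```python
from copy import deepcopy

# Hypothetical encoding: a query is a dict with "find" variables and a list of
# (entity, attribute, value) "where" clauses. Attribute names are illustrative only.
BASE_QUERY = {
    "find": ["?e", "?email"],
    "where": [("?e", ":user/email", "?email")],
}


def with_clause(query, clause):
    """Return a copy of `query` with one more where-clause; the input is untouched."""
    composed = deepcopy(query)
    composed["where"].append(clause)
    return composed


def only_active(query):
    # Composition is plain data manipulation: no quoting, escaping or string munging.
    return with_clause(query, ("?e", ":user/active?", True))


def in_org(query, org_id):
    # The value stays a value; nothing gets concatenated into query text.
    return with_clause(query, ("?e", ":user/org", org_id))


active_in_org = in_org(only_active(BASE_QUERY), 42)
```

Because each helper returns a new query, filters can be stacked in any combination, and the final clause list is exactly what an engine would receive, so you can inspect the order before running anything.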
The schema has built-in support for has-one, has-many relationships, so there's no need for join tables.
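In Datomic, a has-many relationship is an attribute declared with :db.type/ref and :db.cardinality/many, so the owning entity points straight at its children. The toy Python store below sketches that idea under simplifying assumptions (in-memory dicts, made-up attribute names, a linear-scan reverse lookup); it is not how Datomic actually stores data, but it shows why no join table is needed.

```python
from collections import defaultdict

# Toy schema loosely modelled on Datomic attribute definitions
# (:db/valueType :db.type/ref, :db/cardinality :db.cardinality/many).
# The attribute names are hypothetical.
SCHEMA = {
    "order/customer": {"type": "ref", "cardinality": "one"},
    "order/line-items": {"type": "ref", "cardinality": "many"},
    "item/sku": {"type": "string", "cardinality": "one"},
}


class Store:
    def __init__(self):
        # entity id -> {attribute: value}; cardinality-many values are sets
        self.eav = defaultdict(dict)

    def add(self, e, attr, value):
        if SCHEMA[attr]["cardinality"] == "many":
            self.eav[e].setdefault(attr, set()).add(value)
        else:
            self.eav[e][attr] = value  # cardinality-one: a new value replaces the old

    def get(self, e, attr):
        return self.eav.get(e, {}).get(attr)

    def referrers(self, attr, target):
        """Reverse navigation: every entity whose `attr` points at `target`."""
        found = set()
        for e, attrs in self.eav.items():
            value = attrs.get(attr)
            if value == target or (isinstance(value, set) and target in value):
                found.add(e)
        return found


# An order (entity 1) owning two line items (10 and 11): no join table involved.
store = Store()
store.add(1, "order/customer", 100)
store.add(1, "order/line-items", 10)
store.add(1, "order/line-items", 11)
store.add(10, "item/sku", "A-1")
store.add(11, "item/sku", "B-2")
```

The reverse lookup is a linear scan only to keep the sketch short; a real database would answer it from an index.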
I've never met a SQL query planner that didn't get in the way at some point. If needed, you can bypass the query planner, get raw access to the data, and write your own query.
You can run an instance of it in-memory, which is fantastic for unit tests, so you don't end up with Postgres in production but SQLite when testing.
The downsides:
It's closed source.
Operationally, it's unique. Because it uses immutable data everywhere, its indexing strategy is different. I don't have experience with how it behaves under high load.
The schema is 'weaker' than, say, Postgres. While you can specify "this column is type Int", you don't have the full power of Postgres constraints, so you can't declare "column foo is required on all entities of this type" or "if foo is present, bar must not be present", etc. It should be possible to add that using a transactor library, but I don't think anyone has done serious work in that direction yet.
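One way to approximate those missing constraints, sketched in Python under stated assumptions: run declarative checks against each entity before it is committed. In Datomic the natural home for this would be a transaction function executed on the transactor; here `transact`, `require`, `excludes` and the `widget/...` attributes are all hypothetical, and `db` is just a list. A real version would also have to merge the incoming attributes with the entity's existing state, which this sketch skips.

```python
class ConstraintViolation(Exception):
    pass


def require(attr):
    """Check: `attr` must be present."""
    def check(entity):
        if attr not in entity:
            return f"missing required attribute {attr}"
        return None
    return check


def excludes(attr, other):
    """Check: if `attr` is present, `other` must not be."""
    def check(entity):
        if attr in entity and other in entity:
            return f"{attr} and {other} must not both be present"
        return None
    return check


# Hypothetical constraints, keyed by an entity's "type" attribute.
CONSTRAINTS = {
    "widget": [require("widget/foo"), excludes("widget/foo", "widget/bar")],
}


def transact(db, entity):
    """Validate `entity`, then append it to `db` (a plain list standing in for storage).

    Only the incoming map is checked; a real transaction function would merge it
    with the entity's current attributes first.
    """
    checks = CONSTRAINTS.get(entity.get("type"), [])
    errors = [msg for check in checks if (msg := check(entity)) is not None]
    if errors:
        raise ConstraintViolation("; ".join(errors))
    db.append(entity)
    return db
```

Keeping the rules as small composable checks mirrors the "declare it once" feel of Postgres constraints, even though enforcement lives in code rather than in the schema.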
Datomic doesn't seem to have had a huge amount of marketing: it's been spreading largely by word of mouth, so a slow build-up makes sense.
It does bring an exceptionally elegant design (well worth reading Nikita Prokopov's "Unofficial guide" if you're curious). Also, the time and transaction-annotation features are unmatched AFAICT -- if you're working with complex data where provenance matters, Datomic can save a HUGE amount of work building tracking systems.
I was very interested, but pretty disappointed that Datomic is completely closed source. Maybe this is a little mean, but what could be more "simple" than being able to read, understand, and modify the database you rely on?
Neo4j, though marketed differently, is a similar approach (but the Community version is GPLv3 and Enterprise is AGPLv3). The Cypher query language is declarative in a similar way to Datomic - the biggest missing feature is transactions.
Rich Hickey has been criticized for that repeatedly. When asked, he's been transparent that Datomic is closed source so that he can put his kids through college. He also points out that he already gave us the whole Clojure language open source.
It's hard for me not to sympathize with him on this.
For sure, I would have played around with it if it were open source, or free for some small number of clients. But with so many FOSS databases, why use Datomic?
We're using Datomic in production. It's had its ups and downs. For one, having raw data available at in-memory speeds really changes the level of expressiveness you have in your code: you are no longer constrained to packing every question about your data into a giant query and sending it off; you can instead pull data naturally and as needed. Many of our operations make multiple queries and are still high-performance.
The licensing is a huge pain in the ass. If I accidentally launch an extra peer over our license limit, our production environment stops working until the extra peer comes down. This really butts heads with the growing popularity of abstracting physical servers as clusters, so I think the strategy is kind of a mistake on Cognitect's part.
Part of me wonders why they don't open source Datomic and crank up the marketing effort on the consultancy and Datomic/Clojure support side of the business. That seems like a much more effective model for DB companies. For direct revenue streams, they could always sell tuned, monitored clusters packaged as appliances.
I can't help but feel the quote ultimately embodies a false belief. Simplicity doesn't build you a rocket that can get to the outer solar system. Understanding and experimentation does.
Sure, this was probably built up using simple experiments and designs. But consider the Mars landing[1]. Simplicity would be to have a single mechanism for landing the Curiosity. Not 3. With one of them being a crane drop from a hovering rocket!?
I do feel there is an argument for up-front simplicity. However, as systems grow, expect simplicity to become harder and harder to maintain while still meeting requirements such as performance, to the point that it becomes a genuine tradeoff subject to your standard cost/benefit analysis.
In the end, this falls into the trap of examples. If you are allowed to strip every assumption from real use down to a simple problem, you can get a simple solution. Add back in the realities of the problem, and the solution can get complex again. It is a shame that so few studies actually look at real programs.
You should watch the talk(s), as your analysis here is entirely missing the context. What you’re talking about is what Rich Hickey and Stu Halloway call “complicated”, which is different from what they call “complex”.
I've seen them. They are nice and very alluring. So are a lot of false things. :) And I should note that I am mainly asserting this as false so that I can further explore the idea.
The idea of coining a new word that is hard to distinguish from existing ones, and whose meaning depends entirely on context, is amusing in this context.
That is, what separates complicated from complex is one of context. Yet... contexts change. And often the first thing you do when building a solution to a problem is to reduce the problem to something easier to solve.
From this angle, I fully agree. Simplify your problem as much as you can. But do not be misled into thinking you can keep it simplified. As you add in more and more of the realities of the problem, they will be reflected in the solution. And often the worst thing you can do is cling to the "simple" solution that solved a different problem.
That is, understand the simple things well. See how they map onto the complicated things. Don't cling to the idea that they can merely be composed into the complicated solution. Often, several simple solutions can be subsumed by a more complicated one, much in the same way that higher math subsumes lower math.
> Simplicity would be to have a single mechanism for landing the Curiosity. Not 3. With one of them being a crane drop from a hovering rocket!?
Why? Simple, in the way Rich Hickey advocates, means the opposite of complex, which means that things are woven together. You can have many landing strategies without them being tightly coupled together. A huge system isn't necessarily complex.
That is the catch: all three landing strategies were coupled together. You couldn't do one without the one before it. Moreover, earlier steps had to account for the (literal) baggage needed to perform later steps.
If that's the best they could do and what got the job done, good. It's as simple as was possible and necessary. What exactly does this prove against simplicity, again?
The difference between "simple" and "as simple as possible" is the crux.
Mainly, the problem is that these speeches all talk about keeping things simple. In many problems, this can't be done. Understanding the simple helps. But the actual solution will not be simple. So any newspeak to get around that is just annoying.
A simple system can solve complicated things. When Rich Hickey talks about simple, he is referring to the absence of tight coupling, "death by specificity" and hard-to-understand concurrency. A system that does multiple things isn't necessarily a complicated system. A Mars landing, which in itself is a difficult (though not necessarily complex) problem, can be solved by a simple system. An example of this is Unix: a simple system that does complicated things.
I thought you were speaking about different strategies, but in this case you're describing three different stages of an overall landing strategy. That doesn't sound complex.
Query optimization is difficult because of the abstract structure and the limited set of indexes. You may end up querying an index that holds EVERYTHING when running the clauses in the opposite order would be faster. Which order wins depends purely on what you've inserted into the DB up to this point, or more so on how you insert things into the DB.
Don't run a SQL server as your KV store; you'll likely screw up the config and performance will suffer. If you want performance competitive with other DBs, you will likely end up running memcached between your KV store and your query engine(s).
Don't store values over 1 KB. Yes, the database can technically handle them, but not at the speeds real-world applications expect.
B-tree syncs can be slower than you think in a surprising number of cases.
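Here is a minimal Python sketch of why clause order matters when the engine does not reorder clauses for you. It is a naive nested-loop Datalog evaluator over hypothetical (entity, attribute, value) triples, not Datomic's engine; it runs the same two clauses in both orders and records how many intermediate bindings each step produces.

```python
def unify(clause, datom, binding):
    """Match one (e, a, v) pattern against one datom, extending `binding` or returning None."""
    extended = dict(binding)
    for term, value in zip(clause, datom):
        if isinstance(term, str) and term.startswith("?"):
            if extended.get(term, value) != value:  # already bound to something else
                return None
            extended[term] = value
        elif term != value:  # constant that doesn't match
            return None
    return extended


def run(where, datoms):
    """Evaluate clauses strictly in the order written.

    Returns the final bindings and the intermediate result size after each clause.
    """
    bindings, sizes = [{}], []
    for clause in where:
        bindings = [nb for b in bindings for d in datoms
                    if (nb := unify(clause, d, b)) is not None]
        sizes.append(len(bindings))
    return bindings, sizes


# Hypothetical data: 200 people, each with a name and an email.
datoms = []
for i in range(200):
    datoms.append((i, "person/name", f"name-{i}"))
    datoms.append((i, "person/email", f"user{i}@example.com"))

broad_first = [("?e", "person/name", "?n"), ("?e", "person/email", "user7@example.com")]
selective_first = list(reversed(broad_first))

broad_result, broad_sizes = run(broad_first, datoms)
selective_result, selective_sizes = run(selective_first, datoms)
```

Both orders return the same answer, but the broad-first version carries every person through the first step before filtering, while the selective-first version narrows to one entity immediately. With real data the "selective" clause depends on what is actually in the DB, which is the commenter's point.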
I think Datomic is a very interesting project for a variety of reasons like use of Datalog, immutability etc.
This particular thing seems like a backwards step, though. One of the major reasons relational databases became so popular was that the user didn't have to think about which predicate to put first in the WHERE clause, or which join to do first. Query optimizers can do that much better, and there are decades of research on how to optimize relational queries (including Datalog queries; see Deductive Databases). It is hard for users to make such decisions, and complex queries, views and runtime parameters make it near impossible to reason about the performance of different queries.
One thing to remember in high-write environments: if you're storing a UUID attribute and it's indexed, use Datomic's squuid (sequential UUID). Rewriting the indexes will go MUCH faster, and it prevents latency spikes on queries.
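For anyone outside Clojure, here is a Python sketch of the squuid idea, assuming the approach Datomic's squuid is generally described as using: take a random (version 4) UUID and overwrite its most significant 32 bits with the current Unix time in seconds. The `squuid` function below is my own illustration, not Datomic's code.

```python
import time
import uuid


def squuid(now=None):
    """Semi-sequential UUID: a random v4 UUID whose top 32 bits are replaced
    by the Unix time in seconds (`now` can be passed in for testing)."""
    seconds = int(time.time() if now is None else now) & 0xFFFFFFFF
    low_96 = uuid.uuid4().int & ((1 << 96) - 1)  # keep the random low 96 bits
    return uuid.UUID(int=(seconds << 96) | low_96)


# IDs created in later seconds sort after earlier ones, so index inserts
# cluster at the "end" of the key space instead of landing at random positions.
ids = [squuid(now=t) for t in (1_700_000_002, 1_700_000_000, 1_700_000_001)]
```

The version and variant bits sit in the low 96 bits, so the result is still a valid version-4 UUID. Within a single second the ordering is random, which is why these are "semi-sequential" rather than strictly sequential.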
I've been using it in production for years now. Hell, there was a Strange Loop talk about our use of it. We're using Dynamo for storage.
I found working with Datomic really nice. I like how I can express queries using Clojure. It also has great read performance, and the data cache cuts down on reads from Dynamo, which keeps the AWS bill down.
I did not like the way schema worked in the beginning. You had to make sure you had it right from a very early stage of development, which can be difficult if requirements change, but they've addressed that in later versions. It's also probably the most complicated part of our infrastructure, which runs 100% on AWS. You've got 2 transactors (for high availability) and X peers, and every new version of Datomic has to be deployed to all of them. Coordinating that while minimizing downtime is no simple task.
It's an interesting database. Regarding the closed-source complaint, it's part of a false dilemma that keeps coming up unnecessarily: that one must choose open + free or closed + paid. Nonsense! You can have open source and proprietary licensing simultaneously. You can even let paying users extend it, as Burroughs did for the MCP OS in the '60s. I go into more detail here [1] on various models of source sharing and their security review implications (my focus).
So, he could patent any key technology, publish the implementation under copyright protection, give source/binaries to customers on condition they keep paying, let users extend it for internal use, and even let users submit such improvements for others to use. His company continues to make money on licensing in each of these cases. All of this has been done before. If anything, the real risk is on the users' side: the source license might change, as happened with QNX. That's why I advocate perpetual licenses for a given release at a given rate, re-issued each year a client pays.
Here's his classic talk on simplicity if you haven't seen it yet: http://www.infoq.com/presentations/Simple-Made-Easy