Datomic is interesting because it's a different take on what a database should look like. The TLDR version by someone who's looked into it a bit but not actually used it:
* Storage, transactions, and querying are separated, as in running in different processes, potentially on different machines.
* Data is immutable. Storage is pluggable and has implementations on top of DynamoDB/Riak.
* Transaction semantics and ordering are controlled by a single process (the transactor) for consistency. This is the write-scaling caveat. It's less of a restriction than it sounds (if you're thinking SQLite, like I did), because there aren't writes and queries competing for resources; the single process only does the sequencing.
* Queries on the db are performed in-client and can interoperate with client code and state. When you write a query, Datomic pulls the relevant data from storage to the local machine and runs the query there.
* Queries are written in a logic programming language called Datalog; there's a small sketch of this just below. Even if you aren't interested in the rest, I'll recommend spending an hour working through http://learndatalogtoday.org/ just for the exposure to logic programming.
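For flavor, here's a minimal sketch of what that looks like from Clojure with the Peer API. Everything concrete here (the in-memory URI, the :person/name attribute, the data) is made up for illustration:

```clojure
(require '[datomic.api :as d])

;; An in-memory database for experimentation.
(def uri "datomic:mem://example")
(d/create-database uri)
(def conn (d/connect uri))

;; Schema is itself just data in a transaction.
@(d/transact conn
   [{:db/id                 (d/tempid :db.part/db)
     :db/ident              :person/name
     :db/valueType          :db.type/string
     :db/cardinality        :db.cardinality/one
     :db.install/_attribute :db.part/db}])

;; Writes go through the single transactor, which orders them.
@(d/transact conn [{:db/id       (d/tempid :db.part/user)
                    :person/name "Alice"}])

;; Reads run inside the peer process against an immutable db
;; value; the query itself is Datalog expressed as Clojure data.
(d/q '[:find ?name
       :where [?e :person/name ?name]]
     (d/db conn))
;; => #{["Alice"]}
```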
Only the range of data the client is interested in is fetched from the storage layer.
As for why they chose this, you'd have to ask them to be sure.
But two reasonable assumptions are: 1. They wanted the storage layer to be "dumb", in particular so that they could use existing services like DynamoDB. 2. They wanted reading processes to be totally independent: readers can talk directly to the dumb storage layer without any centralized resource coordinator executing queries. That means horizontal read scalability in the strict sense.
Only partial indexes are retrieved (just what is needed to answer your exact query). The bonus is that this data is now local, so traversing deep structures often approaches the speed of hash-map lookups. As someone who has worked on very complex SQL databases, this is a major win.
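As a sketch of what that navigation looks like (the attributes :customer/email, :order/customer, :order/items, and :item/sku are hypothetical, and :customer/email is assumed to be declared :db/unique so it works as a lookup ref):

```clojure
;; Once the relevant index segments are cached locally, walking
;; relationships is lazy map access rather than JOIN round-trips.
(def db (d/db conn))
(def customer (d/entity db [:customer/email "alice@example.com"]))

;; :order/_customer is a reverse reference: every order whose
;; :order/customer points at this entity. From there we walk
;; down into each order's line items.
(->> (:order/_customer customer)
     (mapcat :order/items)
     (map :item/sku))
```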
Indexes and chunks of data that are used often remain cached in each application instance, and new changes are streamed to the application cache.
That means reads from a hot cache don't touch the network. Reads are very fast and scale "out". You can write code that does a lot of reads without caring much about performance. (SQL reads only scale "up", and you care very much about their performance.)
Datomic is like Git (distributed reads, central writes); Postgres is like CVS/SVN (centralized reads and writes). This is made possible by immutable history.
In traditional ACID (SQL) databases, all queries, read and write, mostly only scale UP (a beefier db machine), not OUT (lots of db machines is very hard). Datomic is an ACID database where writes still scale UP, but reads can scale OUT.
A consequence of this separation of reads and writes is that Datomic reads scale practically arbitrarily, for both query load and dataset size. Writes do not.
This is a lot like Git, where you have to push to a central place that orders commits and can reject them, but you can make useful reads from your local machine without touching the network. Datomic is a lot like Git + realtime secret sauce.
That's only half the value though: Datomic also doesn't have an object-relational impedance mismatch, which means it doesn't need ORMs. Datomic's programming model is simpler than SQL's for a competitive set of features, so you code faster with fewer bugs.
In short, you can ask a Datomic database things like "Show me everything that is different for customer X between the database today and the database one year ago, on September 14th at 9:32 AM", and it can answer those types of queries with high performance.
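Concretely, that kind of query falls out of d/as-of plus ordinary set operations. A rough sketch, where the date, the customer lookup ref, and the attributes are all hypothetical:

```clojure
(require '[clojure.set :as set])

;; Two immutable database values: now, and one year ago.
(def db-now  (d/db conn))
(def db-then (d/as-of db-now #inst "2012-09-14T09:32:00"))
(def customer [:customer/email "alice@example.com"])

;; All attribute/value facts asserted about an entity.
(defn facts-about [db e]
  (d/q '[:find ?attr ?v
         :in $ ?e
         :where [?e ?a ?v]
                [?a :db/ident ?attr]]
       db e))

;; "What's different?" is set difference over two query results.
(set/difference (facts-about db-now customer)
                (facts-about db-then customer))
```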
You can also go forward in time, to a hypothetical future. (That is, you add data and get back a new db value, which you can query against; the stored database isn't modified.) This can be useful in analytics that deal with what-if scenarios.
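The forward-in-time case is d/with, which applies transaction data speculatively and hands back a new database value without writing anything to storage. A sketch, reusing the toy :person/name attribute from above:

```clojure
;; Nothing here is written to storage; :db-after is a local,
;; hypothetical db value layered on top of the current one.
(let [speculative (:db-after
                   (d/with (d/db conn)
                           [{:db/id       (d/tempid :db.part/user)
                             :person/name "Bob"}]))]
  (d/q '[:find ?name
         :where [?e :person/name ?name]]
       speculative))
;; => #{["Alice"] ["Bob"]}
```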
Thanks, that helped. They should be clearer about that on their website. I know Clojure a bit, and some of the ideas about state, time, and identity, and I still didn't get it.
It does not at all assume or require that your dataset will fit in RAM. To an extent, it will cache some indexes in the RAM of query peers, but there is no expectation that the whole dataset is in RAM.
I'm doing some preliminary work on one. But realistically, it's a lofty goal and it'll be hard to get going. My priorities are different, so I'm also taking some design paths that differ from Datomic's (e.g., no Datalog).
I have done some searching and have not turned up anything. The functionality that Datomic enables is really intriguing, but I have trouble bringing myself around to using a non-open-source bit of tooling.
I miss a piece of info here, could someone please fill in? Rich Hickey is known for Clojure, Metadata Partners for Datomic. What are the Relevance guys known for?
(Honest question, not a cheap attempt at dismissal :-) )
In addition, Relevance has had a close relationship with Rich for years, and as a result we've had a major hand in the development of Clojure, ClojureScript, Datomic, core.async, Pedestal, Simulant, and many other Clojure projects.
I attended a tech conference earlier this year where Rich Hickey was speaking. He was trumpeting the fact that data never really gets deleted in Datomic, and someone brought up the question: What happens if you are legally required to delete something from your database? I seem to remember him saying that Datomic wasn't really designed for that scenario, which sounds like a major problem.