Cognitect: Relevance merges with Metadata Partners (Datomic) (cognitect.com)
107 points by AndreasFrom on Sept 16, 2013 | 26 comments



More context in their podcast: http://cognitect.com/podcast


Any plans to open-source Datomic?


What is the big deal about Datomic?

From their FAQ:

"Datomic is not a good fit if you need unlimited write scalability, or have data with a high update churn rate (e.g. counters)."

Don't you get most of that through... caching? Also, it seems to assume that the dataset will fit into RAM.


Datomic is interesting because it's a different take on what a database should look like. The TLDR version by someone who's looked into it a bit but not actually used it:

* Storage, transactions, and querying are separated, as in running as different processes on different machines.

* Data is immutable. Storage is pluggable and has implementations on top of Dynamo/Riak.

* Transaction semantics and ordering are controlled by a single process, for consistency. This is the write-scaling caveat. It's less of a restriction than it sounds (if you're thinking SQLite2 like I did), because writes and queries aren't competing for resources; only the sequencing is centralized.

* Queries on the db are performed in-client and can interoperate with client code and state. When you write a query, Datomic pulls the data from storage to the local machine and performs the query there.

* Queries are written in a logic programming language called Datalog. Even if you aren't interested in the rest, I recommend spending an hour working through http://learndatalogtoday.org/ just for the exposure to logic programming.
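To make that concrete, here's a minimal query sketch using Datomic's peer API. The connection `conn` and the :person/name attribute are assumptions for illustration, not from the thread:

    (require '[datomic.api :as d])

    ;; `conn` is a hypothetical connection to a running system.
    (def db (d/db conn)) ; grab an immutable database value

    ;; Datalog: find the ids and names of all entities
    ;; that have a :person/name assertion.
    (d/q '[:find ?e ?name
           :where [?e :person/name ?name]]
         db)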


You mean the whole dataset is fetched to the client and only queried afterwards? Why did they choose this approach?


Only the range of data the client is interested in is fetched from the storage layer.

As for why they chose this, you'd have to ask them to be sure.

But two reasonable assumptions are:

1. They wanted the storage layer to be "dumb", in particular so that they could use existing services like Dynamo.

2. They wanted reading processes to be totally independent: readers can talk directly to the dumb storage layer, without any centralized resource coordinator to execute queries. That means horizontal scalability in the strict sense.


Only partial indexes are retrieved (just what is needed to answer your exact query). The bonus is that the data is now local. Traversing deep structures then often approaches the speed of hash-map lookups. As someone who has worked on very complex SQL databases, this is a major win.
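For a flavor of that, here's a hedged sketch with Datomic's entity API; `db`, `customer-id`, and the customer/address attributes are made up for illustration:

    (require '[datomic.api :as d])

    ;; d/entity returns a lazy, map-like view of an entity. Ref attributes
    ;; navigate to other entities, so deep traversal reads like nested
    ;; hash-map lookups against locally cached index segments.
    (-> (d/entity db customer-id)
        :customer/address
        :address/city)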


Only the data you need is fetched (and cached), so the client holds only a subset of the database.


Indexes and chunks of data that are used often remain cached in each application instance, and new changes are streamed to the application cache.

It means that reads from a hot cache never touch the network. Reads are very fast and scale "out". You can write code that does a lot of reads without caring much about performance. (SQL reads only scale "up", and you care very much about their performance.)

Datomic is like Git (distributed reads, central writes); Postgres is like CVS/SVN (centralized reads and writes). This is made possible by immutable history.


TLDR: Datomic is like Git.

In traditional ACID databases (SQL), all queries (read and write) mostly only scale UP (a beefier db machine), not OUT (lots of db machines is very hard). Datomic is an ACID database where writes still scale UP, but reads can scale OUT.

A consequence of this separation of reads and writes is that Datomic reads scale practically arbitrarily, for both query load and dataset size. Writes do not.

This is a lot like Git, where you have to push to a central place that orders and rejects commits, but you can make useful reads from your local machine without touching the network. Datomic is a lot like Git plus realtime secret sauce.

That's only half the value, though: Datomic also doesn't have an object-relational impedance mismatch. This means Datomic doesn't need ORMs; Datomic's programming model is simpler than SQL's for a competitive set of features. So you code faster, with fewer bugs.


In short, you can ask a Datomic database things like "Show me everything that is different for customer X between the database today and the database one year ago, on September 14th at 9:32 AM," and it can answer those types of queries with high performance.

And no, the dataset does not need to fit in RAM.
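For illustration, a sketch of that kind of point-in-time comparison with the peer API; `conn`, `customer-id`, and the :customer/status attribute are assumptions, not from the thread:

    (require '[datomic.api :as d])

    (def db-now  (d/db conn))                               ; the database today
    (def db-then (d/as-of db-now #inst "2012-09-14T09:32")) ; as of a past instant

    ;; The same query runs against either database value.
    (defn status-at [db c]
      (d/q '[:find ?s .
             :in $ ?c
             :where [?c :customer/status ?s]]
           db c))

    (not= (status-at db-now customer-id)
          (status-at db-then customer-id)) ; did it change in the last year?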


You can also go forward in time, to a hypothetical future. (That is, you add data and get back a new DB value, which you can query against. But the DB's source isn't modified.) This can be useful in analytics that deal with what-if scenarios.
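A minimal sketch of that with d/with, which returns a speculative database value without transacting anything; `conn`, `account-id`, and :account/balance are illustrative assumptions:

    (require '[datomic.api :as d])

    (let [db       (d/db conn)
          ;; hypothetical new fact: set the account balance to 250.00
          tx-data  [{:db/id account-id :account/balance 250.00M}]
          db-after (:db-after (d/with db tx-data))]
      ;; Query the what-if future; the real database is untouched.
      (d/q '[:find ?b .
             :in $ ?a
             :where [?a :account/balance ?b]]
           db-after account-id))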


Thanks, that helped. They should be clearer about that on their website. I know Clojure a bit, and some of the ideas about state, time, and identity, and I still didn't get it.


It does not at all assume or require that your dataset fit in RAM. To an extent, it will cache some indexes in the RAM of query peers, but there is no expectation that the whole dataset is in RAM.



Follow-up: What are some open-source alternatives or similar software?


I'm doing some preliminary work on one. But realistically, it's a lofty goal and it'll be hard to get going. My priorities are different, so I'm also taking some design paths different from Datomic's (i.e., no Datalog).


I have done some searching and have not turned up anything. The functionality that Datomic enables is really intriguing, but I have trouble bringing myself around to using a non-open-source bit of tooling.


I'd love to be proved wrong, but I don't think there are any (at least that aren't just research prototypes).


((defn cognitect ([] (conj [relevance] datomic)))) ; => all ur cljr r belong to us


I miss a piece of info here, could someone please fill in? Rich Hickey is known for Clojure, Metadata Partners for Datomic. What are the Relevance guys known for?

(Honest question, not a cheap attempt at dismissal :-) )


Quite a few members of the Clojure core team and community work there, as evidenced by the intersection of http://thinkrelevance.com/team and http://clojure.com/about.html.


In addition, Relevance has had a close relationship with Rich for years; as such, we've had a major hand in the development of Clojure, ClojureScript, Datomic, core.async, Pedestal, Simulant, and many other Clojure projects.


Hosting a high density of clojure/core, and all that implies.


I attended a tech conference earlier this year where Rich Hickey was speaking. He was trumpeting the fact that data never really gets deleted in Datomic, and someone brought up the question: What happens if you are legally required to delete something from your database? I seem to remember him saying that Datomic wasn't really designed for that scenario, which sounds like a major problem.


That actually isn't the case; you can delete data in Datomic via excision: http://docs.datomic.com/excision.html
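Per the excision docs, deletion is requested by transacting a :db/excise assertion naming the entity to remove. A sketch, with `conn` and `eid` assumed for illustration:

    (require '[datomic.api :as d])

    ;; Ask the transactor to permanently remove all datoms about `eid`,
    ;; including its history (e.g. to satisfy a legal requirement).
    @(d/transact conn [{:db/id     (d/tempid :db.part/user)
                        :db/excise eid}])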



