Cognitect: Relevance merges with Metadata Partners (Datomic) (cognitect.com)
107 points by AndreasFrom on Sept 16, 2013 | 26 comments



More context in their podcast: http://cognitect.com/podcast


Any plans to open-source Datomic?


What is the big deal about Datomic?

From their FAQ:

"Datomic is not a good fit if you need unlimited write scalability, or have data with a high update churn rate (e.g. counters)."

Don't you get most of that through... caching? Also, it seems to assume that the dataset will fit into RAM.


Datomic is interesting because it's a different take on what a database should look like. The TLDR version by someone who's looked into it a bit but not actually used it:

* Storage, transactions, and querying are separated, as in running as different processes on different machines.

* Data is immutable. Storage is pluggable and has implementations on top of Dynamo/Riak.

* Transaction semantics and ordering are controlled by a single process, for consistency. This is the write-scaling caveat. It's less of a restriction than it sounds (if you're thinking SQLite2 like I did), because writes and queries aren't competing for resources; only the sequencing is centralized.

* Queries on the db are performed in-client and can interoperate with client code and state. When you write a query, Datomic pulls the data from storage to the local machine and performs the query there.

* Queries are written in a logic programming language called Datalog. Even if you aren't interested in the rest, I recommend spending an hour working through http://learndatalogtoday.org/ just for the exposure to logic programming.
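To make that concrete, here's a minimal query sketch using Datomic's peer API. The connection `conn` and the :person/name attribute are assumptions for illustration, not from the thread:

    (require '[datomic.api :as d])

    ;; `conn` is a hypothetical connection to a running system.
    (def db (d/db conn)) ; grab an immutable database value

    ;; Datalog: find the ids and names of all entities
    ;; that have a :person/name assertion.
    (d/q '[:find ?e ?name
           :where [?e :person/name ?name]]
         db)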


You mean the whole dataset is fetched to the client and only queried afterwards? Why did they choose this approach?


Only the range of data the client is interested in is fetched from the storage layer.

As for why they chose this, you'd have to ask them to be sure.

But two reasonable assumptions are:

1. They wanted the storage layer to be "dumb", in particular so that they could use existing services like Dynamo.

2. They wanted reading processes to be totally independent: readers can talk directly to the dumb storage layer, without any centralized resource coordinator to execute queries. That means horizontal scalability in the strict sense.


Only partial indexes are retrieved (just what is needed to answer your exact query). The bonus is that the data is now local. Traversing deep structures then often approaches the speed of hash-map lookups. As someone who has worked on very complex SQL databases, this is a major win.
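For a flavor of that, here's a hedged sketch with Datomic's entity API; `db`, `customer-id`, and the customer/address attributes are made up for illustration:

    (require '[datomic.api :as d])

    ;; d/entity returns a lazy, map-like view of an entity. Ref attributes
    ;; navigate to other entities, so deep traversal reads like nested
    ;; hash-map lookups against locally cached index segments.
    (-> (d/entity db customer-id)
        :customer/address
        :address/city)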


Only the data you need is fetched (and cached), so the client holds only a subset of the database.


Indexes and chunks of data that are used often remain cached in each application instance, and new changes are streamed to the application cache.

It means that reads from a hot cache never touch the network. Reads are very fast and scale "out". You can write code that does a lot of reads without caring much about performance. (SQL reads only scale "up", and you care very much about their performance.)

Datomic is like Git (distributed reads, central writes); Postgres is like CVS/SVN (centralized reads and writes). This is made possible by immutable history.


TLDR: Datomic is like Git.

In traditional ACID databases (SQL), all queries (read and write) mostly only scale UP (a beefier db machine), not OUT (lots of db machines is very hard). Datomic is an ACID database where writes still scale UP, but reads can scale OUT.

A consequence of this separation of reads and writes is that Datomic reads scale practically arbitrarily, for both query load and dataset size. Writes do not.

This is a lot like Git, where you have to push to a central place that orders and rejects commits, but you can make useful reads from your local machine without touching the network. Datomic is a lot like Git plus realtime secret sauce.

That's only half the value, though: Datomic also doesn't have an object-relational impedance mismatch. This means Datomic doesn't need ORMs; Datomic's programming model is simpler than SQL's for a competitive set of features. So you code faster, with fewer bugs.


In short, you can ask a Datomic database things like "Show me everything that is different for customer X between the database today and the database one year ago, on September 14th at 9:32 AM," and it can answer those types of queries with high performance.

And no, the dataset does not need to fit in RAM.
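For illustration, a sketch of that kind of point-in-time comparison with the peer API; `conn`, `customer-id`, and the :customer/status attribute are assumptions, not from the thread:

    (require '[datomic.api :as d])

    (def db-now  (d/db conn))                               ; the database today
    (def db-then (d/as-of db-now #inst "2012-09-14T09:32")) ; as of a past instant

    ;; The same query runs against either database value.
    (defn status-at [db c]
      (d/q '[:find ?s .
             :in $ ?c
             :where [?c :customer/status ?s]]
           db c))

    (not= (status-at db-now customer-id)
          (status-at db-then customer-id)) ; did it change in the last year?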


You can also go forward in time, to a hypothetical future. (That is, you add data and get back a new DB value, which you can query against. But the DB's source isn't modified.) This can be useful in analytics that deal with what-if scenarios.
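A minimal sketch of that with d/with, which returns a speculative database value without transacting anything; `conn`, `account-id`, and :account/balance are illustrative assumptions:

    (require '[datomic.api :as d])

    (let [db       (d/db conn)
          ;; hypothetical new fact: set the account balance to 250.00
          tx-data  [{:db/id account-id :account/balance 250.00M}]
          db-after (:db-after (d/with db tx-data))]
      ;; Query the what-if future; the real database is untouched.
      (d/q '[:find ?b .
             :in $ ?a
             :where [?a :account/balance ?b]]
           db-after account-id))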


Thanks, that helped. They should be clearer about that on their website. I know Clojure a bit, and some of the ideas about state, time, and identity, and I still didn't get it.


It does not at all assume or require that your dataset fit in RAM. To an extent, it will cache some indexes in the RAM of query peers, but there is no expectation that the whole dataset is in RAM.



Follow-up: What are some open-source alternatives or similar software?


I'm doing some preliminary work on one. But realistically, it's a lofty goal and it'll be hard to get going. My priorities are different, so I'm also taking some design paths different from Datomic's (i.e., no Datalog).


I have done some searching and have not turned up anything. The functionality that Datomic enables is really intriguing, but I have trouble bringing myself around to using a non-open-source bit of tooling.


I'd love to be proved wrong, but I don't think there are any (at least that aren't just research prototypes).


((defn cognitect ([] (conj [relevance] datomic)))) ; => all ur cljr r belong to us


I miss a piece of info here, could someone please fill in? Rich Hickey is known for Clojure, Metadata Partners for Datomic. What are the Relevance guys known for?

(Honest question, not a cheap attempt at dismissal :-) )


Quite a few members of the Clojure core team and community work there, as evidenced by the intersection of http://thinkrelevance.com/team and http://clojure.com/about.html.


In addition, Relevance has had a close relationship with Rich for years; as such, we've had a major hand in the development of Clojure, ClojureScript, Datomic, core.async, Pedestal, Simulant, and many other Clojure projects.


Hosting a high density of clojure/core, and all that implies.


I attended a tech conference earlier this year where Rich Hickey was speaking. He was trumpeting the fact that data never really gets deleted in Datomic, and someone brought up the question: What happens if you are legally required to delete something from your database? I seem to remember him saying that Datomic wasn't really designed for that scenario, which sounds like a major problem.


That actually isn't the case; you can delete data in Datomic via excision: http://docs.datomic.com/excision.html
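Per the excision docs, deletion is requested by transacting a :db/excise assertion naming the entity to remove. A sketch, with `conn` and `eid` assumed for illustration:

    (require '[datomic.api :as d])

    ;; Ask the transactor to permanently remove all datoms about `eid`,
    ;; including its history (e.g. to satisfy a legal requirement).
    @(d/transact conn [{:db/id     (d/tempid :db.part/user)
                        :db/excise eid}])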



