The Essence of Datalog (2018)

jgrodziski · on March 10, 2019

There is a very nice Datalog tutorial at: http://www.learndatalogtoday.org/ Lean a little bit toward Datomic, but very nice examples nevertheless.

tosh · on March 10, 2019

Loved going through this a few years ago, eye opening

muydeemer · on March 10, 2019

As I see people mentioning Datalog-related software, I'm gonna throw a pebble into the yard as well - Graql (https://github.com/graknlabs/grakn) is a declarative graph-based query language with rule-based reasoning heavily inspired by Datalog and Logic Programming.

Disclaimer: I am one of the maintainers.

muydeemer · on March 10, 2019

Just to add and clarify: it supports recursion (since a couple releases back) and pattern negation based on set difference (on current master branch, soon to be released).

yazr · on March 10, 2019

> recently become popular among the scalable program analysis crowd.

Can someone give some examples? I have read papers from the Athens University using datalog and pointer analysis. Has this become a major direction ?

rntz · on March 10, 2019

Yes, the DOOP folks under Iannis Smaragdakis have been doing this for a while. They use souffle, I believe: https://souffle-lang.github.io/.

This is Semmle's business model (https://semmle.com/). They have a custom Datalog with OO features they call QL.

I've heard LogicBlox used to do program analysis as well, but AIUI they turned more towards business analytics and then got bought out, and the Datalog engine team all jumped ship.

The Rust compiler has been exploring implementing some of their type system (the borrow checker, I think?) in a Datalog.

On the academic front, Tamás Szabó and Sebastian Erdweg have been working on an incremental Datalog engine for static analysis; https://github.com/szabta89/IncA. I know Tamas has worked for Itemis, but I don't know whether they're using this tech.

I'm working on a functional language, Datafun, inspired by Datalog, but it's too much of a toy to do any real analysis with at the moment. All in all, Datalog is still a niche language and static analysis a niche application of it, but it seems to be on the rise.

frankmcsherry · on March 11, 2019

Moar links!

(caveat: I am involved in many things)

1. You can implement some minor flavor of Doop in differential dataflow (https://github.com/TimelyDataflow/differential-dataflow/tree...). The fragment came from Yannis which he described as a minimal non-trivial analysis (vs other, even smaller fragments), but it still uses some 900 dataflow operators (updates in 10s of ms when inputs change, though). The DD impl isn't meant to be understood, I'm afraid.

2. Many ex-LB folks are now at relational.ai doing .. advanced weird tech like they did with LB. I'd watch them (I do).

3. Polonius (https://github.com/rust-lang/polonius) defines the Rust borrow checker as Datalog rules, and went from differential dataflow to datafrog (https://github.com/rust-lang/datafrog).

4. If you want to check out other incremental Datalog environments, in addition to IncA, there are

https://github.com/comnik/declarative-dataflow

https://github.com/ryzhyk/differential-datalog

https://github.com/TimelyDataflow/differential-dataflow

Good times for Datalog, imo.

trurl · on March 10, 2019

The LogicBlox platform is still in use and in active development at Infor. But yes, many of the principals have indeed left.

DOOP actually used LogicBlox before Souffle existed, but it is difficult to be competitive when Souffle doesn't need things like transactions, ACID, datasets larger than memory, incremental evaluation, etc.

cfallin · on March 10, 2019

Doop is a fantastic base system to extend for anyone wanting to do program analysis generally! I built my thesis work on top of it (recently defended but unfortunately not published yet) and Datalog is incredibly productive to prototype ideas.

(Performance is another matter -- one has to pay careful attention to clause ordering, and watch out for O(n^2) / O(n^3) / ... loop nests, but at least it's easier to try options than refactoring an ad-hoc analysis engine.)

kendallgclark · on March 10, 2019

Datalog is awesome as is Logic Programming generally. It's not only useful in static program analysis but also in areas as diverse as synthetic biology and data integration.

There's lots of datalog and similar technology in most Knowledge Graph platforms, too. For example, http://stardog.com/.

trurl · on March 10, 2019

I don't see anywhere that mentions datalog in relationship to Stardog. I see OWL2 and SparQL, which as far as I can tell do not support recursion other than in experimental prototypes. If there is no recursion, it is just conjunctive queries and not datalog.

mmarx · on March 10, 2019

> If there is no recursion, it is just conjunctive queries and not datalog.

SPARQL has negation (FILTER NOT EXISTS), though, making query answering PSpace-hard, whereas conjunctive query answering is NP-complete, so the two are very much not the same, although still less expressive then Datalog (which is ExpTime-complete). CQ answering for the Horn fragment of OWL2 is already 2ExpTime-complete, however, making it vastly more expressive than pure Datalog.

kendallgclark · on March 10, 2019

It supports SWRL, a LP rule language that has a well-studied relationship with Datalog. FWIW.

tosh · on March 10, 2019

Datalog is also used as query language for Datomic

https://www.datomic.com/

https://youtube.com/watch?v=Cym4TZwTCNU

tannhaeuser · on March 10, 2019

No it's not. The query language of Datomic, and that taught on learndatalogtoday.com, is a Datomic-proprietary language similar to Prolog-in-LISP implementations or frame-based expert systems of old.

souenzzo · on March 10, 2019

There is datascript, which has a foss implementation of datascript (with nice performance)

https://github.com/tonsky/datascript And this (deprecated) project, that translate datalog to SQL(lite) https://github.com/mozilla/mentat

muznar · on March 11, 2019

There is also Flix language inspired by Datalog that adds the features of monotone functions and lattice models. This post is such a coincidence for me since I have just started to work on Flix and souffle for my thesis.

BoiledCabbage · on March 10, 2019

So for someone who is familiar with it, why hasn't Datalog supplanted SQL for relational querying? What's its limitation?

mathetic · on March 10, 2019

I think it's more cultural than anything else. SQL, despite being declarative itself, has a more imperative feel to it for some people (recursion being at the fringes of its standard) and people seem to believe more declarative means slower and more difficult to comprehend. The former is true only due to the implementation effort that went into SQL engines and the latter is plain false.

It probably doesn't help that the standard presentation of Datalog looks very much like Prolog, which most people think is an esoteric and failed programming language.

As for its limitations, it's disciplined in a number of ways, but those are features rather than shortcomings. So you can see in the post that it employs syntactic restrictions to ensure domain independence, it doesn't allow function symbols to have termination. It requires predicates that are not purely logical (extralogical) to satisfy certain conditions for safe execution. These make sure that your queries are well-behaved even before you start running the query. There is research on relaxing these restrictions (cough my work cough) without compromising on the nice properties that come along with it.

The nice thing is of course that if you want to do the same for SQL you start with pandora's box open and try to contain the chaos inside again, while with Datalog you're carefully trying to make the best out of it without unleashing madness. The latter is a much nicer foundation.

bowyakka · on March 10, 2019

On the continum of relational algebra to logic programming catalog is somewhere in the middle.

It is better at handling recursive queries, and logical inference than SQL. On the flip side SQL can be better at more set style operations (for instance windowing functions are painful in datalog).

It is different tooling, essentially it comes down to do you want to extract answers from your data with varying relational algebra but holding the logic the same (SQL), or keeping the relational algebra the same but varying the logic (datalog).

aargh_aargh · on March 10, 2019

Could you please explain what you mean by "varying relational algebra/logic"? Didn't understand that at all. I only know SQL.

alecco · on March 10, 2019

Lack of standards and lack of a killer engine implementing it.

Also people don't like to switch and SQL does a good enough job for most cases and recently it added a lot of fancy features like windowing functions.

tannhaeuser · on March 10, 2019

Prolog, of which Datalog is a syntactical subset, is an ISO standard since 1995.

mathetic · on March 10, 2019

That doesn't help at all because although Datalog is a syntactic fragment, its dynamic semantics are very different, so the operations defined in Prolog standard are suggestive at best and completely nonsensical at worst.

For example, the program `p(X) :- p(X)` with query `?- p(X)` is an infinite loop in Prolog but in Datalog it terminates trivially and is equal to the empty set of facts for `p` in the absence any other facts.

zengid · on March 10, 2019

How does Datalog differ from something like MiniKanren?

tobmlt · on March 10, 2019

My take on miniKanren is that it's implementation is designed to be small and hackable. Especially small with microKanren. If you don't know scheme/lisp, but want to roll your own for educational purposes, then check this out: https://codon.com/hello-declarative-world

Vs Prolog, the emphasis in miniKanren is on constraint programming --- especially writing new constraints to extend it to more problems. Where (chief variants of) Prolog have been optimized in various ways for certain types of problems.

See Will Byrd's answer here: https://stackoverflow.com/questions/28467011/what-are-the-ma...

What I read about Datalog is that it is basically a recursive relational database. It's set up for "deductive queries". This wording reminds me of miniKanren --- but the Kanrens are Turing complete.

Ah, Wikipedia has this to say:

In contrast to Prolog, Datalog,

1.) disallows complex terms as arguments of predicates, e.g., p (1, 2) is admissible but not p (f (1), 2),

2.) imposes certain stratification restrictions on the use of negation and recursion,

3.) requires that every variable that appears in the head of a clause also appears in a nonarithmetic positive (i.e. not negated) literal in the body of the clause,

4.) requires that every variable appearing in a negative literal in the body of a clause also appears in some positive literal in the body of the clause[4]

I'd have to put some thought into checking this list against the capabilities of miniKanren. Generally it doesn't ring any bells.

rntz · on March 10, 2019

miniKanren is Turing-complete; Datalog deliberately isn't.

Despite that, there are things Datalog can do easily that miniKanren can't. miniKanren can't handle negation natively, but Datalog can (as long as it's stratified, that is, a predicate can't use its own negation).

Datalog is also forward-chaining, so some things that would infinite loop in miniKanren work fine in Datalog - for example, computing the transitive closure of a cyclic graph. If someone added tabling to miniKanren that would solve this.

There's also the question of performance. Some miniKanren implementations are surprisingly fast for what it does, but I suspect they can't compete with, say, Souffle, for the kind of static analysis workloads it's aimed at. Would love to be proven wrong, though.

cgrand-net · on March 10, 2019

A Datalog program is guaranteed to terminate (if finite databases). A prolog or miniKanren one doesn’t. Datalog is not Turing-complete.

mathetic · on March 10, 2019

OP here. Happy to answer any questions.