Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
The Essence of Datalog (2018) (dodisturb.me)
154 points by bibyte on March 10, 2019 | hide | past | favorite | 29 comments


There is a very nice Datalog tutorial at: http://www.learndatalogtoday.org/ Lean a little bit toward Datomic, but very nice examples nevertheless.


Loved going through this a few years ago, eye opening


As I see people mentioning Datalog-related software, I'm gonna throw a pebble into the yard as well - Graql (https://github.com/graknlabs/grakn) is a declarative graph-based query language with rule-based reasoning heavily inspired by Datalog and Logic Programming.

Disclaimer: I am one of the maintainers.


Just to add and clarify: it supports recursion (since a couple releases back) and pattern negation based on set difference (on current master branch, soon to be released).


> recently become popular among the scalable program analysis crowd.

Can someone give some examples? I have read papers from the Athens University using datalog and pointer analysis. Has this become a major direction ?


Yes, the DOOP folks under Iannis Smaragdakis have been doing this for a while. They use souffle, I believe: https://souffle-lang.github.io/.

This is Semmle's business model (https://semmle.com/). They have a custom Datalog with OO features they call QL.

I've heard LogicBlox used to do program analysis as well, but AIUI they turned more towards business analytics and then got bought out, and the Datalog engine team all jumped ship.

The Rust compiler has been exploring implementing some of their type system (the borrow checker, I think?) in a Datalog.

On the academic front, Tamás Szabó and Sebastian Erdweg have been working on an incremental Datalog engine for static analysis; https://github.com/szabta89/IncA. I know Tamas has worked for Itemis, but I don't know whether they're using this tech.

I'm working on a functional language, Datafun, inspired by Datalog, but it's too much of a toy to do any real analysis with at the moment. All in all, Datalog is still a niche language and static analysis a niche application of it, but it seems to be on the rise.


Moar links!

(caveat: I am involved in many things)

1. You can implement some minor flavor of Doop in differential dataflow (https://github.com/TimelyDataflow/differential-dataflow/tree...). The fragment came from Yannis which he described as a minimal non-trivial analysis (vs other, even smaller fragments), but it still uses some 900 dataflow operators (updates in 10s of ms when inputs change, though). The DD impl isn't meant to be understood, I'm afraid.

2. Many ex-LB folks are now at relational.ai doing .. advanced weird tech like they did with LB. I'd watch them (I do).

3. Polonius (https://github.com/rust-lang/polonius) defines the Rust borrow checker as Datalog rules, and went from differential dataflow to datafrog (https://github.com/rust-lang/datafrog).

4. If you want to check out other incremental Datalog environments, in addition to IncA, there are

https://github.com/comnik/declarative-dataflow

https://github.com/ryzhyk/differential-datalog

https://github.com/TimelyDataflow/differential-dataflow

Good times for Datalog, imo.


The LogicBlox platform is still in use and in active development at Infor. But yes, many of the principals have indeed left.

DOOP actually used LogicBlox before Souffle existed, but it is difficult to be competitive when Souffle doesn't need things like transactions, ACID, datasets larger than memory, incremental evaluation, etc.


Doop is a fantastic base system to extend for anyone wanting to do program analysis generally! I built my thesis work on top of it (recently defended but unfortunately not published yet) and Datalog is incredibly productive to prototype ideas.

(Performance is another matter -- one has to pay careful attention to clause ordering, and watch out for O(n^2) / O(n^3) / ... loop nests, but at least it's easier to try options than refactoring an ad-hoc analysis engine.)


Datalog is awesome as is Logic Programming generally. It's not only useful in static program analysis but also in areas as diverse as synthetic biology and data integration.

There's lots of datalog and similar technology in most Knowledge Graph platforms, too. For example, http://stardog.com/.


I don't see anywhere that mentions datalog in relationship to Stardog. I see OWL2 and SparQL, which as far as I can tell do not support recursion other than in experimental prototypes. If there is no recursion, it is just conjunctive queries and not datalog.


> If there is no recursion, it is just conjunctive queries and not datalog.

SPARQL has negation (FILTER NOT EXISTS), though, making query answering PSpace-hard, whereas conjunctive query answering is NP-complete, so the two are very much not the same, although still less expressive then Datalog (which is ExpTime-complete). CQ answering for the Horn fragment of OWL2 is already 2ExpTime-complete, however, making it vastly more expressive than pure Datalog.


It supports SWRL, a LP rule language that has a well-studied relationship with Datalog. FWIW.


Datalog is also used as query language for Datomic

https://www.datomic.com/

https://youtube.com/watch?v=Cym4TZwTCNU


No it's not. The query language of Datomic, and that taught on learndatalogtoday.com, is a Datomic-proprietary language similar to Prolog-in-LISP implementations or frame-based expert systems of old.


There is datascript, which has a foss implementation of datascript (with nice performance)

https://github.com/tonsky/datascript And this (deprecated) project, that translate datalog to SQL(lite) https://github.com/mozilla/mentat


There is also Flix language inspired by Datalog that adds the features of monotone functions and lattice models. This post is such a coincidence for me since I have just started to work on Flix and souffle for my thesis.


So for someone who is familiar with it, why hasn't Datalog supplanted SQL for relational querying? What's its limitation?


I think it's more cultural than anything else. SQL, despite being declarative itself, has a more imperative feel to it for some people (recursion being at the fringes of its standard) and people seem to believe more declarative means slower and more difficult to comprehend. The former is true only due to the implementation effort that went into SQL engines and the latter is plain false.

It probably doesn't help that the standard presentation of Datalog looks very much like Prolog, which most people think is an esoteric and failed programming language.

As for its limitations, it's disciplined in a number of ways, but those are features rather than shortcomings. So you can see in the post that it employs syntactic restrictions to ensure domain independence, it doesn't allow function symbols to have termination. It requires predicates that are not purely logical (extralogical) to satisfy certain conditions for safe execution. These make sure that your queries are well-behaved even before you start running the query. There is research on relaxing these restrictions (cough my work cough) without compromising on the nice properties that come along with it.

The nice thing is of course that if you want to do the same for SQL you start with pandora's box open and try to contain the chaos inside again, while with Datalog you're carefully trying to make the best out of it without unleashing madness. The latter is a much nicer foundation.


On the continum of relational algebra to logic programming catalog is somewhere in the middle.

It is better at handling recursive queries, and logical inference than SQL. On the flip side SQL can be better at more set style operations (for instance windowing functions are painful in datalog).

It is different tooling, essentially it comes down to do you want to extract answers from your data with varying relational algebra but holding the logic the same (SQL), or keeping the relational algebra the same but varying the logic (datalog).


Could you please explain what you mean by "varying relational algebra/logic"? Didn't understand that at all. I only know SQL.


Lack of standards and lack of a killer engine implementing it.

Also people don't like to switch and SQL does a good enough job for most cases and recently it added a lot of fancy features like windowing functions.


Prolog, of which Datalog is a syntactical subset, is an ISO standard since 1995.


That doesn't help at all because although Datalog is a syntactic fragment, its dynamic semantics are very different, so the operations defined in Prolog standard are suggestive at best and completely nonsensical at worst.

For example, the program `p(X) :- p(X)` with query `?- p(X)` is an infinite loop in Prolog but in Datalog it terminates trivially and is equal to the empty set of facts for `p` in the absence any other facts.


How does Datalog differ from something like MiniKanren?


My take on miniKanren is that it's implementation is designed to be small and hackable. Especially small with microKanren. If you don't know scheme/lisp, but want to roll your own for educational purposes, then check this out: https://codon.com/hello-declarative-world

Vs Prolog, the emphasis in miniKanren is on constraint programming --- especially writing new constraints to extend it to more problems. Where (chief variants of) Prolog have been optimized in various ways for certain types of problems.

See Will Byrd's answer here: https://stackoverflow.com/questions/28467011/what-are-the-ma...

What I read about Datalog is that it is basically a recursive relational database. It's set up for "deductive queries". This wording reminds me of miniKanren --- but the Kanrens are Turing complete.

Ah, Wikipedia has this to say:

In contrast to Prolog, Datalog,

1.) disallows complex terms as arguments of predicates, e.g., p (1, 2) is admissible but not p (f (1), 2),

2.) imposes certain stratification restrictions on the use of negation and recursion,

3.) requires that every variable that appears in the head of a clause also appears in a nonarithmetic positive (i.e. not negated) literal in the body of the clause,

4.) requires that every variable appearing in a negative literal in the body of a clause also appears in some positive literal in the body of the clause[4]

I'd have to put some thought into checking this list against the capabilities of miniKanren. Generally it doesn't ring any bells.


miniKanren is Turing-complete; Datalog deliberately isn't.

Despite that, there are things Datalog can do easily that miniKanren can't. miniKanren can't handle negation natively, but Datalog can (as long as it's stratified, that is, a predicate can't use its own negation).

Datalog is also forward-chaining, so some things that would infinite loop in miniKanren work fine in Datalog - for example, computing the transitive closure of a cyclic graph. If someone added tabling to miniKanren that would solve this.

There's also the question of performance. Some miniKanren implementations are surprisingly fast for what it does, but I suspect they can't compete with, say, Souffle, for the kind of static analysis workloads it's aimed at. Would love to be proven wrong, though.


A Datalog program is guaranteed to terminate (if finite databases). A prolog or miniKanren one doesn’t. Datalog is not Turing-complete.


OP here. Happy to answer any questions.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: