I wanted to pick it up; I feel it's an under-appreciated technology with lots of potential. Reasons why I didn't:
- It's somewhat hard to sell to management. There was no company behind it to provide support, and it's not a "successful Apache project" with a large-ish community, either. And for a long while it was more a passion project than something Frank McSherry would actively encourage you to use in production.
- As others have said, the "hello world" is somewhat tricky. Not a lot of people know Rust, so saying "let's do this project in Rust" will likely not go well. If I could use it from .NET or the JVM as a library, it might be an easier sell (I'm personally more invested in .NET now, but earlier in my career it would've been the JVM).
- Last but not least: the "productization" story is a bit tricky, and comparing it to Spark does it no favors. For Spark, not only do I have managed clusters like EMR, but I also have a decent amount of tooling to see how the cluster executes (the Spark web UI). I can also deploy it on Mesos or YARN, not just standalone (and Mesos/YARN have their own tooling). For differential dataflow there was none of that (at least last time I checked). Maybe it'd be fairer to compare it to Kafka Streams?
* Might I add: spark-ec2 was a huge help for me in picking up Spark, even before the 1.0 release. Yes, you can do tons of work on a single machine... but for this kind of system, the very first question is "how do you distribute that?". You have the story that "it's possible", but you don't have easy examples of "word count, done on 3 machines, not because it's necessary but because it demonstrates how easy it is to distribute the computation across machines".
* Compared to Kafka Streams: the thing about Kafka Streams is that you know what to use it for (Kafka!), and you immediately grok how to use it in production (all state management is delegated to Kafka; it's truly just a library that helps you work better with Kafka). With differential dataflow it's much less clear. You could use it with Kafka, but also with Twitter directly, or with something else. And what happens if it crashes? How do you recover from that? What are the data-loss risks? Does it give you any guarantees, or do you have to manage that yourself?