I have never used either Kafka or differential dataflow, but I would like to offer a personal anecdote to illustrate the importance of the greater system:
I once needed to set up a web app written in Python. I did this by running the code in a WSGI instance and serving it via nginx. Setting up all the activation files, locked-down permissions, and secure sockets was pretty finicky to get right, and took a non-trivial amount of time.
It would have been much easier to use Python's built-in web server and expose it to the internet directly. It has fewer moving parts and is generally more predictable.
I still went with the more complex solution, because I needed logging, security, and large file offload. And I used the built-in web server for development.
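To make the development half of that concrete, here is a minimal sketch, assuming the "built-in web server" is the standard library's `wsgiref`. The names `app` and `serve_dev`, the host, and the port are hypothetical and not taken from the actual project.

```python
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """Hypothetical minimal WSGI application (a stand-in for the real web app)."""
    body = b"hello from the dev server\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve_dev(host="127.0.0.1", port=8000):
    """Serve `app` with the stdlib reference server; for development only."""
    # Binding to localhost keeps the dev server off the public internet.
    with make_server(host, port, app) as httpd:
        httpd.serve_forever()


# Usage during development (blocks until interrupted):
# serve_dev()
```

The point of the sketch is that the same `app` callable can be handed unchanged to a production WSGI server sitting behind nginx; only the surrounding system changes, not the application code.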
The requirements of a "production" system are pretty different from those of a "development" one. Sometimes people are willing to install a bigger (and therefore more fragile) system when they need more features.
That's not the issue here. Kafka is far more fragile than it should be, partly because companies have always approached it as cluster-first and partly because it's enterprise-first software where high setup costs just aren't important. A lot of JVM software ends up like this - there's a big chunk of fiddly O(1) work in getting it all going, just because no-one ever bothers to make it all easy to get started with.
I say this as a huge fan of Kafka, but things like MySQL have better defaults and are easier to get running out of the box, and there's no reason Kafka's starting experience couldn't be the same if someone cared enough to put the time and effort in. And ultimately it's a shame, because it leads people to ignore something that's a much better model and platform in the long term.
> there's no reason Kafka's starting experience couldn't be the same if someone cared enough to put the time and effort in
There's always a perverse incentive when the software provider's model is to monetize through support, consulting, and offering the software as a managed service. If it's too easy to run, then why would one pay for any of these services?
AIUI it's pretty expensive compared to, say, RDS or their managed Redis service? Which makes perfect sense relative to how much of a pain running your own Kafka cluster is.
100% worth it IMO, but it's a lot of upfront cost and you only start to see the benefits when a given flow is Kafka end-to-end and you learn how to use it, so I absolutely get why people are skeptical.