I’m interested in Pulsar but also wary of it. On paper it sounds like an amazing system: the benefits of Kafka without the downsides. But if it is that good, why isn’t it used more?
I really like Pulsar, and it definitely overlaps with Kafka for some use cases, but it's important to understand that they are two different kinds of systems.
Pulsar is a classic Pub/Sub system. It offers persistence but the basic concepts are what you'd find in most any Pub/Sub system.
Kafka is a distributed log (data structure, not tracing) system. Its basic concepts are what you would expect around something that is persisting events. In many ways it's better to think about Kafka as an event datastore and not a messaging system.
In particular, Kafka is much better in use cases where you want to provide all events to many different consumers on their own time frames, over and over again. Use cases like building distributed caches and replication formats will likely fit Kafka better than Pulsar.
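To make the "own time frames, over and over again" point concrete, here is a toy in-memory sketch of a log where each consumer keeps its own read position. The `Log` class and the consumer names are made up for illustration; this is not Kafka's API, just the shape of the idea.

```python
class Log:
    """Toy append-only log; each consumer tracks its own read position."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer name -> index of the next record to read

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the record just written

    def poll(self, consumer, max_records=10):
        # Reading never removes records; it only advances this consumer's offset.
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch

    def seek(self, consumer, offset):
        # Rewinding lets a consumer replay history, e.g. to rebuild a cache.
        self._offsets[consumer] = offset


log = Log()
for event in ["a", "b", "c"]:
    log.append(event)

fast = log.poll("cache-builder")              # reads everything available
slow = log.poll("replicator", max_records=1)  # reads at its own pace
log.seek("cache-builder", 0)
replay = log.poll("cache-builder")            # same events, read again
```

Because reads only move a per-consumer position, a slow consumer and a replaying consumer don't interfere with each other.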
Also, Kafka has much higher industry adoption. You are much more likely to have third-party support for Kafka than for Pulsar. Each also has a separate Apache project dependency. Zookeeper has... warts, but they are well known, and operating Zookeeper clusters is a much more common skillset than operating BookKeeper.
Again, I really like Pulsar but claiming it is better in basically every way is just false.
I don't see the differences that you list. Both systems produce and subscribe to named topics, with multiple consumers which can read and re-read at their own pace, with data sharded by key to scale, and configurable retention.
Pub/sub is the interface to the underlying log storage in both, and you can layer an event sourcing, service bus, CEP, MQ, or any other messaging buzzword on top.
However, Pulsar goes further than Kafka. It supports millions of topics, multi-tenant namespacing, more consumer options (exclusive, shared/group), per-message acknowledgements instead of a single offset, non-persistent topics for broadcast or ephemeral messaging, geo-replication, tiering to cloud storage (useful for that event store), and a functions platform for lightweight processing. Pulsar also uses Zookeeper for coordination and has official Kubernetes deployments for easy operations.
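For anyone unsure why per-message acknowledgements matter compared to a single offset, here is a rough sketch of the difference. Both classes are hypothetical toys, not either system's client API; message ids are just list indexes.

```python
class CumulativeCursor:
    """Single committed position: everything below it counts as processed."""

    def __init__(self):
        self.committed = 0  # index of the next message to deliver

    def commit(self, offset):
        self.committed = max(self.committed, offset + 1)

    def redeliver(self, log_length):
        return list(range(self.committed, log_length))


class IndividualAcks:
    """Each message is acknowledged on its own."""

    def __init__(self):
        self.acked = set()

    def ack(self, message_id):
        self.acked.add(message_id)

    def redeliver(self, log_length):
        return [i for i in range(log_length) if i not in self.acked]


# Five messages (0..4); processing of message 2 failed, the rest succeeded.
cursor = CumulativeCursor()
cursor.commit(1)  # cannot move past the failed message without skipping it

acks = IndividualAcks()
for message_id in (0, 1, 3, 4):
    acks.ack(message_id)

after_restart_cursor = cursor.redeliver(5)  # 2, plus 3 and 4 again
after_restart_acks = acks.redeliver(5)      # only 2
```

With a single offset, one stuck message forces everything after it to be reprocessed after a restart; with individual acks only the failed message comes back.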
You're right that Kafka has much wider industry usage, and perhaps that is a significant advantage for most, but the constant announcement of yet another suite of helper tools by every major company only seems to show that it's not quite polished. Other than community and integration market share, I cannot think of any functional area where Kafka is better than Pulsar. Can you name any?
Actually, I hadn't used Pulsar since they added the reader interface or Kafka wrappers. It likely now does support the event store use cases it didn't previously. Very cool.
The additional requirement of BookKeeper, then, is the only major technical negative I see, leaving the market share argument as the major negative. I think treating Kafka's many helper tools as a negative is pretty unfair though. Pulsar has so much lower adoption that there just isn't the rationale for creating the helper tools. That shouldn't be used as a metric to decide which is more polished.
Most of those tools seem to do the same thing, and they would be unneeded with Pulsar, so that's what I'm going by. But yes, Kafka has a much bigger community, which is probably what most users want.
Pulsar Functions appears to be a stream processing engine similar to Flink or Kafka Streams. In other words, it is more like a runtime for streaming apps.
The relation Pulsar ~ Pulsar Functions appears similar to Kafka ~ Kafka Streams. One is a message broker, the other a streaming engine.
Exactly-once (or effectively-once) in Pulsar Functions depends on assigning unique sequence ids on the producer side. When consuming from a Pulsar topic those can be taken from the consumed message, but AFAIK Pulsar doesn't assign those unique ids on its own, so it needs to be done externally.
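A minimal sketch of how sequence-id-based deduplication can work, assuming the caller supplies a stable producer name and a monotonically increasing sequence id. `DedupTopic` is a made-up class for illustration, not Pulsar's implementation or API.

```python
class DedupTopic:
    """Toy topic that drops publishes whose sequence id was already seen."""

    def __init__(self):
        self.messages = []
        self._last_seq = {}  # producer name -> highest sequence id accepted

    def publish(self, producer, sequence_id, payload):
        if sequence_id <= self._last_seq.get(producer, -1):
            return False  # duplicate (for example a retry after a timeout)
        self._last_seq[producer] = sequence_id
        self.messages.append(payload)
        return True


topic = DedupTopic()
first = topic.publish("p1", 0, "x")
second = topic.publish("p1", 1, "y")
retry = topic.publish("p1", 1, "y")  # same sequence id sent again
```

The retry is rejected, so the topic ends up holding each payload once even though the producer sent one of them twice. The catch is the one mentioned above: something has to assign those ids consistently across retries.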
That's true. KSQL is another thing that Kafka wins at, which makes a pretty smooth experience for straightforward stream processing and general management over the command line tools.
Pulsar is working on Presto integration for querying which should be interesting.
Pulsar is a classic Pub/Sub system. ...
Kafka is a distributed log
This is true but I’ll wager that 99% of real world deployments of Kafka are in the simple pubsub use case. Most people just aren’t operating at LinkedIn scale or complexity.