
Interestingly, Twitter's in-house system did that too, but they now seem to think it was a mistake because it increased latency.


It all depends. For example, with Apache Pulsar, tailing readers are served from an in-memory cache in the serving layer (the Pulsar brokers), and only catch-up readers end up being served from the storage layer (Apache BookKeeper). This is a little different from DistributedLog, which always required going to BookKeeper for reads.
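Concretely, which path a reader hits comes down to where it starts in the topic. A minimal sketch with the Pulsar Java client (the service URL and topic name are placeholders): a reader starting at MessageId.latest tails the broker's cache, while one starting at MessageId.earliest forces reads from BookKeeper.

    import org.apache.pulsar.client.api.*;

    public class TailVsCatchUp {
        public static void main(String[] args) throws Exception {
            PulsarClient client = PulsarClient.builder()
                    .serviceUrl("pulsar://localhost:6650") // placeholder URL
                    .build();

            // Tailing reader: starts at the latest message, so the broker
            // can serve it from its in-memory cache.
            Reader<byte[]> tailing = client.newReader()
                    .topic("persistent://public/default/my-topic")
                    .startMessageId(MessageId.latest)
                    .create();

            // Catch-up reader: starts at the earliest message, so the
            // broker must fetch older entries from BookKeeper.
            Reader<byte[]> catchUp = client.newReader()
                    .topic("persistent://public/default/my-topic")
                    .startMessageId(MessageId.earliest)
                    .create();

            System.out.println(new String(catchUp.readNext().getData()));

            tailing.close();
            catchUp.close();
            client.close();
        }
    }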

Apache BookKeeper can also add latency for catch-up readers, on top of the extra hop, because entries from many ledgers (and in Pulsar each topic's data lives in its own ledgers) are interleaved in each bookie's entry log files, so some of the benefit of sequential reads is lost. BookKeeper mitigates this by writing to disk in large batches and sorting entries by ledger before flushing, so messages of the same topic end up close together, but catch-up reads still involve more seeking than reading one contiguous per-topic log.
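A toy sketch of that mitigation (an illustration of the idea, not BookKeeper's actual code): buffer entries from many ledgers, sort by ledger id, then flush in one sequential pass so same-ledger entries land contiguously on disk.

    import java.util.*;

    class SortedFlushSketch {
        // In Pulsar, one ledger holds data for one topic, so grouping
        // entries by ledger id effectively groups them by topic.
        record Entry(long ledgerId, long entryId, byte[] payload) {}

        static List<Entry> sortForFlush(List<Entry> writeBuffer) {
            List<Entry> sorted = new ArrayList<>(writeBuffer);
            sorted.sort(Comparator
                    .comparingLong(Entry::ledgerId)
                    .thenComparingLong(Entry::entryId));
            return sorted; // written to the entry log in one sequential pass
        }
    }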

Also, BookKeeper gives you a nice separation of disk IO: writes land on a dedicated journal while reads are served from the ledger storage, and the two paths can be put on different disks, so you can scale reads and writes independently to a certain extent.
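For example, a bookie can be pointed at separate disks for the two paths. A minimal sketch using BookKeeper's ServerConfiguration (the mount points are placeholders; the same settings exist in bk_server.conf as journalDirectory and ledgerDirectories):

    import org.apache.bookkeeper.conf.ServerConfiguration;

    public class BookieDiskLayout {
        public static void main(String[] args) {
            ServerConfiguration conf = new ServerConfiguration();
            // Write path: the journal absorbs incoming writes sequentially.
            conf.setJournalDirName("/mnt/fast-ssd/journal");
            // Read path: ledger storage on separate disks serves reads,
            // including catch-up readers.
            conf.setLedgerDirNames(new String[] {
                    "/mnt/disk1/ledgers",
                    "/mnt/disk2/ledgers"
            });
        }
    }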

For all those reasons, I would have loved to see Twitter evaluate Apache Pulsar and compare its performance profile with Apache Kafka's.


Streamlio published OpenMessaging benchmark results comparing Apache Kafka and Apache Pulsar here: https://streaml.io/pdf/Gigaom-Benchmarking-Streaming-Platfor...



