Does this trick preclude sorting your data within a partition? You could no longer rely on row IDs being sequential, so you couldn't just refer to a prefix of them within a newly created file.
This is a well-known class of optimization; the term in the literature is "late materialization". It covers a large set of strategies, including this one, and is about as old as column stores themselves.
This strategy will not work well for Apache Kafka because it is extremely IOPS hungry once you have more than a few partitions, and a replay of a large topic will require lots of IO bandwidth. It would work well for, e.g., a columnar database, where a query targeting old data may only require reading a small fraction of the data volume, but Kafka is effectively a row-oriented storage system, so the IO pattern is different.
We're not talking about no disks as in no storage, just nothing other than object storage. This does have a latency trade-off, but with the advent of S3 Express One Zone and Azure's equivalent high-performance tier (with GCP surely not far behind), a system designed purely around object storage can now trade cost for latency where it makes sense. WarpStream already has support for writing to a quorum of S3 Express One Zone buckets to provide regional availability, so there's not an availability trade-off here either.
There are no silver bullets. Traditional S3, with the durability guarantees it provides, has a latency trade-off because the data needs to be copied to additional availability zones before the write is acknowledged. Once you collapse everything to a single availability zone (i.e. S3 Express One Zone), you have little reason not to just run Kafka, which scales fine within a single AZ. At $0.16/GB, S3EOZ is about 7x more expensive than standard S3 ($0.023/GB) for fewer copies of the data and weaker durability guarantees, or about 60% more expensive than MSK or Kinesis Data Streams ($0.10/GB). If you write to a quorum of S3EOZ buckets, you triple your S3EOZ storage costs to 0.16 * 3 = $0.48/GB. And this doesn't include the cost of compute!
Where's the value over just running Kafka within a single AZ, with no latency trade-off?
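A quick sanity check of those ratios, using only the per-GB prices quoted above (a Python sketch; the figures are the ones from this comment, not authoritative AWS pricing):

    s3_standard = 0.023  # $/GB-month, S3 Standard
    s3_express = 0.16    # $/GB-month, S3 Express One Zone
    msk_kinesis = 0.10   # $/GB-month figure quoted for MSK / Kinesis Data Streams

    print(f"S3EOZ vs S3 Standard: {s3_express / s3_standard:.1f}x")         # ~7.0x
    print(f"S3EOZ vs MSK/Kinesis: {s3_express / msk_kinesis - 1:.0%} more")  # 60% more
    print(f"Quorum of 3 S3EOZ buckets: ${3 * s3_express:.2f}/GB")            # $0.48/GB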
You don't have to keep the data in S3 Express One Zone forever; you can land it there and then immediately compact it out to S3 Standard. You still pay the higher fee to write to S3EOZ, but not the higher storage fee.
WarpStream does this; data usually gets compacted out within seconds. Of course this is now... tiered storage. But it's implemented over two "infinitely scalable" remote storage systems, so it gets rid of the operational and scaling problems you have with a typical tiered-storage Kafka setup that uses local volumes as the landing zone.
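For anyone unfamiliar with the pattern, here's a minimal sketch of the land-fast-then-compact-to-cheap idea using boto3. The bucket names are hypothetical and the plain copy/delete is a simplification; WarpStream actually batches segments and writes to a quorum of buckets, so this is just the shape of the idea, not their implementation:

    import boto3

    s3 = boto3.client("s3")
    LANDING = "example-landing--usw2-az1--x-s3"  # hypothetical S3 Express One Zone bucket
    ARCHIVE = "example-archive"                  # hypothetical S3 Standard bucket

    def tier_out(key: str) -> None:
        # Copy the freshly landed object to the cheap storage class, then delete
        # it from the low-latency landing bucket. You still pay the S3EOZ write
        # fee, but only for a few seconds' worth of S3EOZ storage.
        s3.copy_object(Bucket=ARCHIVE, Key=key,
                       CopySource={"Bucket": LANDING, "Key": key})
        s3.delete_object(Bucket=LANDING, Key=key)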
> so it gets rid of all the operational and scaling problems you have with a typical tiered storage Kafka setup
Do these operational and scaling problems also apply to AWS's managed services, MSK and Kinesis Data Streams?
At small scale, why wouldn't someone go with one of those? And at large scale, where's the Total Cost of Ownership comparison to show that it's worth it to ditch Kafka's local disks for a model built on object storage?
RE: comparing to a single-zone Kafka cluster. A lot of people really dislike operating Kafka. Some people don't mind it and that's cool too, but in my experience they're not the majority.
In addition to the high cost of S3 Express, using WarpStream to write three replicas to S3 Express and later compact them to S3 Standard could quadruple network/outbound traffic costs. With two consumer groups involved, this could rise to six times the network/outbound traffic.
Consider a c5.4xlarge instance with 16 vCPUs and 32 GB of memory: with a baseline network bandwidth of only 5 Gbps, that amplification limits it to a maximum producer throughput of roughly 100 MiB/s (see the sketch below).
Therefore, I have reservations about the cost-effectiveness of your low-latency solution, given these potential expenses.
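A back-of-the-envelope check of that 100 MiB/s figure, using the parent comment's own amplification assumptions (3 S3EOZ replicas + 1 compaction write + 2 consumer group reads = 6x outbound traffic per produced byte; whether all of that really hits the same NIC depends on the deployment):

    baseline_gbps = 5                            # c5.4xlarge baseline network bandwidth
    nic_mib_s = baseline_gbps * 1e9 / 8 / 2**20  # ~596 MiB/s of usable bandwidth

    amplification = 3 + 1 + 2                    # replicas + compaction + consumer groups
    print(f"NIC budget: {nic_mib_s:.0f} MiB/s")
    print(f"Max producer throughput: {nic_mib_s / amplification:.0f} MiB/s")  # ~99 MiB/s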
I guess we'll have to wait for a full write-up of this, but it does seem like having multiple classes of object storage is just tiered storage under the hood!
…rebranded with a different name, again.
Again complex, again no obvious way to query the storage directly, again unclear performance characteristics, and again no obvious answer to whether the networking costs make the savings from it largely meaningless.
You have to admit it's a bit of a hard sell, with no comeback offered, right after literally saying that people were just inventing new names for minor variations on tiered storage…
I agree with your viewpoint. The crux of the matter is not whether to use tiered storage, but what trade-offs a specific storage architecture makes and what benefits it gains. Here (https://github.com/AutoMQ/automq?tab=readme-ov-file#-automq-...) is a qualitative comparison chart of streaming systems, including Kafka/Confluent/Redpanda/WarpStream/AutoMQ. The chart has no specific numerical comparisons, just their trade-offs at the storage level, but I think it will be of some use to you.
We're still drafting our next post in this series, but the answer is actually very simple: two tiers of object storage do not have the same drawbacks as a combination of object storage and local disk. We wanted to explain that in this post too, but it would've been unreasonably long.
We've designed WarpStream to work extremely well on the slower, harder-to-use one first, and that is how 95+% of our workloads run in production. The tiered storage solutions from other streaming vendors do the opposite, where they were first designed for local SSDs and then bolted on object storage later.
The equivalent would be if we were pitching our support for an even slower, cheaper tier of object storage like AWS S3 Glacier.
WarpStream co-founder here. Implementing the Idempotent Producer feature for the Apache Kafka protocol was definitely a fun challenge for us. Please let me know if you have any questions!
WarpStream doesn't implement compacted topics today. It is on our roadmap, though. Compacted topics are typically not used in high-throughput workloads, so our plan is to delay compactions for longer than a disk-based system would, accepting more space amplification in exchange for less write amplification.
WarpStream flushes after 4MiB of data or a configurable amount of time. Flushes can also happen concurrently.
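To make that concrete, here's an illustrative size-or-time batching loop (a generic Python sketch; the 4 MiB / 250 ms defaults and all names here are my own illustration, not WarpStream's actual code or configuration):

    import threading, time

    class Batcher:
        def __init__(self, flush_fn, max_bytes=4 * 2**20, max_delay_s=0.25):
            self.flush_fn = flush_fn
            self.max_bytes, self.max_delay_s = max_bytes, max_delay_s
            self.buf, self.lock = bytearray(), threading.Lock()
            threading.Thread(target=self._ticker, daemon=True).start()

        def append(self, record: bytes) -> None:
            with self.lock:
                self.buf += record
                if len(self.buf) >= self.max_bytes:   # size trigger: 4 MiB of buffered data
                    self._flush_locked()

        def _ticker(self) -> None:
            while True:                               # time trigger: flush on an interval
                time.sleep(self.max_delay_s)
                with self.lock:
                    if self.buf:
                        self._flush_locked()

        def _flush_locked(self) -> None:
            data, self.buf = bytes(self.buf), bytearray()
            # Hand each flush to its own thread so uploads can overlap (concurrent flushes).
            threading.Thread(target=self.flush_fn, args=(data,), daemon=True).start()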
In general, we'd prefer to not introduce many knobs. We're running a realistic throughput testing workload in our staging and production environments 24/7, so we've configured most of the knobs already to reasonable defaults.
We're aiming for per-GB usage-based pricing that is significantly cheaper than the alternatives, but the BYOC model combined with our extremely efficient cloud control plane gives us a lot of flexibility here. That's why we can offer a free tier at all.
We're mostly just not sure yet, so your input would be appreciated.