I've been trying to put my finger on why Kafka so well captured the imagination of many distributed systems engineers. My best answer is, "low-cost publish and multi-consumer data-sharded subscribe is the key to resilient horizontal scaling and parallelism."
Kafka has its flaws, but it really served us well. We have Python Data Engineers who focus on distributed system design[1], and Kafka is one of the team's least finicky open source components, but it is used everywhere, and it basically enables the entire rest of the real-time data processing stack.
Kafka has its flaws, but it really served us well. We have Python Data Engineers who focus on distributed system design[1], and Kafka is one of the team's least finicky open source components, but it is used everywhere, and it basically enables the entire rest of the real-time data processing stack.
[1]: https://www.parse.ly/careers/python_data_engineer