Hacker News
Show HN: Bytewax – Python Stateful Stream Processing on Timely Dataflow (github.com/bytewax)
21 points by amath on July 14, 2022 | 13 comments


Been following this project for a while. There's a reason it's making waves in developer circles: you get a simple-to-use Python developer experience on top of a powerful distributed data processing framework that scales to enormous workloads. Plus, it's battle-tested, since it relies on Timely Dataflow under the hood.


This looks cool because you get the best of both worlds: a good developer experience in Python with the performance of Rust. Are there any examples of scaling it on Kubernetes? That'd be an interesting evaluation.

Are you open to contributors btw? I'm interested in learning more about Python <> Rust bindings.


Yes, open to contributors!

I would say check out PyO3 to learn more about how you can marry Python with Rust code. It is an awesome project.


Bytewax is the first streaming data Python client that I can finally get my head around. I've used Bytewax and I love the developer experience; you folks really dialed it in there.

Also, the docs are great, something many developer tools tend to neglect. Good docs are the first thing I notice when looking at a project.

Great work, and congrats on the launch.


Thank you!


This is much more straightforward than your JVM-based, huge-infra solutions like Flink. For when you need something that is familiar (Python) and just works.


I think this is a good architecture for focusing on the developer experience: a Rust runtime with a Python front end. Very cool to see.


Pretty cool! Have you added anything to Timely Dataflow yet or is this just Python bindings?


Zander here, part of the team working on Bytewax. Much of the magic of Timely is passed through to the Python library, but there are a few features that we thought would make it easier to build with and run in production.

- We deepened the integration between input and output sources, starting with Kafka/Redpanda.
- We added state recovery backed by either SQLite or a Kafka-API-compatible platform like Redpanda or Kafka.
- We built a deployment tool called Waxctl that makes it easier to run dataflows across pods on a Kubernetes cluster and to deploy to remote cloud VMs like EC2.
- We provide some helpers for things like windowing and ordering inputs. These are still evolving.

We are always looking for feedback and for which features to focus on next!
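
To make the state recovery and windowing ideas from the list above a bit more concrete, here is a minimal sketch in plain Python. It does not use Bytewax's API, and it is not how Bytewax implements either feature. The names `WindowedCounter`, `window_start`, and `WINDOW_SECONDS`, the table layout, and the 10-second window are all hypothetical, chosen only for illustration. The sketch counts events per key in tumbling windows and snapshots the counts to SQLite so that a restarted process can pick up where it left off.

```python
import sqlite3
from collections import defaultdict

WINDOW_SECONDS = 10  # hypothetical tumbling-window length, in seconds


def window_start(ts: float) -> int:
    """Align a timestamp to the start of the tumbling window it falls in."""
    return int(ts // WINDOW_SECONDS) * WINDOW_SECONDS


class WindowedCounter:
    """Illustrative only: per-key event counts per window, recoverable from SQLite."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        with self.conn:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS state ("
                "key TEXT, win INTEGER, count INTEGER, "
                "PRIMARY KEY (key, win))"
            )
        # Recovery step: reload whatever the last snapshot persisted.
        self.counts = defaultdict(int)
        for key, win, count in self.conn.execute(
            "SELECT key, win, count FROM state"
        ):
            self.counts[(key, win)] = count

    def process(self, key: str, ts: float) -> int:
        """Add one event and return the running count for its (key, window)."""
        slot = (key, window_start(ts))
        self.counts[slot] += 1
        return self.counts[slot]

    def snapshot(self) -> None:
        """Persist the in-memory counts in a single transaction."""
        rows = [(key, win, count) for (key, win), count in self.counts.items()]
        with self.conn:
            self.conn.executemany(
                "INSERT OR REPLACE INTO state VALUES (?, ?, ?)", rows
            )
```

Creating a second `WindowedCounter` on the same database after a `snapshot()` call resumes from the persisted counts. Events processed after the last snapshot are lost in this sketch; a real system would also need to replay its input from a matching position.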


This is amazing, would love to chat and connect to learn more on this.


Thank you! Would love to.


Does this work on a single node only, or does it work across a cluster like Kubernetes?


That’s one thing we are really excited about! Kubernetes makes it relatively easy to scale Bytewax dataflows across multiple nodes. We have some long-form docs about how it works: https://docs.bytewax.io/deployment/k8s-ecosystem



