It's very serious. The BEAM had a problem in that it lacked solid number-crunching libraries, so some folks solved that, and once they did, distributed ML capabilities more or less fell out as a neat bonus, so some folks built that too.
So now it's integrated into basic Livebook: just boot it, open that example, and you have a transcriber or whatever as boilerplate to play around with. Want something else from Hugging Face? Just swap out a couple of strings referencing the model and the rest is sorted for you. The exception is when the tokenizer or some other piece doesn't follow the expected format, but in that case you just upload a fixed version to some free web service, make a PR with the result, reference that version hash specifically, and it'll work.
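To make that concrete, here's a minimal sketch of what such a Livebook cell boils down to, using a sentiment model as a stand-in; the repo string is the part you'd swap, and the revision-pinning line at the end is my assumption about how you'd reference a fixed-up tokenizer:

```elixir
# Minimal sketch of the Bumblebee flow from a Livebook cell; the model repo
# string is the only thing you'd swap for a different Hugging Face model.
repo = {:hf, "distilbert-base-uncased-finetuned-sst-2-english"}

{:ok, model_info} = Bumblebee.load_model(repo)
{:ok, tokenizer} = Bumblebee.load_tokenizer(repo)

# Build an Nx.Serving for the task and run a single query through it.
serving = Bumblebee.Text.text_classification(model_info, tokenizer)
Nx.Serving.run(serving, "The BEAM makes this surprisingly painless.")

# If you had to push a reformatted tokenizer somewhere, you can pin the exact
# revision hash when loading (the repo and hash below are made up):
# {:ok, tokenizer} =
#   Bumblebee.load_tokenizer({:hf, "someone/fixed-tokenizer", revision: "abc123"})
```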
Python is awful for sharding, concurrency, distribution, that kind of thing. With the BEAM you can trivially cluster with an instance on a dedicated GPU-outfitted machine that runs the LLM models or what have you: there you have named processes controlling process pools for running queries, and they're immediately available to any BEAM node that clusters with it (sketched below). Fine, you'll need a VPN or something that takes a bit of networking experience, but compared to building a robust distributed system in Python it's easy mode.
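Roughly what that looks like with Nx.Serving, as I understand its distributed mode; the node name, `ChatModel`, and `build_llm_serving/0` are placeholders, not a verbatim production config:

```elixir
# On the GPU box: put a named Nx.Serving under a supervisor. It owns the pool
# of workers and does the request batching.
children = [
  {Nx.Serving,
   serving: build_llm_serving(),  # hypothetical function returning an Nx.Serving
   name: ChatModel,
   batch_size: 8,
   batch_timeout: 100}
]

Supervisor.start_link(children, strategy: :one_for_one)

# On any other node, once it has joined the cluster:
Node.connect(:"llm@gpu-box")

Nx.Serving.batched_run(ChatModel, "Summarize this ticket: ...")
# batched_run/2 looks the named serving up across the cluster and routes the
# request to wherever it runs, batching it with other callers' requests.
```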
I don't know what the project's goals are, but I perceive the Nx/Bumblebee/BEAM platform as obviously better than Python for building production systems. There might be advantages to Python when creating and training models, I'm not sure. But if you already have the models, need to serve more than one, want latency to stay low so the characteristically slow responses feel a little faster, and don't already have a big Kubernetes setup for running many Python applications in a distributed manner, then this is for you, and it'll be better than good enough until you've created a rather large success.