I think the Beam website should be a lot clearer about how things work[0], but my understanding is that Beam bills you for your actual usage, in a serverless fashion. So, unless you're continuously running computations for the entire month, it won't cost $1200/mo.
If it works the way I think it does, it sounds appealing, but the GPUs also feel a bit small. The A10G only has 24GB of VRAM. They say they're planning to add an A100 option, but... only the 40GB model? Nvidia has offered an 80GB A100 for several years now, which seems like it would be far more useful for pushing the limits of today's 70B+ parameter models. Quantization can get a 70B-parameter model running in less VRAM, but it's definitely a trade-off, and I'm not sure how the training side of things works with regard to quantized models.
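Rough back-of-envelope arithmetic (my own numbers, counting only the weights and ignoring KV cache, activations, and framework overhead) shows why 24GB and even 40GB are tight for 70B models:

```python
def weight_vram_gb(n_params_billion: float, bits_per_param: int) -> float:
    """Approximate VRAM needed just for model weights, ignoring
    KV cache, activations, and framework overhead."""
    return n_params_billion * 1e9 * bits_per_param / 8 / 1e9

# A 70B-parameter model:
print(weight_vram_gb(70, 16))  # fp16:  140.0 GB -> multiple 80GB A100s
print(weight_vram_gb(70, 8))   # int8:   70.0 GB -> still over one 40GB A100
print(weight_vram_gb(70, 4))   # 4-bit:  35.0 GB -> fits an 80GB, not a 24GB A10G
```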
Beam's focus on Python apps makes a lot of sense, but what if I want to run `llama.cpp`?
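The workaround I'd probably try is shelling out to the prebuilt binary from a thin Python wrapper. This is just a sketch: the binary name and flags below match llama.cpp's CLI as I remember it (`llama-cli` in newer builds, `main` in older ones) and may differ across versions.

```python
import subprocess

def build_llama_cmd(prompt: str, model_path: str, n_tokens: int = 128) -> list[str]:
    """Build the argument list for a llama.cpp invocation.
    Binary name and flags may vary by llama.cpp version."""
    return [
        "./llama-cli",        # "./main" in older llama.cpp builds
        "-m", model_path,     # path to a GGUF model file
        "-p", prompt,         # prompt text
        "-n", str(n_tokens),  # number of tokens to generate
    ]

def run_llama(prompt: str, model_path: str, n_tokens: int = 128) -> str:
    """Run llama.cpp and return its stdout.
    Requires a compiled llama.cpp binary and a GGUF model on disk."""
    result = subprocess.run(
        build_llama_cmd(prompt, model_path, n_tokens),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```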
Anyways, Beam is obviously a very small team, so they can't solve every problem for every person.
[0]: what is the "time to idle" for serverless functions? Is it instant? "Pay for what you use, down to the second" sounds good in theory, but AWS also uses per-second billing on tons of stuff, and EC2 instances don't stop billing you when they go idle; you have to shut them down manually. So making the lifecycle clearer would be great. Even a quick example of how you'd be billed would help.
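For illustration, here's the kind of worked example I mean, using a hypothetical per-hour rate backed out of the $1200/mo always-on figure (roughly $1.64/hr):

```python
HOURS_PER_MONTH = 730  # average hours in a month

# Hypothetical rate implied by an always-on $1200/mo figure.
rate_per_hour = 1200 / HOURS_PER_MONTH  # ~$1.64/hr
rate_per_second = rate_per_hour / 3600

def monthly_cost(busy_seconds_per_day: float, days: int = 30) -> float:
    """Cost if you're billed only for seconds of actual compute."""
    return busy_seconds_per_day * days * rate_per_second

# Always on (30 days) vs. 2 hours of actual compute per day:
print(round(monthly_cost(24 * 3600), 2))  # ~1183.56
print(round(monthly_cost(2 * 3600), 2))   # ~98.63
```

If billing really is serverless down to the second, the second scenario is the whole value proposition; if idle instances keep billing like EC2, it collapses back to the first.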