
TL;DR: GPUs all over the Cloudflare global network; working closely with Microsoft, Meta, Hugging Face, Databricks, NVIDIA; new Cloudflare-native vector database; inference embedded in Cloudflare Workers; native support for WebGPU. Live demo: https://ai.cloudflare.com/



Do you actually run the inference in the worker? Or is it like what Fermyon does, where they basically host the models for you and you get an SDK that is automatically connected to the function?


Unlike the first version of Constellation, Workers AI runs inference directly on GPUs that we are (quickly) installing in our global network.


But the code isn't running on the worker? It runs somewhere else on a GPU cluster?


It's a little like how Cloudflare Workers runs: you don't know which CPU your code runs on; all you know is that it's a CPU close to your end user. The same goes for this. We are rolling out GPUs everywhere across the globe, so Workers AI will just use a nearby GPU: probably on the same machine as your Worker, maybe in the same data center, or wherever else our routing logic decides is smart. What we are not doing is running one massive GPU cluster somewhere. This is all distributed, and that's the power of owning your own network.


Since they don’t seem to be able to give a simple answer: the inference does not run in the worker. It connects to external GPUs.


I think the confusion is over what's meant by "in the Worker." From a hardware perspective, the GPU may be in the same machine as the CPU that's powering the Worker, or they may be on different machines in our network. We are not routing requests to some third party, and we will try to run the inference task as close as possible to whoever (or whatever) requested it. The whole idea of "serverless" is that you shouldn't have to worry about which machine runs what, unless you're on the team building the scheduling and routing logic at Cloudflare.


I think his question is more about whether the Worker directly accesses the GPU, and thus needs JS tooling to drive the GPU somehow (no), or whether it makes subrequests to a separate GPU service that isn't running the Worker runtime (yes).
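
For the curious, here's a minimal sketch of what that looks like from the Worker's side, based on the launch announcement (the AI binding name and the model identifier come from that post, so treat the details as illustrative):

    // Worker that calls Workers AI through a binding.
    // The binding is declared in wrangler.toml as:
    //   [ai]
    //   binding = "AI"
    import { Ai } from '@cloudflare/ai';

    export default {
      async fetch(request, env) {
        // env.AI is the binding; ai.run() issues the inference request,
        // which Cloudflare routes to a nearby GPU. The Worker code itself
        // keeps running on a CPU in the Workers runtime.
        const ai = new Ai(env.AI);
        const output = await ai.run('@cf/meta/llama-2-7b-chat-int8', {
          prompt: 'Where does the phrase "Hello, World" come from?',
        });
        return new Response(JSON.stringify(output));
      },
    };

So from the developer's point of view it's just an await on a binding, the same way you'd use KV or R2; the routing to a GPU is the platform's problem.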


Hey John, great work on this! Just a heads-up, small typo on that page under R2: "Build mutli-cloud training architectures with free egress."


Thanks. Getting it fixed.


Any chance you're looking for technical product folks to work on this? I actually worked on a very similar deployment internally at Livepeer (focus was on live video enhancements but also generalized edge compute)!


we always are! email is rita at cloudflare dot com :)


thanks!


I see plans for more models via the HF partnership, but can I (or will I be able to) run a custom fine-tuned version of a supported model?


On top of our hosted and supported catalog of models, and deploy-to-CF partnerships like the one with HF, you will also be able to bring your own custom model at some point.


Awesome. What about compiled model support? Running most of the listed models without compilation only makes sense for hobby projects.


Is CodeLlama somewhere on the roadmap?



