Thanks for the pointer! As far as we know, WebGPU development in Firefox is lagging a bit behind, so we used Chrome and did not develop this project on Firefox.
Well, based on the reply below, it sounds like they weren't testing in Firefox at all, so my guess is they were simply using APIs that only exist in Chrome right now. Whether those are actually necessary for the SD implementation, no idea.
Ah. Thanks for following up. I see it mentions there was a regression bug in v111 in that particular call, so I guess I could use an older version. I'm going to subscribe to the bug too.
It took 10 years for WebGL to be widely available, and it is still barely used beyond some niche use cases like 3D models in ecommerce, or Flash's revenge when coupled with WASM.
There is an origin trial. They should enable that, then it would work in Chrome stable today. It's only supported on Windows and Mac right now though, I think.
Tested on an Intel macOS 12.5 PC with an 8GB AMD RX 580 GPU: about 28 seconds for 20 steps, surprisingly fast too.
I did have to go to chrome://flags and enable "Unsafe WebGPU" even on Chrome Canary (113.0.5656.0) before it would work; otherwise I just got "no adapter" errors.
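For anyone hitting the same thing, this is roughly the check that fails, as a minimal sketch (assuming the @webgpu/types definitions are available to TypeScript):

    // Minimal WebGPU detection sketch; a "no adapter" error means
    // requestAdapter() resolved to null (e.g. the flag / origin trial
    // isn't enabled for this browser build).
    async function getWebGPUDevice(): Promise<GPUDevice> {
      if (!("gpu" in navigator)) {
        throw new Error("WebGPU API not exposed by this browser");
      }
      const adapter = await navigator.gpu.requestAdapter();
      if (adapter === null) {
        throw new Error("No WebGPU adapter available");
      }
      return adapter.requestDevice();
    }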
This is a tangent, but I’ve been wondering… and perhaps HNers know?
I use some basic libre CAD programs to plan my dream house project, and their renderings are pretty non-photorealistic.
Are there any upscalers that can take an inside or outside house render and make it look like something from Pinterest? Meaning the input is an image, not text?
Yes! Something like ControlNet is great for this. I use this[1] API on Replicate specifically because it offers the various methods (depth maps, edge detection, etc.)
Replicate sometimes gives you free use (I forget if this model does), but if you pay, an image output will cost you about one cent.
Give it your image, write a prompt (something like "a modern living room"), choose your ControlNet model from the drop-down, and submit.
If you choose depth map, for example, it will generate its best guess depth map for your image and use that to steer the Stable Diffusion output. It's fascinating, and a lot of fun to play with.
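If you'd rather script it than use the web form, the rough shape of a call to Replicate's HTTP API looks like this. Just a sketch: the model version hash and the input field names below are placeholders (check the model page for the real schema), and you'd run it server-side so your API token stays out of the browser.

    // Hypothetical sketch: MODEL_VERSION and the input keys are placeholders.
    const MODEL_VERSION = "<controlnet-model-version-hash>";

    async function generateFromRender(imageUrl: string, prompt: string) {
      const res = await fetch("https://api.replicate.com/v1/predictions", {
        method: "POST",
        headers: {
          Authorization: `Token ${process.env.REPLICATE_API_TOKEN}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          version: MODEL_VERSION,
          input: {
            image: imageUrl,    // your CAD render
            prompt: prompt,     // e.g. "a modern living room"
            structure: "depth", // which ControlNet conditioning to use
          },
        }),
      });
      // The response includes a polling URL; poll it until status is "succeeded".
      return res.json();
    }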
The existing tooling around Stable Diffusion very often features either a style transfer tool, or lets you supply an input image together with the text prompt.
You can also try importing your CAD model into Blender, assigning some materials to the different objects and getting renders there, though tuning the render may take more time than tuning a prompt.
Yes, of course. Optimizing and building the model into a format accepted by the ONNX web runtime would get this in. On the other hand, we also need to enhance our own runtime (for example, better memory pool management) in the future.
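For reference, once a model has been exported to ONNX, loading it with onnxruntime-web looks roughly like this; just a sketch, and the model URL, input name, and shape are placeholders:

    import * as ort from "onnxruntime-web";

    // Sketch only: the model URL, the "input" feed name, and the tensor
    // shape must match whatever the exported model actually declares.
    async function runOnnxModel(modelUrl: string, data: Float32Array) {
      const session = await ort.InferenceSession.create(modelUrl, {
        executionProviders: ["wasm"], // newer builds also ship a "webgpu" provider
      });
      const feeds = { input: new ort.Tensor("float32", data, [1, data.length]) };
      return session.run(feeds);
    }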
Confirmed working on Windows with AMD RX 590 in Chrome Canary. About 23 seconds on average using DPM (20 steps), 55 second average using PNDM (50 steps).
Thanks for your interest! Most existing stable diffusion demos rely on a server behind the scenes to run the image generation, which means you need to host your own GPU server to support these workloads. It is hard to have the demo run purely in the web browser, because stable diffusion usually has heavy computation and memory consumption.
Web stable diffusion puts the stable diffusion model directly in your browser, and it runs on the client GPU of the user's laptop. This means there is no queueing time waiting for a server's response. It also means more opportunities for client-server co-optimization, since essentially the "client" and "server" are the same laptop. Web stable diffusion is also friendly to personalization and privacy. Since everything runs on the client side and no interaction with a server is needed, you can imagine having your own custom-styled stable diffusion deployed and demonstrated on the web without sharing the model with anyone else, and you can also run with personalized model input (e.g., the text prompt in this case) without letting others know.
Thanks again for your interest! We are happy to hear your feedback on your experience and the functionality you would like us to add in the future.
What happens when someone wants to update/upgrade the model to a newer version? Can they just get a “diff” and “patch” their model, or do they have to download a whole new one?
Upgrading the model is pretty easy. We just need to build the new model locally in the same way we build the current model, which usually takes less than two minutes. If people want to deploy the new version to the web browser and share it for others to use, they just need to upload the model weights to some server (for example, we are now using a public Hugging Face repo to store the weights) and provide a link pointing to the weights. This can also be done with little effort.
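As a rough illustration (the field names below are made up, not our actual config schema), an upgrade is essentially just pointing the page at a newly built set of weights:

    // Illustrative only; keys and paths are hypothetical.
    const modelRecord = {
      localId: "stable-diffusion-new-build",
      // a public Hugging Face repo serving the compiled weight shards
      modelUrl: "https://huggingface.co/<user>/<repo>/resolve/main/",
      wasmUrl: "./dist/stable_diffusion_webgpu.wasm", // placeholder path
    };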
Usually Stable Diffusion (and most machine learning models) runs on a server, with the web front end (e.g. A1111's SD web UI) just providing a user interface. Even if you download it, you need to run the server on your computer; when you make a request to it, you are using the CPU/GPU on the server.
The linked version runs the stable diffusion model in the web browser, so it uses your own CPU (and in this case GPU) via the APIs provided by the web browser. This specific implementation uses an API called WebGPU, which isn't yet widely supported.
(not the author) This is taking the model and running it entirely in the browser via WebAssembly and WebGPU, meaning the browser is executing compiled code directly, inside its own process. This is different to the 'web ui' implementations like Auto1111, which are just a website front-end for a Python script that runs a server and runs the model in a background process on your computer. It's a very different type of implementation.
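To make the distinction concrete, the in-browser side follows the standard WebAssembly loading flow (a generic sketch, not this project's actual loader):

    // Generic pattern: the browser itself compiles and runs the module,
    // so there is no local Python server involved.
    async function loadWasm(url: string): Promise<WebAssembly.Instance> {
      const { instance } = await WebAssembly.instantiateStreaming(fetch(url), {
        env: {}, // whatever imports the module declares; empty for illustration
      });
      return instance;
    }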
They kindly provide instructions to run it (even on Apple M1).