Web Stable Diffusion (github.com/mlc-ai)
254 points by crowwork on March 17, 2023 | 41 comments



Note that (only?) Chrome Canary supports WebGPU, so this won't yet work in most people's browsers.

They kindly provide instructions to run it (even on Apple M1).
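
For anyone who wants to check before clicking through, a minimal feature-detection sketch (not the project's own code, just the standard navigator.gpu entry point) looks like this:

    // Sketch: detect WebGPU support before trying to run the demo.
    async function hasWebGPU(): Promise<boolean> {
      // navigator.gpu only exists in browsers with WebGPU enabled
      // (e.g. Chrome Canary with the right flags at the time of this thread).
      if (!("gpu" in navigator)) {
        console.log("WebGPU is not available in this browser.");
        return false;
      }
      const adapter = await (navigator as any).gpu.requestAdapter();
      if (adapter === null) {
        // Roughly the "no adapter" failure mentioned elsewhere in the thread.
        console.log("WebGPU is present, but no GPU adapter was found.");
        return false;
      }
      return true;
    }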


Hm, it seems it's been in Firefox, pref'd off by default, for a few years now: https://hacks.mozilla.org/2020/04/experimental-webgpu-in-fir...


Thanks for the pointer! As far as we know, WebGPU development in Firefox is lagging a bit behind, so we used Chrome and did not develop this project against Firefox.


I downloaded Firefox Nightly a couple of days ago and turned on the flags accordingly, but it didn't work, saying:

> Find an error initializing the WebGPU device TypeError: adapter.requestAdapterInfo is not a function

No idea how to fix it.


Well, based on the reply below, it sounds like they weren't testing in Firefox at all, so my guess is they were simply using APIs that only exist in Chrome right now. Whether those are actually necessary for the SD implementation, no idea.
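
If so, a guard along these lines (a sketch, not the project's actual fix) would let the init code degrade gracefully instead of throwing:

    // Only call requestAdapterInfo() when the adapter actually provides it;
    // Firefox Nightly was missing it at the time of this thread.
    const adapter = await (navigator as any).gpu?.requestAdapter();
    if (adapter && typeof adapter.requestAdapterInfo === "function") {
      const info = await adapter.requestAdapterInfo();
      console.log("GPU adapter:", info.vendor, info.architecture);
    } else {
      console.log("requestAdapterInfo() unavailable; continuing without adapter info.");
    }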


Submitted a question in the Firefox support forum, and there seem to be some bugs blocking it from running smoothly: https://support.mozilla.org/en-US/questions/1408328


Ah. Thanks for following up. I see it mentions there was a regression in v111 in that particular call, so I guess I could use an older version. I'm going to subscribe to the bug too.


WebGPU will ship this year, so it will be more widely available pretty soon.


It took 10 years for WebGL to be widely available, and it is still barely used beyond some niche use cases like 3D models in ecommerce, or Flash's revenge when coupled with WASM.


With WebGPU it will be faster - libraries like ThreeJS will use WebGPU when possible.


ThreeJS doesn't even have a good answer to a completely incompatible shading language, other than switching to node-based shaders.

Do you expect everyone to rewrite 10 years of shaders just for fun?


Yes.


There is an origin trial. They should enable that, then it would work in Chrome stable today. It's only supported on Windows and Mac right now though, I think.


WebGPU comes out April 26th - then 65% of the world's web browsers will have access to it.


Super interesting! Do you have a source on this? A quick search didn't turn up anything for me.


Anyone interested in this might also be interested in WONNX: https://github.com/webonnx/wonnx


Ohh wow that actually worked, that's awesome: https://i.imgur.com/4tYEphX.png

Tested on an Intel macOS 12.5 machine with an AMD 8GB RX 580 GPU, about 28 secs for 20 steps, surprisingly fast too. I did have to go to chrome://flags and enable "Unsafe WebGPU" even on Chrome Canary (113.0.5656.0) before it would work; otherwise I just got "no adapter" errors.


Yep, it surprisingly works on my AMD GPU too, even if it's designed only for M1/M2.


This is a tangent, but I've been wondering… and perhaps HNers know?

I use some basic libre CAD programs to plan my dream house project, and their renderings are pretty non-photorealistic.

Are there any upscalers that can take an inside or outside house render and make it look like something from Pinterest? Meaning the input is an image, not text?


Yes! Something like ControlNet is great for this. I use this[1] API on Replicate specifically because it offers the various methods (depth maps, edge detection, etc.)

Replicate sometimes gives you free use (I forget if this model does), but if you pay, an image output will cost you about one cent.

Give it your image, write a prompt (something like "a modern living room"), choose your ControlNet model from the drop-down, and submit.

If you choose depth map, for example, it will generate its best guess depth map for your image and use that to steer the Stable Diffusion output. It's fascinating, and a lot of fun to play with.

[1] https://replicate.com/jagilley/controlnet
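
For anyone who'd rather script it than use the web form, this is roughly what a call to Replicate's HTTP prediction API looks like. Treat it as a sketch: the version placeholder and the input field names (image, prompt, model_type) are my assumptions, so check the model page for the real schema.

    // Sketch of restyling a CAD render with ControlNet via Replicate's HTTP API.
    // Field names and the version id are placeholders/assumptions.
    const REPLICATE_API_TOKEN = "<your Replicate API token>"; // placeholder

    async function restyleRender(imageUrl: string, prompt: string) {
      const response = await fetch("https://api.replicate.com/v1/predictions", {
        method: "POST",
        headers: {
          "Authorization": `Token ${REPLICATE_API_TOKEN}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          version: "<model version id from the Replicate page>", // placeholder
          input: {
            image: imageUrl,     // the non-photorealistic render
            prompt: prompt,      // e.g. "a modern living room"
            model_type: "depth", // assumed name for the ControlNet method selector
          },
        }),
      });
      // Replicate returns a prediction object that you poll until it finishes.
      return response.json();
    }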

Edit: I'd love to see what you produce, if you do use it.


Do you think it's possible to use similar tools for visualizing a house from a floor plan only? Or maybe ChatGPT 4.0 is more up to the task these days


ControlNet maybe? https://github.com/lllyasviel/ControlNet

At huggingface: https://huggingface.co/spaces/hysts/ControlNet

Play around with the different models; you might get better results with some than with others.


The existing tooling around Stable Diffusion very often features either a style-transfer tool or lets you supply an input image together with the text prompt.

For example: https://nmkd.itch.io/t2i-gui


You can also try importing your CAD model into Blender, assigning some materials to the different objects and getting renders there, though tuning the render may take more time than tuning a prompt.


Is it possible to integrate this with onnxruntime-web (https://onnxruntime.ai/docs/tutorials/web/)?


Yes, of course. Optimizing and building the model into a format accepted by the ONNX web runtime would get this in. On the other hand, we also need to enhance our own runtime (for example, better memory pool management) in the future.
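
For reference, a minimal onnxruntime-web sketch (not from this project; the model path and input name are placeholders) would look something like this:

    // Sketch: running one ONNX-exported model step with onnxruntime-web.
    // "webgpu" availability depends on the onnxruntime-web version and browser.
    import * as ort from "onnxruntime-web";

    async function runUnetStep(latents: Float32Array, dims: number[]) {
      const session = await ort.InferenceSession.create("/models/unet.onnx", {
        executionProviders: ["wasm"], // swap in "webgpu" where supported
      });
      const feeds: Record<string, ort.Tensor> = {
        // The input name "sample" is illustrative; it must match the exported graph.
        sample: new ort.Tensor("float32", latents, dims),
      };
      return session.run(feeds);
    }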


Confirmed working on Windows with AMD RX 590 in Chrome Canary. About 23 seconds on average using DPM (20 steps), 55 second average using PNDM (50 steps).

I had issues compiling it on my own computer, but the demo version at https://mlc.ai/web-stable-diffusion/#text-to-image-generatio... works fine.


Same on Windows 11, 5800x3d with Vega64, PortableApps Chrome Canary.

17s DPM (20 steps), 41s PNDM (50 steps)


Can someone ELI5 how machine learning compilation works? Is this site basically A1111's SD web UI with fewer bells and whistles but way less intensive?


Thanks for your interest! Most existing Stable Diffusion demos rely on a backend server to run the image generation, which means you need to host your own GPU server to support these workloads. It is hard to have the demo run purely in the web browser, because Stable Diffusion usually has heavy computation and memory requirements.

Web Stable Diffusion puts the Stable Diffusion model directly in your browser, and it runs on the client GPU on the user's laptop. This means there is no queueing for a server's response. It also means more opportunities for client-server co-optimization, since the "client" and "server" are essentially the same laptop. Web Stable Diffusion is also friendly to personalization and privacy: since everything runs on the client side and no interaction with a server is needed, you can imagine having your own custom Stable Diffusion deployed and demonstrated on the web without sharing the model with anyone else, and you can also run it with personalized model input (e.g., the text prompt in this case) without letting others know.

Thanks again for your interest! We are happy to hear your feedback on your experience and the functionality you would like us to add in the future.


Thank you, great explanation.

What happens when someone wants to update/upgrade the model to a newer version? Can they just get a “diff” and “patch” their model, or do they have to download a whole new one?


Upgrading the model is pretty easy. We just need to build the new model locally, the same way we built the current one. This usually takes less than two minutes. If people want to deploy the new version to the web browser and share it for others to use, they just need to upload the model weights to some server (for example, we are currently using a public Hugging Face repo to store the weights) and provide a link pointing to the weights. This can also be done with little effort.
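
To illustrate the "just point at a different weights link" part (a generic sketch, not the project's actual loader; the URL and cache name are placeholders):

    // Fetch a weight shard from the configured link and cache it in the browser,
    // so switching model versions only means changing WEIGHTS_BASE.
    const WEIGHTS_BASE = "https://huggingface.co/<user>/<repo>/resolve/main/"; // placeholder

    async function fetchWeightShard(name: string): Promise<ArrayBuffer> {
      const cache = await caches.open("sd-weights-v2"); // bump per model version
      const url = WEIGHTS_BASE + name;
      const cached = await cache.match(url);
      if (cached) {
        return cached.arrayBuffer(); // already downloaded on a previous visit
      }
      const response = await fetch(url);
      await cache.put(url, response.clone());
      return response.arrayBuffer();
    }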


Amazing, thank you!


Does the functionality differ any from Easy Diffusion?

https://github.com/cmdr2/stable-diffusion-ui

It installs in and runs from a single folder, so it's nice and tidy.


Usually Stable Diffusion (and most machine learning models) runs on a server, with the web front end (e.g. A1111's SD web UI) just providing a user interface. Even if you download it, you need to run the server on your computer, and when you make a request you are using the CPU/GPU of that server.

The linked version runs the Stable Diffusion model in the web browser, so it uses your own CPU (and in this case GPU) via the APIs provided by the browser. This specific implementation uses an API called WebGPU, which isn't yet widely supported.


Where's the 4GB model loaded from and to where?


Interesting choice!

Before reading what they used, I assumed they would run tch-rs (libtorch bindings for Rust) on wgpu and ship it via wasm.


Ouch, libtorch doesn't have a WebGPU target though.


How does this compare to Automatic1111?


(Not the author.) This takes the model and runs it entirely in the browser via WebAssembly and WebGPU, meaning the browser is executing compiled code directly, inside its own process. This is different from the 'web ui' implementations like Auto1111, which are just a website front-end for a Python script that runs a server and runs the model in a background process on your computer. It's a very different type of implementation.


It works (very, very slowly) on my Intel Macbook. Very impressive indeed. WebGPU has a ton of potential.



