
The model allows image generation in 4-8 steps. Eight steps take 8 seconds on an M1 Max and just 3 seconds on an RTX 3090 Ti. Requires the latest Chrome Canary. Model paper here: https://latent-consistency-models.github.io


What error message do you get?


Hold on, to run your demo does one have to click the "Load Model" button before doing anything? Because what I see is a greyed-out form with the error message still at the top:

> You need latest Chrome with "Experimental WebAssembly" and "Experimental WebAssembly JavaScript Promise Integration (JSPI)" flags enabled!

Now I'm wondering whether the top message goes away once the flags are enabled?


> Hold on, to run your demo does one have to click the "Load Model" button before doing anything?

Yes. I figured it wouldn't be good if it downloaded 3.5 GB as soon as you opened the page.

> Now I'm wondering whether the top message goes away once the flags are enabled?

No, I haven't added any checks for that (and I'm not sure how the first flag can be properly detected), so it's just an info bar, which is admittedly misleading.
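
For what it's worth, the JSPI flag at least can be probed from script, since it adds new members to the WebAssembly namespace. A rough sketch; the exact property names have changed between Chrome versions, so treat them as assumptions:

    // Rough JSPI probe: the flag exposes new members on WebAssembly.
    // Names have shifted across Chrome versions, so check several.
    const hasJSPI =
      'Suspender' in WebAssembly ||   // older JSPI drafts
      'Suspending' in WebAssembly ||  // newer JSPI drafts
      'promising' in WebAssembly;
    if (!hasJSPI) {
      console.warn('Enable the JSPI flag in chrome://flags and restart.');
    }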


The UNet takes about 1:10 on WebGPU and around a minute on a single CPU thread. The VAE takes 2 minutes on CPU and about 10 seconds on GPU. That's probably because most of the GPU ops the VAE needs are already implemented, but the UNet's are not, so in the latter case the browser is just shuttling data between GPU and CPU on every step.
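
For context, a minimal sketch of how the execution-provider split looks with a recent onnxruntime-web (illustrative only; 'unet.onnx' is a placeholder path):

    import * as ort from 'onnxruntime-web/webgpu';

    // Ops with no WebGPU kernel get assigned to the wasm (CPU) provider,
    // which is what forces the per-step GPU -> CPU -> GPU copies.
    const session = await ort.InferenceSession.create('unet.onnx', {
      executionProviders: ['webgpu', 'wasm'],
    });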


That's really impressive, and much more performant than my approach. I was following a different one: running any ONNX model without prior modifications.


ONNX is bloated! I got some LLMs working on my own Rust + WebGPU framework a few months ago: https://summize.fleetwood.dev/

I've since moved away from ONNX and to a more GGML style.


Do you have any good resources or links on using GGML with WASM?


I think the Whisper example is your best bet! https://github.com/ggerganov/whisper.cpp/tree/master/example...


Hey! This is what I've been working on. Would love to chat; feel free to email.


Sure! My email is in my profile.


What's the difference between the ONNX and GGML styles?


ONNX consumes a .onnx file, which is a definition of the network and weights. GGML instead just consumes the weights, and defines the network in code.

Being bound to ONNX means moving at a slower velocity - the field moves so fast that you need complete control.
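
To make the contrast concrete, here's a tiny TypeScript sketch of the "network in code" idea. All names are invented for illustration; GGML itself is a C library, so this isn't its real API:

    type Matrix = { rows: number; cols: number; data: Float32Array };

    // Naive mat-vec product, standing in for a real kernel.
    function matvec(w: Matrix, x: Float32Array): Float32Array {
      const out = new Float32Array(w.rows);
      for (let i = 0; i < w.rows; i++) {
        let acc = 0;
        for (let j = 0; j < w.cols; j++) acc += w.data[i * w.cols + j] * x[j];
        out[i] = acc;
      }
      return out;
    }

    // GGML style: the file on disk holds only w1/w2; the architecture
    // (a two-layer MLP with ReLU) is ordinary code. Changing it is an
    // edit here, not a re-export of a serialized graph.
    function forward(w1: Matrix, w2: Matrix, x: Float32Array): Float32Array {
      const h = matvec(w1, x).map(v => Math.max(0, v));
      return matvec(w2, h);
    }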


I haven't used ONNX or GGML, but presumably using GGML means you need to reimplement the network architecture?


You do! But it offers quite a fluid API, which makes it pretty simple. You can see my attempt at a torchesque API here: https://twitter.com/fleetwood___/status/1679889450623459328


No additional libraries are required on Windows (even for CUDA). On Linux you'll need to install CUDA and the onnxruntime GPU lib.
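
If it's onnxruntime-node underneath (an assumption; 'model.onnx' below is a placeholder), the provider selection is roughly what pulls in those dependencies:

    import * as ort from 'onnxruntime-node';

    // Listing 'cuda' first is what requires the system CUDA toolkit and
    // the GPU build of onnxruntime on Linux; 'cpu' is the fallback.
    const session = await ort.InferenceSession.create('model.onnx', {
      executionProviders: ['cuda', 'cpu'],
    });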

