
The model allows image generation in 4-8 steps. Eight steps take 8 seconds on an M1 Max and just 3 seconds on an RTX 3090 Ti. Requires the latest Chrome Canary. Model paper here: https://latent-consistency-models.github.io


What error message do you get?


Hold on, to run your demo does one have to click the "Load Model" button before doing anything? Because what I see is a greyed-out form with the error message still at the top:

> You need latest Chrome with "Experimental WebAssembly" and "Experimental WebAssembly JavaScript Promise Integration (JSPI)" flags enabled!

Now I'm wondering whether the top message goes away once the flags are enabled?


> Hold on, to run your demo does one have to click the "Load Model" button before doing anything?

Yes. I figured it wouldn't be good if it downloaded 3.5 GB as soon as you opened the page.

> Now I'm wondering whether the top message goes away once the flags are enabled?

No, I haven't added any checks for that (and I'm not sure how the first flag can be properly detected), so it's just an info bar, which is admittedly misleading.
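
For what it's worth, the JSPI flag at least can be probed from script, since it adds new members to the WebAssembly namespace. A rough sketch; the exact property names have changed between Chrome versions, so treat them as assumptions:

    // Rough JSPI probe: the flag exposes new members on WebAssembly.
    // Names have shifted across Chrome versions, so check several.
    const hasJSPI =
      'Suspender' in WebAssembly ||   // older JSPI drafts
      'Suspending' in WebAssembly ||  // newer JSPI drafts
      'promising' in WebAssembly;
    if (!hasJSPI) {
      console.warn('Enable the JSPI flag in chrome://flags and restart.');
    }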


The UNet takes about 1:10 on WebGPU and around a minute on a single CPU thread. The VAE takes 2 minutes on CPU and about 10 seconds on GPU. That's probably because most of the GPU ops the VAE needs are already implemented, but the UNet's are not, so in the latter case the browser is just shuttling data between GPU and CPU on every step.
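
For context, a minimal sketch of how the execution-provider split looks with a recent onnxruntime-web (illustrative only; 'unet.onnx' is a placeholder path):

    import * as ort from 'onnxruntime-web/webgpu';

    // Ops with no WebGPU kernel get assigned to the wasm (CPU) provider,
    // which is what forces the per-step GPU -> CPU -> GPU copies.
    const session = await ort.InferenceSession.create('unet.onnx', {
      executionProviders: ['webgpu', 'wasm'],
    });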


That's really impressive, and much more performant than my approach. I was following a different one: running any ONNX model without prior modifications.


ONNX is bloated! I got some LLMs working on my own Rust + WebGPU framework a few months ago: https://summize.fleetwood.dev/

I've since moved away from ONNX and to a more GGML style.


Do you have any good resources or links on using GGML with WASM?


I think the Whisper example is your best bet! https://github.com/ggerganov/whisper.cpp/tree/master/example...


Hey! This is what I've been working on. Would love to chat; feel free to email.


Sure! My email is in my profile.


What's the difference between the ONNX and GGML styles?


ONNX consumes a .onnx file, which is a definition of the network and weights. GGML instead just consumes the weights, and defines the network in code.

Being bound to ONNX means moving at a slower velocity - the field moves so fast that you need complete control.
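
To make the contrast concrete, here's a tiny TypeScript sketch of the "network in code" idea. All names are invented for illustration; GGML itself is a C library, so this isn't its real API:

    type Matrix = { rows: number; cols: number; data: Float32Array };

    // Naive mat-vec product, standing in for a real kernel.
    function matvec(w: Matrix, x: Float32Array): Float32Array {
      const out = new Float32Array(w.rows);
      for (let i = 0; i < w.rows; i++) {
        let acc = 0;
        for (let j = 0; j < w.cols; j++) acc += w.data[i * w.cols + j] * x[j];
        out[i] = acc;
      }
      return out;
    }

    // GGML style: the file on disk holds only w1/w2; the architecture
    // (a two-layer MLP with ReLU) is ordinary code. Changing it is an
    // edit here, not a re-export of a serialized graph.
    function forward(w1: Matrix, w2: Matrix, x: Float32Array): Float32Array {
      const h = matvec(w1, x).map(v => Math.max(0, v));
      return matvec(w2, h);
    }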


I haven't used ONNX or GGML, but presumably using GGML means you need to reimplement the network architecture?


You do! But it offers quite a fluid API, which makes it pretty simple. You can see my attempt at a torchesque API here: https://twitter.com/fleetwood___/status/1679889450623459328


No additional libraries are required on Windows (even for CUDA). On Linux you'll need to install CUDA and the onnxruntime GPU lib.
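
If it's onnxruntime-node underneath (an assumption; 'model.onnx' below is a placeholder), the provider selection is roughly what pulls in those dependencies:

    import * as ort from 'onnxruntime-node';

    // Listing 'cuda' first is what requires the system CUDA toolkit and
    // the GPU build of onnxruntime on Linux; 'cpu' is the fallback.
    const session = await ort.InferenceSession.create('model.onnx', {
      executionProviders: ['cuda', 'cpu'],
    });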

