
The MLC team got that working back in March: https://github.com/mlc-ai/web-stable-diffusion

Even more impressively, they followed up with support for several Large Language Models: https://webllm.mlc.ai/




That's really impressive and much more performant. I took a different approach: running any ONNX model without prior modifications.


ONNX is bloated! I got some LLMs working on my own Rust + WebGPU framework a few months ago: https://summize.fleetwood.dev/

I've since moved away from ONNX and to a more GGML style.


Do you have any good resources or links on using ggml with wasm?


I think the Whisper example is your best bet! https://github.com/ggerganov/whisper.cpp/tree/master/example...


Hey! This is what I've been working on. I'd love to chat; feel free to email me.


Sure! My email is in my profile.


What's the difference between the ONNX and GGML styles?


ONNX consumes a .onnx file, which is a definition of the network and weights. GGML instead just consumes the weights, and defines the network in code.

Being bound to ONNX means moving more slowly - the field moves so fast that you need complete control.
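To make the contrast concrete, here is a minimal sketch of the GGML style: the file on disk holds only raw weights, and the network is defined in code. The shapes, file layout, and function names here are invented for illustration; this is not GGML's actual format.

```rust
// GGML-style sketch: the "model file" is just a flat buffer of weights.
// The architecture itself lives in code. Layout here is illustrative,
// not GGML's real on-disk format.

fn load_weights(bytes: &[u8]) -> Vec<f32> {
    // Interpret the raw buffer as little-endian f32 values.
    bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

// The network definition is ordinary code: y = relu(W x + b).
fn forward(w: &[f32], b: &[f32], x: &[f32]) -> Vec<f32> {
    let out = w.len() / x.len();
    (0..out)
        .map(|i| {
            let dot: f32 = (0..x.len()).map(|j| w[i * x.len() + j] * x[j]).sum();
            (dot + b[i]).max(0.0) // ReLU
        })
        .collect()
}

fn main() {
    // Pretend this buffer came from a weights-only file (W = 2x2 identity).
    let raw: Vec<u8> = [1.0f32, 0.0, 0.0, 1.0]
        .iter()
        .flat_map(|v| v.to_le_bytes())
        .collect();
    let w = load_weights(&raw);
    let b = vec![0.0, -1.0];
    let y = forward(&w, &b, &[2.0, 3.0]);
    println!("{:?}", y); // [2.0, 2.0]
}
```

With ONNX, by contrast, both the graph and the weights come from the .onnx file, so the runtime must support every operator the graph uses.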


I haven't used ONNX or GGML, but presumably using GGML means you need to reimplement the network architecture?


You do! But GGML offers quite a fluid API that makes it pretty simple. You can see my attempt at a torchesque API here: https://twitter.com/fleetwood___/status/1679889450623459328
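For a rough idea of what such a torchesque fluid API can look like, here is a sketch built on method chaining. The `Tensor` type and its methods are invented for illustration; this is not the linked framework's actual API.

```rust
// Sketch of a fluid, torch-like tensor API via method chaining.
// `Tensor`, `scale`, `add`, and `relu` are hypothetical names.

#[derive(Debug, Clone, PartialEq)]
struct Tensor {
    data: Vec<f32>,
}

impl Tensor {
    fn new(data: Vec<f32>) -> Self {
        Tensor { data }
    }

    // Element-wise add; consumes self so calls chain naturally.
    fn add(mut self, other: &Tensor) -> Self {
        for (a, b) in self.data.iter_mut().zip(&other.data) {
            *a += b;
        }
        self
    }

    fn relu(mut self) -> Self {
        for v in self.data.iter_mut() {
            *v = v.max(0.0);
        }
        self
    }

    fn scale(mut self, k: f32) -> Self {
        for v in self.data.iter_mut() {
            *v *= k;
        }
        self
    }
}

fn main() {
    // The forward pass reads like the math: scale, shift, activate.
    let bias = Tensor::new(vec![-4.0, 1.0]);
    let y = Tensor::new(vec![1.0, 2.0]).scale(2.0).add(&bias).relu();
    println!("{:?}", y.data); // [0.0, 5.0]
}
```

Because each op returns the tensor, reimplementing an architecture ends up reading close to the paper's equations.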



