
The MLC team got that working back in March: https://github.com/mlc-ai/web-stable-diffusion

Even more impressively, they followed up with support for several Large Language Models: https://webllm.mlc.ai/




That's really impressive and much more performant. I took a different approach: running any ONNX model without prior modifications.


ONNX is bloated! I got some LLMs working on my own Rust + WebGPU framework a few months ago: https://summize.fleetwood.dev/

I've since moved away from ONNX and to a more GGML style.


Do you have any good resources or links on using ggml with wasm?


I think the Whisper example is your best bet! https://github.com/ggerganov/whisper.cpp/tree/master/example...


Hey! This is what I've been working on. I'd love to chat; feel free to email me.


Sure! My email is in my profile.


What's the difference between the ONNX and GGML styles?


ONNX consumes a .onnx file, which is a definition of the network and weights. GGML instead just consumes the weights, and defines the network in code.

Being bound to ONNX means moving more slowly - the field moves so fast that you need complete control.
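To make the contrast concrete, here is a minimal sketch of the GGML style: the file on disk holds only raw weights, and the network is defined in code. The shapes, file layout, and function names here are invented for illustration; this is not GGML's actual format.

```rust
// GGML-style sketch: the "model file" is just a flat buffer of weights.
// The architecture itself lives in code. Layout here is illustrative,
// not GGML's real on-disk format.

fn load_weights(bytes: &[u8]) -> Vec<f32> {
    // Interpret the raw buffer as little-endian f32 values.
    bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

// The network definition is ordinary code: y = relu(W x + b).
fn forward(w: &[f32], b: &[f32], x: &[f32]) -> Vec<f32> {
    let out = w.len() / x.len();
    (0..out)
        .map(|i| {
            let dot: f32 = (0..x.len()).map(|j| w[i * x.len() + j] * x[j]).sum();
            (dot + b[i]).max(0.0) // ReLU
        })
        .collect()
}

fn main() {
    // Pretend this buffer came from a weights-only file (W = 2x2 identity).
    let raw: Vec<u8> = [1.0f32, 0.0, 0.0, 1.0]
        .iter()
        .flat_map(|v| v.to_le_bytes())
        .collect();
    let w = load_weights(&raw);
    let b = vec![0.0, -1.0];
    let y = forward(&w, &b, &[2.0, 3.0]);
    println!("{:?}", y); // [2.0, 2.0]
}
```

With ONNX, by contrast, both the graph and the weights come from the .onnx file, so the runtime must support every operator the graph uses.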


I haven't used ONNX or GGML, but presumably using GGML means you need to reimplement the network architecture?


You do! But GGML offers quite a fluid API that makes it pretty simple. You can see my attempt at a torchesque API here: https://twitter.com/fleetwood___/status/1679889450623459328
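For a rough idea of what such a torchesque fluid API can look like, here is a sketch built on method chaining. The `Tensor` type and its methods are invented for illustration; this is not the linked framework's actual API.

```rust
// Sketch of a fluid, torch-like tensor API via method chaining.
// `Tensor`, `scale`, `add`, and `relu` are hypothetical names.

#[derive(Debug, Clone, PartialEq)]
struct Tensor {
    data: Vec<f32>,
}

impl Tensor {
    fn new(data: Vec<f32>) -> Self {
        Tensor { data }
    }

    // Element-wise add; consumes self so calls chain naturally.
    fn add(mut self, other: &Tensor) -> Self {
        for (a, b) in self.data.iter_mut().zip(&other.data) {
            *a += b;
        }
        self
    }

    fn relu(mut self) -> Self {
        for v in self.data.iter_mut() {
            *v = v.max(0.0);
        }
        self
    }

    fn scale(mut self, k: f32) -> Self {
        for v in self.data.iter_mut() {
            *v *= k;
        }
        self
    }
}

fn main() {
    // The forward pass reads like the math: scale, shift, activate.
    let bias = Tensor::new(vec![-4.0, 1.0]);
    let y = Tensor::new(vec![1.0, 2.0]).scale(2.0).add(&bias).relu();
    println!("{:?}", y.data); // [0.0, 5.0]
}
```

Because each op returns the tensor, reimplementing an architecture ends up reading close to the paper's equations.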



