lxe on April 21, 2023, on: WebGPU GPT Model Demo

Interesting, I'm getting 100 ms/token on plain old wasm with 4 threads via ggml, using a 1.7B quantized cerebras model.
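
For scale, 100 ms/token works out to roughly 10 tokens per second. A minimal sketch of that conversion follows; the function names and the 256-token reply length are illustrative assumptions, not anything from ggml or the demo.

```python
def tokens_per_second(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to throughput in tokens/s."""
    if ms_per_token <= 0:
        raise ValueError("ms_per_token must be positive")
    return 1000.0 / ms_per_token


def generation_seconds(num_tokens: int, ms_per_token: float) -> float:
    """Estimate wall-clock time to generate num_tokens at a fixed latency."""
    return num_tokens * ms_per_token / 1000.0


if __name__ == "__main__":
    latency_ms = 100.0   # figure reported in the comment
    reply_tokens = 256   # assumed reply length, for illustration only
    print(f"{tokens_per_second(latency_ms):.1f} tokens/s")
    print(f"{generation_seconds(reply_tokens, latency_ms):.1f} s for {reply_tokens} tokens")
```

This assumes a constant per-token latency; in practice prompt processing and longer contexts can change the figure.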