anuarsh's comments | Hacker News

Good question, need to research this one


There's one more exciting thing about Qwen3-next (besides the efficient MoE architecture and fast linear attention) - MTP (multi-token prediction). It's an additional layer that lets you generate more tokens without going through the whole model again. I'm trying to make it work, but without success yet. Maybe someone could help me with it - https://github.com/Mega4alik/ollm/blob/dev/src/ollm/qwen3_ne... (dev branch). Take a look
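For anyone curious what that roughly looks like, here's a minimal PyTorch sketch of the general MTP idea: one extra block takes the last hidden state plus the embedding of the token just sampled and proposes the following token, so you skip a second full pass. The class name and shapes are my own illustration, not Qwen3-next's actual implementation.

    import torch
    import torch.nn as nn

    class MTPHead(nn.Module):
        # Hypothetical sketch, not Qwen3-next's real MTP layer.
        def __init__(self, d_model, vocab_size, nhead=8):
            super().__init__()
            self.proj = nn.Linear(2 * d_model, d_model)
            self.block = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            self.lm_head = nn.Linear(d_model, vocab_size)

        def forward(self, last_hidden, next_token_emb):
            # last_hidden: (batch, d_model) from the main model's final layer
            # next_token_emb: (batch, d_model) embedding of the token just sampled
            x = self.proj(torch.cat([last_hidden, next_token_emb], dim=-1))
            x = self.block(x.unsqueeze(1)).squeeze(1)
            return self.lm_head(x)  # logits for the token after next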


I haven't tested on Apple machines yet, but gpt-oss and qwen3-next should work, I assume. The Llama3 versions use CUDA-specific loading logic for a speed boost, so those won't work for sure


Thanks! I don't have much experience with diffusion models, but technically any multi-layer model could benefit from loading weights one layer at a time
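As a rough illustration of what "loading weights one layer at a time" means in practice (a simplified sketch, not ollm's actual code): keep only one layer on the GPU, run it, free it, move on.

    import torch

    def forward_layer_by_layer(layer_files, hidden, device="cuda:0"):
        # layer_files: paths to individually saved nn.Module layers on SSD.
        # Only one layer lives on the GPU at any moment.
        for path in layer_files:
            layer = torch.load(path, map_location=device)
            with torch.no_grad():
                hidden = layer(hidden)
            del layer
            torch.cuda.empty_cache()
        return hidden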


1 token per 2 seconds is the best I got on my PC, thanks to the MoE architecture of qwen3-next-80B. gpt-oss-20B is slower because I load all of a layer's experts to the GPU and unpack the weights (4-bit -> bf16) each time, while with qwen3-next I load only the active experts (normally ~150 out of 512). I could probably do the same with gpt-oss.
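In case it helps to picture "only the active experts": a hedged sketch (the function names are mine, not ollm's) of taking the union of each token's top-k routed experts and copying just those to the GPU.

    import torch

    def active_expert_ids(router_logits, top_k=10):
        # router_logits: (num_tokens, num_experts) for one MoE layer.
        # Union of every token's top-k experts = the experts that will fire.
        topk = torch.topk(router_logits, k=top_k, dim=-1).indices
        return torch.unique(topk).tolist()

    def load_active_experts(experts_cpu, expert_ids, device="cuda:0"):
        # experts_cpu: {expert_id: state_dict kept in RAM/SSD}.
        # Only the selected experts (e.g. ~150 of 512) get moved to the GPU.
        return {i: {k: v.to(device, non_blocking=True)
                    for k, v in experts_cpu[i].items()}
                for i in expert_ids}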


The CPU is much slower than the GPU. You can actually use both by offloading some layers to CPU RAM with o.offload_layers_to_cpu(layers_num=12). It's faster to load from RAM than from SSD.
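Roughly like this; only the offload_layers_to_cpu(layers_num=12) call above comes from the project, the surrounding setup is my guess at the API and the model id is just an example.

    from ollm import Inference  # assumed entry point

    o = Inference("qwen3-next-80B", device="cuda:0")  # illustrative model id
    o.offload_layers_to_cpu(layers_num=12)            # keep 12 layers in CPU RAM instead of SSD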


We are talking about 100k context here. 20k would be much faster, but you wouldn't need KV cache offloading for it
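Back-of-the-envelope for why 100k needs offloading while 20k doesn't (the layer/head numbers below are assumed for illustration, not qwen3-next's real config):

    layers, kv_heads, head_dim, bytes_per = 48, 8, 128, 2      # bf16
    per_token = 2 * layers * kv_heads * head_dim * bytes_per   # K and V
    print(per_token * 100_000 / 2**30, "GiB at 100k context")  # ~18 GiB -> offload
    print(per_token * 20_000 / 2**30, "GiB at 20k context")    # ~3.7 GiB -> fits on GPU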


Absolutely. There are tons of cases where an interactive experience isn't required, just the ability to process a large context and get insights.


It would be interesting to see some benchmarks of this vs., for example, Ollama running locally with no timeout


Hi everyone, any comments or questions are appreciated

