I'd love to see more innovation on increasing context size without blowing up memory usage. Mistral Small 2503 (24B) and Gemma 3 27B both fit into 24 GB of VRAM at Q4, but Mistral can only go up to about 32k context and Gemma to about 12k before all VRAM is exhausted, even with flash attention and KV cache quantization.
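
For a rough sense of where the memory goes, here's a quick back-of-the-envelope sketch of KV cache size (just 2 × layers × KV heads × head dim × context length × bytes per element). The layer/head counts below are my recollection of the public model configs, so treat them as assumptions rather than exact figures:

```python
# Rough KV cache size estimate:
#   2 (K and V) * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_element
# Layer/head counts are assumed from memory of the public configs.

def kv_cache_gib(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    """Approximate KV cache size in GiB (bytes_per_elem: 2 = fp16, 1 = ~q8 cache quant)."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem
    return total_bytes / 1024**3

# Assumed configs: Mistral Small 24B ~ 40 layers, 8 KV heads, head_dim 128;
# Gemma 3 27B ~ 62 layers, 16 KV heads, head_dim 128.
print(f"Mistral @ 32k, fp16 cache: {kv_cache_gib(40, 8, 128, 32_768):.1f} GiB")
print(f"Mistral @ 32k, q8 cache:   {kv_cache_gib(40, 8, 128, 32_768, 1):.1f} GiB")
print(f"Gemma 3 @ 12k, fp16 cache: {kv_cache_gib(62, 16, 128, 12_288):.1f} GiB")
print(f"Gemma 3 @ 12k, q8 cache:   {kv_cache_gib(62, 16, 128, 12_288, 1):.1f} GiB")
```

Under those assumptions that's on the order of 3 to 6 GiB of cache sitting on top of the Q4 weights, which is roughly why 24 GB runs out where it does, and why cache quantization only buys you about a factor of two rather than solving the problem.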