Parameter efficiency is an important consideration, if not the most important one, for local LLMs because of hardware constraints.
Do you guys really have GPUs with 80GB of VRAM or an M3 Ultra with 512GB of RAM at home? If I can't run these ultra-large MoEs locally, then these models mean nothing to me. I'm not a large-scale LLM inference provider after all.
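A quick back-of-envelope calculation shows why. Here's a rough sketch (the parameter counts and quantization levels are illustrative assumptions, not official figures for any specific model):

```python
def weight_memory_gb(params_billions: float, bits_per_param: float) -> float:
    """Rough memory needed just to hold the weights (ignores KV cache and activations)."""
    return params_billions * 1e9 * (bits_per_param / 8) / 1024**3

# Illustrative sizes: a hypothetical ~670B-parameter MoE vs. a 32B dense model.
for name, params in [("ultra-large MoE (~670B)", 670), ("dense 32B", 32)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: ~{weight_memory_gb(params, bits):.0f} GB")
```

Even at 4-bit quantization, the big MoE needs ~300GB just for weights, while the 32B dense model fits in ~15GB on a single consumer GPU.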
What's more, when inference is already this hard, fine-tuning these MoEs is completely out of reach.
Small, dense models are what the local-LLM community really needs.
Although benchmaxxing is not a good practice, I still find this release valuable. Thank you, Qwen.