Parameter efficiency is an important consideration, if not the most important one, for local LLMs because of hardware constraints.
Do you guys really have GPUs with 80GB of VRAM or an M3 Ultra with 512GB of RAM at home? If I can't run these ultra-large MoEs locally, then these models mean nothing to me. I'm not a large-scale LLM inference provider after all.
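A quick back-of-envelope calculation shows why. Here's a rough sketch (the parameter counts and quantization levels are illustrative assumptions, not official figures for any specific model):

```python
def weight_memory_gb(params_billions: float, bits_per_param: float) -> float:
    """Rough memory needed just to hold the weights (ignores KV cache and activations)."""
    return params_billions * 1e9 * (bits_per_param / 8) / 1024**3

# Illustrative sizes: a hypothetical ~670B-parameter MoE vs. a 32B dense model.
for name, params in [("ultra-large MoE (~670B)", 670), ("dense 32B", 32)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: ~{weight_memory_gb(params, bits):.0f} GB")
```

Even at 4-bit quantization, the big MoE needs ~300GB just for weights, while the 32B dense model fits in ~15GB on a single consumer GPU.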
What's more, when inference is already this hard, fine-tuning these MoEs is completely out of reach.
Small, dense models are what the local-LLM community really needs.
Although benchmaxxing is not a good practice, I still find this release valuable. Thank you, Qwen.