You will simply need a lot of GPU cores/VRAM. On my $4,000 Mac Studio M2 Ultra with 64GB of RAM, I can comfortably run deepseek-r1:32b, but a) load times can be annoying (e.g., if you are switching models for different tasks or letting them idle out), and b) you can certainly tell that it requires tuning the context length, temperature, etc. based on what you need to do.
Compare that with the commercial models, where a lot of that is done at scale for you.
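For what it's worth, here's a minimal sketch of that kind of per-task tuning, assuming the model is served through Ollama's local REST API (the deepseek-r1:32b tag is Ollama's naming); the endpoint port, prompt, and parameter values are illustrative, not a recommendation:

    import requests

    # Minimal sketch: query a locally served model via Ollama's REST API,
    # overriding context length and temperature per request. Values here
    # are placeholders; tune them per task.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "deepseek-r1:32b",
            "prompt": "Summarize the tradeoffs of running LLMs locally.",
            "stream": False,
            "options": {
                "num_ctx": 8192,     # context window; larger costs more VRAM
                "temperature": 0.6,  # lower for factual tasks, higher for creative ones
            },
        },
    )
    print(resp.json()["response"])

Each request can carry its own options, which helps when one loaded model has to cover several different kinds of tasks.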
Yeah, that makes sense. Once the model is loaded, though, does it work well in comparison to the commercial models? Do you find that the local models hallucinate more, or don't give the same response quality?