Is there a spreadsheet out there benchmarking local LLMs across hardware configs? I want to know if I should even bother with my Coffee Lake Xeon server, or if this is something to consider for my next gaming rig instead.
It's really not hard to test with llamafile or ollama, especially with smaller 7B models. Just have a go.
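If you want an actual number rather than a vibe, here's a minimal sketch that measures tokens/sec through ollama's local REST API. It assumes `ollama serve` is running on the default port and that the model named below (just a placeholder) has already been pulled:

```python
import json
import urllib.request

# Rough tokens/sec benchmark against a local ollama instance.
# Assumes `ollama serve` is running and the model below is pulled.
MODEL = "mistral"  # placeholder; swap in whatever 7B model you pulled

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": MODEL,
        "prompt": "Explain RAID levels in one paragraph.",
        "stream": False,
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# ollama reports eval_count (generated tokens) and eval_duration (nanoseconds)
tok_per_sec = body["eval_count"] / (body["eval_duration"] / 1e9)
print(f"{MODEL}: {tok_per_sec:.1f} tokens/sec")
```

Run it a few times and take the median; the first request pays a model-load penalty.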
There are a bazillion and one hardware combinations, where even RAM timings can make a difference. Offloading even a small portion of the model to a GPU can make a HUGE difference (see the sketch at the end of this comment). Some engines have been optimized to run on Pascal cards with CUDA compute capability below 7.0, and some have tricks for newer-gen cards with modern CUDA. Some engines only run on Linux while others are fully cross-platform. It is truly the Wild West of hardware/software combinatorics. It is bewildering, to say the least.
In other words, there is no clear "best" outside of a DGX and a Linux software stack. The only way to know anything right now is to test and optimize for what you want to accomplish by running a local LLM.
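To put a number on the offload point above: ollama accepts a `num_gpu` option per request (the number of layers offloaded to the GPU), so you can sweep it and watch throughput change. A rough sketch in the same vein as the benchmark earlier in the thread; the model name and layer counts are arbitrary examples, not recommendations:

```python
import json
import urllib.request

def bench(model: str, num_gpu: int) -> float:
    """Return tokens/sec with `num_gpu` layers offloaded to the GPU."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({
            "model": model,
            "prompt": "Summarize the plot of Hamlet.",
            "stream": False,
            "options": {"num_gpu": num_gpu},  # layers on the GPU; 0 = CPU only
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["eval_count"] / (body["eval_duration"] / 1e9)

# Even partial offload often helps a lot on otherwise CPU-bound boxes.
for layers in (0, 8, 16, 32):  # arbitrary sweep; a 7B model has ~32 layers
    print(f"num_gpu={layers:2d}: {bench('mistral', layers):.1f} tok/s")
```

Same caveat as before: results are only meaningful for your exact box, which is kind of the whole point.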