For a simple question on an RX 6800, I am observing ~50 tok/s on 8B models. DeepSeek 16B gives ~40 tok/s. 32B doesn't fit in memory.
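
A rough back-of-the-envelope check on why 32B would not fit, as a sketch only. The RX 6800 has 16 GB of VRAM; the 4.5 bits per weight figure is my assumption for a typical 4-bit quantization with overhead, not something measured here, and `weight_gib` is a hypothetical helper. It counts weights only, so the KV cache and runtime buffers come on top.

```python
def weight_gib(params_billion, bits_per_weight):
    """Approximate size of the model weights alone, in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

VRAM_GIB = 16          # RX 6800: 16 GB card, treated as 16 GiB for simplicity
BITS_PER_WEIGHT = 4.5  # assumed average for a 4-bit quantized model

for size in (8, 16, 32):
    need = weight_gib(size, BITS_PER_WEIGHT)
    print(f"{size}B: ~{need:.1f} GiB of weights, fits in VRAM: {need < VRAM_GIB}")
```

Under those assumptions the 8B and 16B weights sit comfortably under 16 GiB, while 32B already exceeds it before any context is allocated.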