* hardware spec
* inference engine
* specific model - tokenizer differences can make models faster or slower even at an equivalent parameter count
* quantization used - and you need to be aware of hardware-specific optimizations for particular quants
* KV cache settings
* input context size
* output token count
This is probably not a complete list either.
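If you're publishing or comparing numbers, one way to keep them honest is to record every one of these dimensions alongside the measurement, so two results are only compared when all but one variable match. A minimal Python sketch of what that record might look like; the field names and example values are hypothetical, not taken from any particular tool:

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class BenchmarkRun:
    # Each field is a variable that can change measured tokens/sec,
    # so a fair comparison holds all of them fixed except one.
    hardware: str            # e.g. "RTX 4090" or "M3 Max 64GB"
    engine: str              # e.g. "llama.cpp", "vLLM"
    model: str               # tokenizer differences mean equal parameter
                             # counts can still give different speeds
    quantization: str        # some quants have hardware-specific fast
                             # paths on one GPU/CPU but not another
    kv_cache: str            # e.g. "f16" vs a quantized KV cache
    input_tokens: int        # prompt length processed before generation
    output_tokens: int       # tokens generated during the timed window
    generation_seconds: float

    @property
    def tokens_per_second(self) -> float:
        return self.output_tokens / self.generation_seconds

# Hypothetical run record
run = BenchmarkRun(
    hardware="RTX 4090",
    engine="llama.cpp",
    model="some-7b-model",
    quantization="Q4_K_M",
    kv_cache="f16",
    input_tokens=2048,
    output_tokens=512,
    generation_seconds=9.7,
)
print(json.dumps({**asdict(run), "tok/s": round(run.tokens_per_second, 1)}, indent=2))
```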