Hacker News
Qwen2-7B-Instruct with TensorRT-LLM: consistently high tokens/SEC (inferless.com)
1 point by agcat on Sept 5, 2024 | 1 comment


Hey community: This deep dive analyzes LLM speed benchmarks, comparing models including Qwen2-7B-Instruct, Gemma-2-9B-it, Llama-3.1-8B-Instruct, Mistral-7B-Instruct-v0.3, and Phi-3-medium-128k-instruct across libraries including vLLM, TGI, TensorRT-LLM, Tritonvllm, DeepSpeed-MII, and ctranslate. All benchmarks were run independently on A100 GPUs on Azure, with no sponsorship.



