
> How do you determine the right fit for -t and -ngl?

-t: the number of physical cores on your CPU (not the number of hardware threads)

-ngl: start low and increase it by one or two at a time until inference starts crashing with 'out of memory' errors, then go back to the last value that worked

> i5-10400

You have 6 cores, so try `-t 6`
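
If you'd rather not do the -ngl stepping by hand, here is a rough Python sketch of the same procedure. Everything named here is an assumption for illustration: the `./main` binary path, the `model.bin` file, the short test prompt, and especially the idea that a non-zero exit code means the run hit an out-of-memory error. Adjust those to your setup.

```python
import subprocess
from typing import Callable, Optional


def find_max_ngl(runs_ok: Callable[[int], bool],
                 start: int = 0, step: int = 2, limit: int = 100) -> Optional[int]:
    """Step -ngl upward and return the last value that still ran, or None."""
    best = None
    ngl = start
    while ngl <= limit:
        if not runs_ok(ngl):
            break  # first failing value: stop and keep the previous one
        best = ngl
        ngl += step
    return best


def llama_runs_ok(ngl: int, binary: str = "./main",
                  model: str = "model.bin", threads: int = 6) -> bool:
    """Hypothetical check: run a short completion and report whether it exited cleanly.

    Assumption: a crash (including out of memory) shows up as a non-zero exit code.
    """
    cmd = [binary, "-m", model, "-t", str(threads),
           "-ngl", str(ngl), "-p", "Hello", "-n", "16"]
    return subprocess.run(cmd, capture_output=True).returncode == 0


# Example (not run here): find_max_ngl(llama_runs_ok, start=24, step=2, limit=40)
```

The search is kept separate from the command runner so you can swap in your own check, for example one that looks for 'out of memory' in the program's output instead of relying on the exit code.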



Thank you.

I used 6 and that dropped the token time to 220ms.

For -ngl, I tried using 24, and then 30 and then 40, and never got to an out of memory error, and got exactly the same token timing, stuck at 220ms.

But, this is very helpful, thank you!


I'm curious whether there's any difference if you try with a longer prompt or ask for a longer completion: https://news.ycombinator.com/item?id=35940365

Also curious to know whether the wall clock time (just prepend your command with 'time ') is any different.
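
If you want to compare several settings in one go instead of prepending 'time ' each time, a small Python wrapper can measure the same wall clock time. This is only a sketch: the command you pass in (binary, model path, prompt) is up to you, and nothing here is specific to any particular program.

```python
import subprocess
import time
from typing import Sequence


def wall_clock_seconds(cmd: Sequence[str]) -> float:
    """Run cmd to completion and return the elapsed wall clock time in seconds."""
    start = time.perf_counter()
    # Output is discarded so that printing to the terminal does not skew the timing.
    subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


# Example (not run here), with a hypothetical binary and model path:
# for ngl in (24, 30, 40):
#     cmd = ["./main", "-m", "model.bin", "-t", "6", "-ngl", str(ngl), "-p", "Hello", "-n", "64"]
#     print(ngl, round(wall_clock_seconds(cmd), 2))
```

Note that this measures the whole run, including model load time, so it will not match the per-token timing the program reports.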



