-t: set this to the number of physical cores on your CPU
-ngl: keep increasing it by one or two until inference starts crashing with 'out of memory' errors, then go back to the last value that worked
> i5-10400
You have 6 cores, so try `-t 6`
I used 6 and that dropped the token time to 220ms.
For -ngl, I tried 24, then 30, then 40. I never hit an out of memory error, and the token timing stayed exactly the same, stuck at 220ms.
But, this is very helpful, thank you!
Also curious to know whether the wall clock time (just prepend your command with 'time ') is any different.
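If you'd rather script the comparison than retype the command, here is a rough Python sketch of the same idea: keep `-t` fixed, step through a few `-ngl` values, and record the wall clock time of each run. The binary path, model path and the `-m` argument layout are assumptions on my part, not taken from your setup, and treating a non-zero exit code as "ran out of memory" is only a guess at how the crash would show up.

```python
import subprocess
import time


def build_command(binary, model, threads, gpu_layers):
    """Build the argument list; binary and model are placeholder paths."""
    return [binary, "-m", model, "-t", str(threads), "-ngl", str(gpu_layers)]


def wall_clock_seconds(command):
    """Run a command and return (elapsed wall clock seconds, exit code)."""
    start = time.perf_counter()
    result = subprocess.run(command, capture_output=True)
    return time.perf_counter() - start, result.returncode


def sweep(binary, model, threads, layer_counts):
    """Time one run per -ngl value and stop at the first run that fails.

    Assumption: a crash (for example out of memory) gives a non-zero exit code.
    """
    timings = {}
    for layers in layer_counts:
        command = build_command(binary, model, threads, layers)
        elapsed, code = wall_clock_seconds(command)
        if code != 0:
            break
        timings[layers] = elapsed
    return timings
```

Something like `sweep("./main", "model.gguf", 6, [24, 30, 40])` would then give one timing per `-ngl` value, with both paths swapped for whatever you actually run. If the timings come out the same for every value, that matches what you saw with the per-token numbers.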