> How do you determine the right fit for -t and -ngl? t: the number of physical ...

xrd · on May 24, 2023

Thank you.

I used -t 6, and that dropped the token time to 220ms.

For -ngl, I tried 24, then 30, then 40. I never hit an out-of-memory error, and the token timing was exactly the same each time, stuck at 220ms.

But this is very helpful, thank you!

rahimnathwani · on May 24, 2023

I'm curious whether there's any difference if you try with a longer prompt or ask for a longer completion: https://news.ycombinator.com/item?id=35940365

Also curious to know whether the wall clock time (just prepend your command with 'time ') is any different.
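
If you want to compare several -ngl values in one go, here is a minimal sketch of a wall-clock sweep. It is only an illustration: the helper names `time_command` and `sweep` are made up for this example, and it measures the whole run (including model load), so it will not match the per-token timing the program reports.

```python
import subprocess
import time


def time_command(argv):
    """Run argv once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    # Discard the program's output; only the elapsed time matters here.
    subprocess.run(
        argv,
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start


def sweep(base_argv, flag, values):
    """Time base_argv once per value of flag and return {value: seconds}."""
    results = {}
    for value in values:
        # Append the flag and its value to a copy of the base command.
        results[value] = time_command(list(base_argv) + [flag, str(value)])
    return results
```

Something like `sweep(["./main", "-m", "model.bin", "-t", "6", "-p", "your prompt"], "-ngl", [24, 30, 40])` would then give one timing per setting. The binary path, model path, and prompt there are placeholders, so substitute whatever command you are already running. A single run per value is noisy, so repeating each one a few times is probably worthwhile.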