What're you using for this? llama.cpp? I have a 12GB card (RTX 4070) I'd like to try it on.
https://ollama.com/
I believe it's just an HTTP wrapper and terminal wrapper around llama.cpp, with some modifications (effectively a fork).
https://www.reddit.com/r/ollama/comments/1df757o/high_cost_o...
https://github.com/ollama/ollama/issues/8291
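Since Ollama exposes llama.cpp behind an HTTP API, you can talk to it with nothing but the standard library. A minimal sketch, assuming an Ollama server on its default port 11434 with a model (here `llama3`, swap in whatever you've pulled) already downloaded:

```python
import json
import urllib.request


def build_payload(prompt, model="llama3"):
    """Build a non-streaming request body for Ollama's /api/generate endpoint."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()


def generate(prompt, model="llama3", host="http://localhost:11434"):
    """POST to /api/generate and return the model's full response text."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=build_payload(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # Requires `ollama serve` running locally and the model pulled.
    print(generate("Why is the sky blue?"))
```

On a 12GB card you'd likely pick a quantized model that fits in VRAM; Ollama handles the GPU offload via llama.cpp underneath.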
Yes.
Good luck to you mate with your life :)