brucethemoose2 on Sept 17, 2023 | on: Run LLMs at home, BitTorrent‑style

Yeah. My 3090 gets roughly 5 tokens/s on a 70B model at Q3_K_L. This is a good idea, since splitting an LLM across machines is actually pretty efficient when requests are pipelined.
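Here is a toy model of the pipelining point. It rests on some loud assumptions: every stage (a machine holding a slice of the layers) takes the same fixed time per token, network transfer is free, and stages serve tokens first come, first served. `simulate` and its parameters are made up for illustration. This is not how the linked project actually schedules work.

```python
from typing import List


def simulate(num_stages: int, num_requests: int,
             tokens_per_request: int, stage_time: float = 1.0) -> float:
    """Return the time at which the last token of the last request finishes.

    Hypothetical toy model: each generated token passes through every stage
    in order, each stage handles one token at a time (FIFO), and a request's
    next token can only start after its previous token leaves the last stage.
    """
    stage_free: List[float] = [0.0] * num_stages   # when each stage next becomes idle
    ready: List[float] = [0.0] * num_requests      # when each request can submit its next token
    remaining: List[int] = [tokens_per_request] * num_requests
    finish = 0.0

    while any(remaining):
        # Serve whichever unfinished request has been ready the longest.
        r = min((i for i in range(num_requests) if remaining[i]),
                key=lambda i: (ready[i], i))
        t = ready[r]
        for s in range(num_stages):
            start = max(t, stage_free[s])  # wait for the input and for the stage to be free
            t = start + stage_time
            stage_free[s] = t
        ready[r] = t           # autoregressive: the next token depends on this one
        remaining[r] -= 1
        finish = max(finish, t)
    return finish


if __name__ == "__main__":
    stages, tokens = 4, 3
    print("1 request:              ", simulate(stages, 1, tokens))
    print("4 requests, pipelined:  ", simulate(stages, 4, tokens))
    print("4 requests, one by one: ", 4 * simulate(stages, 1, tokens))
```

With 4 stages and 3 tokens per request, a single request takes 12 time units, because each token has to walk through all 4 stages before the next one can start. Four concurrent requests finish all 12 of their tokens in 15 units, versus 48 if they were served one after another, since the other stages stay busy while one request's token is in flight. With fewer concurrent requests than stages, some stages sit idle and the gain shrinks.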