load model, compute a 1k token response (ie, do a thousand forward passes in seq...

QuadmasterXLII on Feb 19, 2024, on: Groq runs Mixtral 8x7B-32k with 500 T/s

Load a model, compute a 1k-token response (i.e., do a thousand forward passes in sequence, one per token), load a different model, compute a response. I would expect model loading to take basically zero percent of the time in that workflow.
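
To put rough numbers on that workflow, here is a back-of-the-envelope sketch. The 500 tokens/s figure comes from the thread title; the load times are hypothetical placeholders rather than measurements, and `load_fraction` is just an illustrative helper name.

```python
def load_fraction(load_s: float, tokens: int, tokens_per_s: float) -> float:
    """Share of one load-then-generate cycle that is spent loading the model."""
    generate_s = tokens / tokens_per_s  # sequential decoding: one forward pass per token
    return load_s / (load_s + generate_s)


if __name__ == "__main__":
    tokens = 1000         # the 1k-token response from the comment
    tokens_per_s = 500.0  # throughput quoted in the thread title
    # Hypothetical load times in seconds; swap in real measurements if you have them.
    for load_s in (0.02, 0.2, 2.0):
        share = load_fraction(load_s, tokens, tokens_per_s)
        print(f"load={load_s:>5.2f}s  share of cycle spent loading={share:.1%}")
```

Under these assumptions, the loading share stays near zero only while the load time is small compared with the time taken by the thousand sequential forward passes.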