We don't think so, but you be the judge! I believe we quantize both Mixtral and ...

a_wild_dandan · on Feb 19, 2024

Is your confidence rooted in quantified testing, or just vibes? I'm sure you're right, just curious. (My reasoning: running inference at full fp16 is borderline wasteful. You can use q7 with almost no loss.)

monkmartinez · on Feb 20, 2024

I know some fancy benchmark says "almost no loss", but... subjectively, there is a clear quality loss. You can try it for yourself: I can run Mixtral at 5.8bpw, and there is an OBVIOUS difference between what I have seen from Groq and my local setup, besides the sound-barrier-shattering speed of Groq. I didn't know Mixtral could output such nice code, and I have used it A LOT locally.

doctorpangloss · on Feb 20, 2024

Yes, but this gray-area underperformance that lets them claim they are the cheapest and fastest appeals to people for whom qualitative (aka real) performance doesn’t matter.

tome · on Feb 19, 2024

What quantified testing would you like to see? We've had a lot of very good feedback from our users, particularly about Mixtral.