One GroqCard has 230 MB of SRAM, which is enough for every single weight matrix of Mixtral-8x7B. Code to check:
import urllib.request, json, math

for i in range(1, 20):  # the checkpoint is split into 19 safetensors shards
    url = (
        "https://huggingface.co/mistralai/Mixtral-8x7B-v0.1/resolve/main/"
        f"model-{i:05d}-of-00019.safetensors?download=true"
    )
    with urllib.request.urlopen(url) as r:
        # safetensors layout: 8-byte little-endian header length, then a JSON header,
        # so the shapes can be read without downloading the tensor data itself
        header_size = int.from_bytes(r.read(8), byteorder="little")
        header = json.loads(r.read(header_size).decode("utf-8"))
        for name, value in header.items():
            if name.endswith(".weight"):
                shape = value["shape"]
                mb = math.prod(shape) * 2e-6  # bf16: 2 bytes per parameter
                print(mb, "MB for", shape, name)
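The same numbers can be sanity-checked offline from the published Mixtral-8x7B config (hidden_size 4096, expert FFN size 14336, vocab_size 32000; these are assumptions taken from the config file, not measured from the shards):

```python
# Offline estimate of the largest Mixtral-8x7B weight matrices,
# assuming bf16 and the dimensions from the published config.
hidden, ffn, vocab = 4096, 14336, 32000
bytes_per_param = 2  # bf16

expert_mb = ffn * hidden * bytes_per_param / 1e6   # each expert w1/w2/w3 projection
attn_mb = hidden * hidden * bytes_per_param / 1e6  # q/o attention projections
embed_mb = vocab * hidden * bytes_per_param / 1e6  # embed_tokens / lm_head

print(f"expert: {expert_mb:.1f} MB")   # ~117 MB, fits in 230 MB
print(f"attention: {attn_mb:.1f} MB")  # ~34 MB
print(f"embedding: {embed_mb:.1f} MB") # ~262 MB
```

If those config values are right, every per-layer matrix fits comfortably in one card; the ~262 MB embedding/lm_head matrix is the one that would exceed 230 MB and need splitting across chips.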
tome's other comment mentions that they use 568 GroqChips in total, which at 230 MB each is roughly 131 GB of SRAM: enough to hold even Llama2-70B completely at 8-bit precision, though fp16 weights (about 140 GB) would slightly exceed it. I did not do the math for the KV cache, but it probably fits in there as well. Their hardware can do matrix-matrix multiplications natively, so standard BLAS-style workloads should not be a problem. I don't see why they'd need other hardware.
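A back-of-envelope version of that claim, taking the 568-chip figure from tome's comment and the Llama2-70B shape values (80 layers, 8 KV heads via GQA, head dim 128) from the model config, so treat all of them as assumptions:

```python
# Total SRAM across 568 chips vs. Llama2-70B weight and KV-cache footprints.
chips = 568
sram_gb = chips * 230e6 / 1e9        # ~130.6 GB of aggregate SRAM

weights_fp16_gb = 70e9 * 2 / 1e9     # ~140 GB: slightly too big for 568 chips
weights_int8_gb = 70e9 * 1 / 1e9     # ~70 GB: fits with room to spare

# KV cache per token: K and V, per layer, per KV head, 2 bytes each (fp16)
layers, kv_heads, head_dim = 80, 8, 128
kv_per_token = 2 * layers * kv_heads * head_dim * 2  # bytes
kv_gb_4k = kv_per_token * 4096 / 1e9  # a full 4096-token context

print(f"{sram_gb:.1f} GB SRAM; weights fp16 {weights_fp16_gb:.0f} GB, "
      f"int8 {weights_int8_gb:.0f} GB; 4k KV cache {kv_gb_4k:.2f} GB")
```

By this count the KV cache is small change (about 1.3 GB per 4096-token sequence), so weight precision, not the cache, is what decides whether the model fits.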