One GroqCard has 230 MB of SRAM, which is enough for every single weight matrix of Mixtral-8x7B. Code to check:
import urllib.request, json, math

for i in range(1, 20):  # the checkpoint is split into 19 safetensors shards
    url = (
        "https://huggingface.co/mistralai/Mixtral-8x7B-v0.1/resolve/main/"
        f"model-{i:05d}-of-00019.safetensors?download=true"
    )
    with urllib.request.urlopen(url) as r:
        # safetensors layout: 8-byte little-endian header length, then a JSON header,
        # so the shapes can be read without downloading the tensor data itself
        header_size = int.from_bytes(r.read(8), byteorder="little")
        header = json.loads(r.read(header_size).decode("utf-8"))
        for name, value in header.items():
            if name.endswith(".weight"):
                shape = value["shape"]
                mb = math.prod(shape) * 2e-6  # bf16: 2 bytes per parameter
                print(mb, "MB for", shape, name)
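The same numbers can be sanity-checked offline from the published Mixtral-8x7B config (hidden_size 4096, expert FFN size 14336, vocab_size 32000; these are assumptions taken from the config file, not measured from the shards):

```python
# Offline estimate of the largest Mixtral-8x7B weight matrices,
# assuming bf16 and the dimensions from the published config.
hidden, ffn, vocab = 4096, 14336, 32000
bytes_per_param = 2  # bf16

expert_mb = ffn * hidden * bytes_per_param / 1e6   # each expert w1/w2/w3 projection
attn_mb = hidden * hidden * bytes_per_param / 1e6  # q/o attention projections
embed_mb = vocab * hidden * bytes_per_param / 1e6  # embed_tokens / lm_head

print(f"expert: {expert_mb:.1f} MB")   # ~117 MB, fits in 230 MB
print(f"attention: {attn_mb:.1f} MB")  # ~34 MB
print(f"embedding: {embed_mb:.1f} MB") # ~262 MB
```

If those config values are right, every per-layer matrix fits comfortably in one card; the ~262 MB embedding/lm_head matrix is the one that would exceed 230 MB and need splitting across chips.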
tome's other comment mentions that they use 568 GroqChips in total, which at 230 MB each is roughly 131 GB of SRAM: enough to hold even Llama2-70B completely at 8-bit precision, though fp16 weights (about 140 GB) would slightly exceed it. I did not do the math for the KV cache, but it probably fits in there as well. Their hardware can do matrix-matrix multiplications natively, so standard BLAS-style workloads should not be a problem. I don't see why they'd need other hardware.
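A back-of-envelope version of that claim, taking the 568-chip figure from tome's comment and the Llama2-70B shape values (80 layers, 8 KV heads via GQA, head dim 128) from the model config, so treat all of them as assumptions:

```python
# Total SRAM across 568 chips vs. Llama2-70B weight and KV-cache footprints.
chips = 568
sram_gb = chips * 230e6 / 1e9        # ~130.6 GB of aggregate SRAM

weights_fp16_gb = 70e9 * 2 / 1e9     # ~140 GB: slightly too big for 568 chips
weights_int8_gb = 70e9 * 1 / 1e9     # ~70 GB: fits with room to spare

# KV cache per token: K and V, per layer, per KV head, 2 bytes each (fp16)
layers, kv_heads, head_dim = 80, 8, 128
kv_per_token = 2 * layers * kv_heads * head_dim * 2  # bytes
kv_gb_4k = kv_per_token * 4096 / 1e9  # a full 4096-token context

print(f"{sram_gb:.1f} GB SRAM; weights fp16 {weights_fp16_gb:.0f} GB, "
      f"int8 {weights_int8_gb:.0f} GB; 4k KV cache {kv_gb_4k:.2f} GB")
```

By this count the KV cache is small change (about 1.3 GB per 4096-token sequence), so weight precision, not the cache, is what decides whether the model fits.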