
One GroqCard has 230 MB SRAM, which is enough for every single weight matrix of Mixtral-8x7B. Code to check:

    import urllib.request, json, math

    # Each safetensors shard starts with an 8-byte little-endian header
    # length, followed by a JSON header listing every tensor's shape and
    # dtype -- so we only need to read the first few KB of each file.
    for i in range(1, 20):  # Mixtral-8x7B-v0.1 is split into 19 shards
        url = f"https://huggingface.co/mistralai/Mixtral-8x7B-v0.1/resolve/main/model-{i:05d}-of-00019.safetensors?download=true"

        with urllib.request.urlopen(url) as r:
            header_size = int.from_bytes(r.read(8), byteorder="little")
            header = json.loads(r.read(header_size).decode("utf-8"))
            for name, value in header.items():
                if name.endswith(".weight"):
                    shape = value["shape"]
                    mb = math.prod(shape) * 2e-6  # 2 bytes per bf16 param
                    print(mb, "MB for", shape, name)
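The largest single matrices this prints are the expert FFN projections. The claim can also be checked offline (shapes taken from Mixtral's config.json: hidden size 4096, expert intermediate size 14336; 2 bytes per bf16 parameter is an assumption about the on-chip format):

    import math

    # Largest weight matrix in Mixtral-8x7B: one expert FFN projection
    shape = [14336, 4096]
    mb = math.prod(shape) * 2e-6  # 2 bytes per bf16 parameter
    print(round(mb, 1), "MB")  # 117.4 MB -- well under 230 MB of SRAM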
tome's other comment mentions that they use 568 GroqChips in total, which comes to roughly 131 GB of SRAM and should be enough to fit even Llama2-70B completely in SRAM. I did not do the math for the KV cache, but it probably fits in there as well. Their hardware can do matrix-matrix multiplications, so BLAS-style workloads should not be a problem. I don't see why they'd need other hardware.
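A back-of-envelope sketch of those numbers (assumptions: 230 MB per chip, 2-byte fp16/bf16 weights, and Llama2-70B's published config of 80 layers with 8 KV heads of dimension 128 under GQA -- all figures are estimates, not measurements of Groq's deployment):

    # Aggregate SRAM across 568 GroqChips (assumed 230 MB each)
    total_gb = 568 * 230 / 1000
    print(f"aggregate SRAM: {total_gb:.1f} GB")      # 130.6 GB

    # Llama2-70B weights at 2 bytes (fp16/bf16) vs 1 byte (int8) per param
    print(f"fp16 weights: {70e9 * 2 / 1e9:.0f} GB")  # 140 GB -- a tight fit
    print(f"int8 weights: {70e9 * 1 / 1e9:.0f} GB")  # 70 GB

    # KV cache per token: K and V, 80 layers, 8 KV heads (GQA),
    # head_dim 128, 2 bytes per value
    kv_per_token = 2 * 80 * 8 * 128 * 2              # 327,680 bytes
    kv_gb = kv_per_token * 4096 / 1e9                # at 4096-token context
    print(f"KV cache @ 4096 tokens: {kv_gb:.2f} GB") # 1.34 GB

So the KV cache is indeed tiny next to the weights; whether full fp16 weights fit depends on how much of each chip's 230 MB is actually usable for parameters.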

