Two years ago I bet that by now matmul would be first-class in torch on transformer-optimized hardware costing a fraction of what GPUs do, with no reason to use GPUs any more. Wrong.
The real bottleneck is memory. Optimize your matmul architecture all you like; while it's still connected to a big chunk of HBM (or whatever your chosen high-bandwidth memory is), you can only do so much.
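To make that concrete, here's a back-of-the-envelope roofline sketch in Python. The peak-throughput and bandwidth numbers are made-up round figures, not any real chip's spec: the point is just that a matmul is memory-bound whenever its arithmetic intensity (FLOPs per byte moved) falls below the chip's compute-to-bandwidth ratio, and no amount of matmul-unit cleverness changes that.

```python
# Roofline-style sketch. Hardware numbers below are illustrative
# assumptions (round figures), not any specific chip's spec sheet.

PEAK_FLOPS = 100e12   # assumed peak matmul throughput, FLOP/s
HBM_BW     = 1e12     # assumed HBM bandwidth, bytes/s
BYTES      = 2        # bytes per element (bf16)

def roofline(m, n, k):
    """Is an (m x k) @ (k x n) matmul compute- or memory-bound
    on the assumed hardware?"""
    flops = 2 * m * n * k                       # multiply-adds
    bytes_moved = BYTES * (m*k + k*n + m*n)     # read A and B, write C (ideal reuse)
    intensity = flops / bytes_moved             # FLOP per byte moved
    balance = PEAK_FLOPS / HBM_BW               # FLOP/byte the chip can sustain
    bound = "compute" if intensity > balance else "memory"
    runtime = max(flops / PEAK_FLOPS, bytes_moved / HBM_BW)
    return intensity, bound, runtime

# Skinny matmul, like a batch-1 decode step: ~1 FLOP/byte, memory-bound.
print(roofline(1, 4096, 4096))
# Big square matmul: enough reuse to be compute-bound.
print(roofline(4096, 4096, 4096))
```

The skinny case moves roughly a byte for every FLOP, so it runs at HBM speed regardless of what's doing the multiplying.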
So really, GPU vs. not-GPU (e.g. a TPU) doesn't matter a whole lot if you've got fundamentally the same memory architecture.