Hacker News

What's the underlying hardware for this?


It's a system built from hundreds of GroqChips, a custom ASIC we designed. We call it the LPU (Language Processing Unit). Unlike graphics processors, which are still best in class for training, LPUs are best in class for low-latency, high-throughput inference. Our LLMs run on several racks of chips with fast interconnect between them.
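"Low latency" and "high throughput" for LLM inference are usually quantified as time-to-first-token and tokens per second. A minimal sketch of how those two metrics fall out of token arrival timestamps (the timestamps below are hypothetical, not Groq measurements):

```python
# Compute time-to-first-token (latency) and tokens/sec (throughput)
# from a request timestamp and per-token arrival timestamps.
# All values here are illustrative, not measured on Groq hardware.

def inference_metrics(request_time, token_times):
    """Return (time_to_first_token, tokens_per_second)."""
    ttft = token_times[0] - request_time          # latency to first token
    duration = token_times[-1] - request_time     # total generation time
    return ttft, len(token_times) / duration

# Example: request sent at t=0.0s, five tokens streamed back.
ttft, tps = inference_metrics(0.0, [0.05, 0.07, 0.09, 0.11, 0.13])
print(f"TTFT={ttft:.3f}s, throughput={tps:.1f} tok/s")
```

Note that the two metrics can move independently: batching improves throughput but typically worsens time-to-first-token, which is why inference-focused hardware optimizes them separately from training hardware.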


They have a paper [1] about their "tensor streaming multiprocessor."

[1] https://wow.groq.com/wp-content/uploads/2024/02/GroqISCAPape...



