How does Trillium (TPU v6e) compare to Nvidia? The stats seem to be: TPU v6e reaches 900 TFLOPS per chip (fp16), while the Nvidia H100 reaches 1800 TFLOPS per GPU (fp16)?
The 1800 figure for the H100 assumes 2:4 structured sparsity; without it, the peak is half that. I'm not sure whether the TPU number also assumes sparsity, but I don't think 2:4 sparsity is used heavily in practice, so I'd compare the dense numbers.
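To make the arithmetic explicit, here's a quick sketch. It plugs in the peak numbers quoted in the question and assumes the TPU figure is already dense, which is unconfirmed. These are spec-sheet peaks, not measured throughput.

```python
# Peak 16-bit figures as quoted in the question (TFLOPS). Spec-sheet peaks, not benchmarks.
H100_SPARSE_TFLOPS = 1800.0  # quoted H100 peak, which includes 2:4 structured sparsity
TPU_V6E_TFLOPS = 900.0       # ASSUMPTION: treated as a dense figure (not confirmed)


def dense_from_sparse(sparse_tflops: float) -> float:
    """2:4 sparsity skips half the multiplies, so the dense peak is half the sparse peak."""
    return sparse_tflops / 2.0


h100_dense = dense_from_sparse(H100_SPARSE_TFLOPS)
ratio = TPU_V6E_TFLOPS / h100_dense

print(f"H100 dense peak: {h100_dense:.0f} TFLOPS")
print(f"TPU v6e / H100 (dense): {ratio:.2f}x")
```

With these inputs the two chips come out roughly even on paper (a ratio of 1.00x). Real workloads depend on memory bandwidth, interconnect and compiler efficiency, so actual benchmarks would still be needed.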
It would be neat if anyone has benchmarks!