At least for R1, folks more technical than me said the optimizations DeepSeek made don't make sense unless they were gimped by limited hardware. The caveat being that the limited hardware wasn't actually all that limited: NVIDIA exported deliberately throttled but still powerful hardware that was considered legal under export controls, and DeepSeek engineers found optimizations that basically unlocked the full compute power. Maybe something similar is going on here.
A quick search shows an 8-GPU A100 80GB server can easily be $120-150k a pop. So you are looking at a few million in hardware costs if you wanted these on-prem. The energy cost for the training is insignificant by my calculations.
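For a rough sanity check on that energy claim, here's a back-of-the-envelope sketch. The cluster size, training duration, overhead factor, and electricity rate are all illustrative assumptions, not figures from any actual deployment; only the 400 W TDP is the A100 80GB's rated spec:

```python
# Back-of-the-envelope: on-prem hardware cost vs. training energy cost.
# All inputs are rough assumptions for illustration, not measured figures.

SERVER_PRICE_USD = 135_000      # midpoint of the $120-150k per 8x A100 server quote
NUM_SERVERS = 20                # assumed cluster size (160 GPUs total)
GPUS_PER_SERVER = 8

GPU_POWER_KW = 0.4              # A100 80GB rated TDP is 400 W
OVERHEAD_FACTOR = 1.5           # assumed extra draw for CPUs, cooling, networking
TRAINING_DAYS = 30              # assumed length of one training run
ELECTRICITY_USD_PER_KWH = 0.12  # assumed commercial electricity rate

hardware_cost = SERVER_PRICE_USD * NUM_SERVERS

total_kw = NUM_SERVERS * GPUS_PER_SERVER * GPU_POWER_KW * OVERHEAD_FACTOR
energy_kwh = total_kw * 24 * TRAINING_DAYS
energy_cost = energy_kwh * ELECTRICITY_USD_PER_KWH

print(f"Hardware:     ${hardware_cost:>12,.0f}")
print(f"Energy (30d): ${energy_cost:>12,.0f}")
print(f"Energy as % of hardware: {100 * energy_cost / hardware_cost:.1f}%")
```

With these numbers the 30-day training run draws about 69 MWh, roughly $8k of electricity against $2.7M of hardware, i.e. well under 1%, which is why the energy cost barely registers.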
So yeah, I imagine this is not a big deal for large, well-funded universities.
Biggest issue with these is ROI (obviously not "real" ROI, since a university isn't selling compute): GPUs have been progressing so fast for AI use cases that the hardware depreciates quickly, so unless you are running them close to 24/7, what's the point of having them on-prem instead of renting?
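To make the utilization point concrete, here's a toy break-even sketch. The purchase price, write-off window, and cloud rental rate are all illustrative assumptions, not quotes from any provider:

```python
# Toy break-even: owning an 8x A100 server vs. renting equivalent cloud GPUs.
# All rates below are illustrative assumptions.

SERVER_PRICE_USD = 135_000    # assumed purchase price per 8-GPU server
USEFUL_LIFE_YEARS = 3         # assumed write-off window, given how fast GPUs are progressing
CLOUD_USD_PER_GPU_HOUR = 2.0  # assumed on-demand rate for one A100 80GB
GPUS = 8

hours_owned = USEFUL_LIFE_YEARS * 365 * 24
owning_usd_per_gpu_hour = SERVER_PRICE_USD / (hours_owned * GPUS)

# Fraction of the time the GPUs must stay busy for owning to beat renting:
break_even_utilization = owning_usd_per_gpu_hour / CLOUD_USD_PER_GPU_HOUR

print(f"Effective cost if owned: ${owning_usd_per_gpu_hour:.2f}/GPU-hour at 100% utilization")
print(f"Break-even utilization:  {break_even_utilization:.0%} vs ${CLOUD_USD_PER_GPU_HOUR:.2f}/GPU-hour cloud")
```

Under these assumptions owning breaks even at roughly a third utilization, but the sketch ignores power, cooling, admin staff, and the risk that next-gen hardware cuts cloud prices mid-window, all of which push the real break-even point higher.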