The raw computation is just a bunch of matrix multiplications in a row; most of the algorithmic complexity/secret stuff would be around scaling & efficiency.
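To be concrete, here's a toy JAX sketch of what "matrix multiplications in a row" means, just a plain feed-forward stack. Obviously nothing like the actual model; the layer count, dimensions, and gelu are placeholders:

```python
# Toy sketch of "a bunch of matmuls in a row" -- not the real architecture,
# just the general shape of a stacked feed-forward computation in JAX.
import jax
import jax.numpy as jnp

def forward(x, weights):
    # Each layer is just x @ W followed by a cheap nonlinearity.
    for w in weights:
        x = jax.nn.gelu(x @ w)
    return x

key = jax.random.PRNGKey(0)
d = 512  # placeholder hidden size
weights = [jax.random.normal(jax.random.fold_in(key, i), (d, d)) * 0.02
           for i in range(4)]  # placeholder: 4 layers
x = jax.random.normal(key, (8, d))  # batch of 8 token embeddings
print(forward(x, weights).shape)  # (8, 512)
```

The interesting IP isn't in that kind of code; it's in how you shard, schedule, and serve it efficiently.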
For training the model the HW matters much more, since you need to scale up to as many chips as possible without being bottlenecked by the network.
This would just be inference, and it doesn't need to be very efficient since it's for on-prem usage, not selling API access. So you could strip out any efficiency secrets, and it would probably look like a bigger Gemma (their open-source model).
I wonder if they would/could try to strip out stuff like whatever tricks they use for long context + video support (both of which they are a bit ahead of everyone else on).
The model itself is likely built on their own open-source framework JAX, so it should be usable on Nvidia hardware. Of course, cost efficiency is going to be a different story.
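That portability is basically what JAX/XLA gives you for free: the same jitted code runs on whatever backend is present. A generic example (nothing Gemini-specific here):

```python
# Generic JAX example: the same jitted function compiles via XLA to whatever
# backend is available (TPU, Nvidia GPU, or CPU).
import jax
import jax.numpy as jnp

print(jax.devices())  # e.g. CUDA devices on an Nvidia box, TPU devices on a TPU host

@jax.jit
def layer(x, w):
    return jnp.dot(x, w)

x = jnp.ones((8, 1024))
w = jnp.ones((1024, 1024))
print(layer(x, w).shape)  # (8, 1024) on any backend; cost per token is what differs
```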