Both. Companies are certainly building bigger and bigger clusters for training.
At the same time though, consumer GPUs have gotten significantly faster (compare e.g. an Nvidia RTX 2080 Ti to a GTX 980 Ti), and training algorithms keep improving, with better optimizers becoming widely used (e.g. Adam instead of plain stochastic gradient descent).
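To make the optimizer point concrete, here is a minimal PyTorch-style sketch of that swap; the model, data, and learning rates are purely illustrative placeholders, not anything from a real training setup:

```python
import torch
import torch.nn as nn

# Toy model and data, purely for illustration.
model = nn.Linear(10, 1)
x, y = torch.randn(64, 10), torch.randn(64, 1)
loss_fn = nn.MSELoss()

# Plain SGD: one global learning rate for every parameter.
# optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Adam: per-parameter adaptive step sizes (first/second moment estimates),
# which typically needs less tuning and converges in fewer steps.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```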
Architecture search has also produced more efficient building blocks, so networks can reach the same accuracy with far fewer parameters and smaller models (which lowers training cost).
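As one example of such an efficient block (the kind popularized by MobileNet-style architectures, used here just to illustrate the parameter savings, not as the specific result the comment refers to), a depthwise-separable convolution replaces a standard convolution at a fraction of the parameter count:

```python
import torch.nn as nn

def count_params(m):
    return sum(p.numel() for p in m.parameters())

# Standard 3x3 convolution, 256 -> 256 channels.
standard = nn.Conv2d(256, 256, kernel_size=3, padding=1)

# Depthwise-separable version: a per-channel 3x3 depthwise conv
# followed by a 1x1 pointwise conv.
separable = nn.Sequential(
    nn.Conv2d(256, 256, kernel_size=3, padding=1, groups=256),  # depthwise
    nn.Conv2d(256, 256, kernel_size=1),                         # pointwise
)

print(count_params(standard))   # 590,080 parameters
print(count_params(separable))  # 68,352 parameters (~8.6x fewer)
```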