It's only mentioned at the end. If you use some regularization, you have many ways to fit the data (almost) perfectly, and you choose the one with the smallest coefficients, which is often one that generalizes well. And even without explicit regularization, some training methods implicitly apply some form of regularization when the number of parameters exceeds the number of data points.
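
To make this concrete, here is a minimal sketch using plain linear regression with more parameters than data points. The setup is my own assumption for illustration, not something from the source being discussed: random Gaussian features, random targets, and gradient descent started from zero as the example of a training method with implicit regularization. It shows two things: many coefficient vectors fit the training data exactly, and gradient descent from zero ends up at the one with the smallest norm, even though no penalty term appears in the loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy overparameterized problem (illustrative sizes): 20 points, 100 parameters.
n_samples, n_features = 20, 100
X = rng.normal(size=(n_samples, n_features))
y = rng.normal(size=n_samples)

# Minimum-norm solution that fits the data exactly (via the pseudoinverse).
X_pinv = np.linalg.pinv(X)
w_min = X_pinv @ y

# A different exact fit: add a vector from the null space of X.
# Anything in the null space changes the coefficients but not the predictions.
v = rng.normal(size=n_features)
null_part = v - X_pinv @ (X @ v)
w_other = w_min + null_part

# Plain gradient descent on 0.5 * ||Xw - y||^2, started at zero, no penalty term.
step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / (largest singular value)^2
w_gd = np.zeros(n_features)
for _ in range(2000):
    w_gd -= step * X.T @ (X @ w_gd - y)

print("max training error, min-norm fit:", np.abs(X @ w_min - y).max())
print("max training error, other fit:   ", np.abs(X @ w_other - y).max())
print("norm of min-norm fit:", np.linalg.norm(w_min))
print("norm of other fit:   ", np.linalg.norm(w_other))
print("distance between GD result and min-norm fit:", np.linalg.norm(w_gd - w_min))
```

Both `w_min` and `w_other` fit the training data, but `w_other` has larger coefficients. Gradient descent started at zero never leaves the row space of `X`, so it converges to `w_min`. This is only the linear case; whether and how the same effect carries over to other models and training methods is a separate question.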