
Thank you! What is batch normalization doing, and how does it help?



There are other mechanisms for dealing with vanishing and exploding gradients. I (maybe wrongly?) think of batch normalization as being most distinctively about fighting internal covariate shift: https://machinelearning.wtf/terms/internal-covariate-shift/


Karpathy covers this in Makemore, but the tl;dr is that if you don't normalize the batch (essentially center and scale your activations so they're roughly normally distributed), then at gradient/backprop time you may get values significantly smaller or larger than 1. This is a problem because, as you stack layers in sequence (passing outputs to inputs), the per-layer gradients multiply through the Chain Rule, so a gradient that was well behaved near the output layers has either vanished by the time it reaches the earlier layers (if the upstream gradients were 0 < x < 1 at each layer) or exploded (if they were x >> 1). Batch normalization helps control the vanishing/exploding gradient problem in deep neural nets by normalizing the values passed between layers.
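To make the "center and scale" part concrete, here's a minimal sketch of what a batchnorm layer computes at train time, in PyTorch (the framework Makemore uses). Names like gamma, beta, and eps are the usual conventions and are my own illustration, not Karpathy's exact code:

    import torch

    def batchnorm_forward(x, gamma, beta, eps=1e-5):
        # x: (batch_size, num_features) pre-activations from a linear layer
        mean = x.mean(dim=0, keepdim=True)                # per-feature mean over the batch
        var = x.var(dim=0, keepdim=True, unbiased=False)  # per-feature variance over the batch
        xhat = (x - mean) / torch.sqrt(var + eps)         # center and scale to roughly unit Gaussian
        return gamma * xhat + beta                        # learnable scale/shift keep the layer expressive

    # usage: 32 examples, 100 hidden units, deliberately badly scaled activations
    x = torch.randn(32, 100) * 5 + 3
    out = batchnorm_forward(x, torch.ones(100), torch.zeros(100))
    print(out.mean().item(), out.std().item())            # roughly 0 and 1

Because every layer's outputs stay in this range, each factor in the chain-rule product stays closer to 1, which is what keeps the overall gradient from vanishing or exploding.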


Got it, thanks!


It's another one of those mathematical hacks that NNs love so much, which stops the numbers spiralling out of control in big networks.


Thanks for the explanation, folks.



