What does this mean? I can shape a network some other way?

enriquto · on May 7, 2019

Yes, you can think very hard and set up the weights by hand, or by an automatic "compiler" that transforms your desired algorithm into a deep net.
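
To make the "by hand" option concrete, here is a minimal sketch: a tiny two-unit ReLU network whose weights are picked manually so that it computes XOR, with no training involved. The function names and the particular weights are my own illustration, not the output of any real compiler.

```python
def relu(x):
    return max(0.0, x)

def xor_net(x1, x2):
    # Hidden layer: weights and biases chosen by hand (illustrative assumption).
    h1 = relu(x1 + x2)        # sum of the two inputs
    h2 = relu(x1 + x2 - 1.0)  # nonzero only when both inputs are 1
    # Output layer: cancel the sum when both inputs are on.
    return h1 - 2.0 * h2

# Print the truth table for the four binary input pairs.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

This works for a toy function, but it gives a feel for how quickly hand-designing weights stops scaling as the task gets harder.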

Ragib_Zaman · on May 7, 2019

Good practitioners do both. Whatever you can work out about the weights in advance, you set as their initial values. Then you improve on that by training. How many weights you can estimate well depends on the problem and the size of your network, but yes, you should try to estimate good weights yourself first.

This practice is well known, but here's a concrete source from Andrej Karpathy:

https://karpathy.github.io/2019/04/25/recipe/

"init well. Initialize the final layer weights correctly. E.g. if you are regressing some values that have a mean of 50 then initialize the final bias to 50. If you have an imbalanced dataset of a ratio 1:10 of positives:negatives, set the bias on your logits such that your network predicts probability of 0.1 at initialization. Setting these correctly will speed up convergence and eliminate “hockey stick” loss curves where in the first few iteration your network is basically just learning the bias."

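The two examples in that quote can be sketched in a few lines. This is only an illustration under my own naming (`regression_bias`, `logit_bias` and `sigmoid` are hypothetical helpers, not from the linked post): for regression the final bias is the target mean, and for a sigmoid output the bias is the log-odds of the desired starting probability. Note that a strict 1:10 ratio gives a positive rate of 1/11, about 0.091; the quote rounds that to 0.1, which is what the sketch uses.

```python
import math

def regression_bias(targets):
    # Final-layer bias for regression: the mean of the training targets.
    return sum(targets) / len(targets)

def logit_bias(p):
    # Bias b such that sigmoid(b) == p, i.e. the log-odds of p.
    return math.log(p / (1.0 - p))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical targets with a mean of 50, as in the quote.
print(regression_bias([40.0, 50.0, 60.0]))

# Bias that makes the network predict probability 0.1 before any training,
# assuming the other final-layer weights start near zero.
b = logit_bias(0.1)
print(b, sigmoid(b))
```

In a real framework you would write these values into the final layer's bias parameter before the first training step.
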
cshimmin · on May 7, 2019

Good luck manually writing a program that achieves state of the art performance on any typical machine learning task... we'll see which "training" method takes longer.