
It's explained in the post:

> Have you ever done a dense grid search over neural network hyperparameters? Like a really dense grid search? It looks like this (!!). Bluish colors correspond to hyperparameters for which training converges, reddish colors to hyperparameters for which training diverges.



I saw this, but am still not clear on what the axes represent. I assume two hyperparameters, or possibly two orthogonal principal components. I guess my point is it’s not clear how/which parameters are mapped onto the image.


Your point is valid, but the paper explains it clearly: they are NOT dimensionally reduced hyperparameters. The hyperparameters are learning rates, that's it. X axis: learning rate for the input layer (the network has 1 hidden layer). Y axis: learning rate for the output layer.

So what this is saying is that, for certain ill-chosen learning rates, model convergence is, for lack of a better word, chaotic and unstable.


Just to add to this: only the two learning rates are changed; everything else, including initialization and data, is fixed. From the paper:

> Training consists of 500 (sometimes 1000) iterations of full batch steepest gradient descent. Training is performed for a 2d grid of η0 and η1 hyperparameter values, with all other hyperparameters held fixed (including network initialization and training data).
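To make the setup concrete, here is a minimal sketch of that experiment in plain NumPy (not the paper's code): a tiny one-hidden-layer tanh network trained with full-batch gradient descent on fixed data and a fixed initialization, where only the two learning rates eta0 (input layer) and eta1 (output layer) are swept over a 2D grid. The network width, data, learning-rate range, and divergence threshold below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Fixed data and initialization (illustrative sizes, not the paper's values).
rng = np.random.default_rng(0)
X = rng.standard_normal((16, 8))        # fixed training inputs
y = rng.standard_normal((16, 1))        # fixed training targets
W0_init = rng.standard_normal((8, 8))   # fixed input-layer weights
W1_init = rng.standard_normal((8, 1))   # fixed output-layer weights

def train(eta0, eta1, steps=500):
    """Run full-batch gradient descent; return final MSE (inf = diverged)."""
    W0, W1 = W0_init.copy(), W1_init.copy()
    loss = np.inf
    for _ in range(steps):
        h = np.tanh(X @ W0)             # hidden activations
        err = h @ W1 - y
        loss = float(np.mean(err ** 2))
        if not np.isfinite(loss) or loss > 1e6:
            return np.inf               # treat blow-up as divergence
        g1 = h.T @ err                              # grad w.r.t. output-layer weights
        g0 = X.T @ ((err @ W1.T) * (1 - h ** 2))    # grad w.r.t. input-layer weights
        W1 -= eta1 * (2 / len(X)) * g1  # output-layer learning rate
        W0 -= eta0 * (2 / len(X)) * g0  # input-layer learning rate
    return loss

# Dense grid over the two learning rates; everything else stays fixed.
etas = np.logspace(-3, 1, 64)
converged = np.array([[np.isfinite(train(e0, e1)) for e0 in etas]
                      for e1 in etas])
# Plotting `converged` (e.g. with matplotlib's imshow) gives a low-resolution
# version of the blue/red convergence map shown in the post.
```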



