Your point is valid, but the paper explains this clearly: they are NOT dimensionally reduced hyperparameters. The only hyperparameters are the two learning rates. The x axis is the learning rate for the input layer (the network has a single hidden layer), and the y axis is the learning rate for the output layer.
So what this is saying is that for certain ill-chosen learning rates, model convergence is, for lack of a better word, chaotic and unstable.
Just to add to this: only the two learning rates are varied; everything else, including initialization and training data, is held fixed. From the paper:
Training consists of 500 (sometimes 1000) iterations of full batch steepest gradient descent. Training is performed for a 2d grid of η0 and η1 hyperparameter values, with all other hyperparameters held fixed (including network initialization and training data).
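To make the setup concrete, here is a minimal sketch (not the paper's code) of that kind of 2D learning-rate sweep: a tiny 1-hidden-layer tanh network trained with full-batch gradient descent, where eta0 scales the input-layer update and eta1 the output-layer update (mirroring the paper's η0 and η1). The data, layer sizes, and divergence check are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): a 1-hidden-layer tanh network trained
# with full-batch gradient descent. eta0 scales the input-layer update and
# eta1 the output-layer update; data, sizes, and thresholds are made up.
import numpy as np

def train(eta0, eta1, steps=500, seed=0):
    rng = np.random.default_rng(seed)               # fixed init and data, as in the paper's setup
    X = rng.standard_normal((16, 8))                # toy fixed training data
    y = rng.standard_normal((16, 1))
    W0 = rng.standard_normal((8, 8)) / np.sqrt(8)   # input-layer weights
    W1 = rng.standard_normal((8, 1)) / np.sqrt(8)   # output-layer weights
    for _ in range(steps):
        h = np.tanh(X @ W0)
        err = h @ W1 - y
        loss = float(np.mean(err ** 2))
        if not np.isfinite(loss):
            return np.inf                           # diverged
        # full-batch gradients of the mean-squared error
        gW1 = h.T @ err * (2 / len(X))
        gW0 = X.T @ ((err @ W1.T) * (1 - h ** 2)) * (2 / len(X))
        W0 -= eta0 * gW0                            # x-axis hyperparameter
        W1 -= eta1 * gW1                            # y-axis hyperparameter
    return loss

# Sweep a 2D grid of (eta0, eta1); neighbouring cells can flip between
# converging and diverging, which is the unstable boundary being discussed.
for eta0 in np.logspace(-2, 1, 4):
    row = ["ok " if np.isfinite(train(eta0, eta1)) else "div"
           for eta1 in np.logspace(-2, 1, 4)]
    print(f"eta0={eta0:6.2f}: {row}")
```

The paper renders a much finer grid of this kind of converged/diverged outcome per (η0, η1) cell, which is what produces the fractal-looking boundary.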