Here the neural network was given examples of how to use the calculator for each question, which means it wasn't generating its own abstractions.
If you wanted to use this to solve other (e.g. programming) problems you would need examples of every step required for almost every problem.
Using neural networks in this way is akin to locality-sensitive hashing; instead, the network should understand what its lowest-level operators do and discover useful combinations of them that can solve new problems.
I haven't been following this field, but anyone know what happened to Neural Programmer Interpreters (2015)? It seemed like such a promising direction back then. It showed that a neural network can learn to use arbitrary commands to execute algorithms such as multidigit addition and bubble sort: http://www-personal.umich.edu/~reedscot/iclr_project.html
That seems like a much better demo of using black-box tools as substeps in problem solving. Is there a reason why it shouldn't work when the black box is a more complex function like sympy's eval?
> Something that intrigued me in Saxton et. al.’s paper was how high a baseline transformer scored on probability tasks (~0.77 and ~0.73), given that working these out are a multi-step process. How could basic pattern-matching score so highly on such a task? Is mere perception enough to figure out something like the probability product rule, on such a generic architecture without any prior knowledge of numbers or probability?
> To try and explain this, we point out that although questions are unique, a lot of them will share the same answers. For example, Calculate prob of sequence aad from abcda, Calculate prob of sequence bbz from zbbmn, and Calculate prob of sequence rpr from {r: 2, p: 1, x:2} all lead to the same answer, 1/30.
> Doing a bit of analysis on training set questions, we find that out of 1 million samples each, swr_p_level_set and swr_p_sequence have 977179 and 978045 unique questions, respectively. This seems reasonable, as duplicates are limited to <3% of the training set and the distribution over questions appears fairly uniform.
> On the other hand, doing analysis on training set answers reveals that out of 1 million samples each, swr_p_level_set and swr_p_sequence have 1458 and 1865 unique answers, respectively.
> Counting the collective number of samples that share the top K most common answers reveals even more imbalance.
This is the real takeaway for me from the article.
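The quoted claim that all three differently-worded questions share the answer 1/30 is easy to check: each is the probability of drawing a particular sequence without replacement from a pool of five letters. A small sketch (my own, not from the article) using exact rational arithmetic:

```python
from fractions import Fraction
from collections import Counter

def seq_prob(seq, pool):
    """Probability of drawing `seq` in order, sampling without replacement from `pool`."""
    counts = Counter(pool)
    total = sum(counts.values())
    p = Fraction(1)
    for ch in seq:
        p *= Fraction(counts[ch], total)
        counts[ch] -= 1
        total -= 1
    return p

print(seq_prob("aad", "abcda"))  # 1/30
print(seq_prob("bbz", "zbbmn"))  # 1/30
print(seq_prob("rpr", "rrpxx"))  # 1/30  ({r: 2, p: 1, x: 2} written out)
```

All three collapse to 2/5 × 1/4 × 1/3 = 1/30, which illustrates how a model could score well by learning the answer distribution rather than the procedure.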
From the title, I was expecting the neural network to take an input (e.g., speech or a string "5+11+3=") and then control mouse movements to push the keys on a calculator program (e.g., Windows Calculator). I.e., a neural network driving an existing user interface based on commands from a user.
But the article is more about using neural network transformers to build steps of a mathematical proof with each step checked by a symbolic "calculator". I.e., transformers applied to mathematical proofs.
Of course you could train a NN to do arithmetic, but this is much more impressive.
Training a NN to solve problems with available tools means more abstraction, and is closer to AGI than essentially just learning a LUT.
Yeah. I only trained it on addition. Actually exploring the impact of training a net to perform a range of operations with the minimum plausible neuron count would be quite interesting.
I don’t see any reason why it would be significantly harder to do, however.
You’re right about accuracy. I didn’t let the model train long enough to push the error low enough to guarantee exact results over the input range. But then again, this was designed as a toy experiment, not something people should rely on.
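For context on why exactness is hard for a generic net but trivially representable: a single linear neuron can express addition exactly (weights [1, 1], bias 0), and plain gradient descent recovers it to high precision. A minimal sketch in NumPy (my own toy setup, not the commenter's actual experiment):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(256, 2))  # random operand pairs
y = X.sum(axis=1)                      # target: their sum

# One linear neuron: pred = w1*a + w2*b + bias
w = np.zeros(2)
b = 0.0
for _ in range(2000):
    err = X @ w + b - y
    w -= 0.1 * (X.T @ err) / len(X)    # mean-squared-error gradient step
    b -= 0.1 * err.mean()

print(w, b)  # w converges toward [1, 1], b toward 0
```

A deeper net with nonlinearities only approximates this map over the training range, which is why stopping training early leaves residual error rather than exact arithmetic.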