But the correct answer isn't inside the model at all, in any of their examples. The correct answer is inside SymPy or NumPy, at least 99% of the time. That is, the model doesn't respond with a worked solution or with the answer itself: it responds with a Python program that poses the given question to SymPy or NumPy, and the authors then run that program and report the answer.
Here is a basic example:
MIT Course question: Solve each equation for x. ln(x² − 1) = 3
Model input: Using Sympy, solve Eq ln(x**2-1)=3 for x.
Model output:
from sympy import *
x = symbols('x')
solve(log(x**2-1) - 3, x)
As you can see, the model has simply translated a mechanized form of the original question into equivalent Python code. The model has no idea how to solve an equation: it delegates the actual solving to a symbolic equation solver.
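For the record (my own run, not from the paper), executing that snippet and printing the result hands back the two roots; all of the algebra happens inside SymPy:
from sympy import *

x = symbols('x')
# ln(x**2 - 1) = 3  =>  x**2 = 1 + e**3  =>  x = ±sqrt(1 + e**3)
print(solve(log(x**2 - 1) - 3, x))
# prints the two roots, ±sqrt(1 + exp(3))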
In other cases, they even "tidy" the original question to a representation of the solution. Here is their example E in Figure 2:
Original question:
> Outside of their humdrum duties as 6.042 TAs, Sayan is trying to learn to levitate using only intense concentration and Jelani is launching a “Nelson 2008” presidential campaign. Suppose that Sayan’s probability of levitating is 1/6, Jelani’s chance of becoming president is 1/4, and the success of one does not alter the other’s chances. If at most one of them succeeds, what is the probability that Jelani becomes the president of the United States?
"Tidyed question":
> Suppose that Sayan’s probability of succeeding is 1/6, and Jelani’s probability of succeeding is 1/4.
> Use numpy to find the probability at most one of Jelani and Sayan succeed.
> Use numpy to find the probability Jelani succeeds but Sayan does not succeed.
> Divide the former by the latter probability.
Model solution:
import numpy as np
# Probability of Sayan succeeding
p_sayan = 1/6
# Probability of Jelani succeeding
p_jelani = 1/4
# Probability of at most one of Jelani and Sayan succeeding
p_at_most_one = 1 - (p_sayan * p_jelani)
# Probability of Jelani succeeding but Sayan not succeeding
p_jelani_succeed_sayan_not = p_jelani * (1 - p_sayan)
# Probability of Jelani succeeding but Sayan not succeeding, divided by
# probability of at most one of Jelani and Sayan succeeding
p_jelani_succeed_sayan_not / p_at_most_one
# Probability of at least one of Jelani and Sayan succeeding
p_at_least_one = 1 - p_at_most_one
Tidying up the extra verbiage of the question is absolutely fair. But then, they also explain exactly how to compute the result using the data in the question; the model then generates code that perfectly matches the described algorithm. Again, it is not using even the tiniest bit of mathematical understanding.
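For comparison, here is the whole computation stripped down to its arithmetic (my own check, not from the paper); there is nothing in it beyond following the recipe spelled out in the tidied question:
# P(at most one succeeds) = 1 - P(both succeed)
p_at_most_one = 1 - (1/6) * (1/4)          # 23/24
# P(Jelani succeeds and Sayan does not)
p_jelani_not_sayan = (1/4) * (1 - 1/6)     # 5/24
# Conditional probability, as the tidied question instructs
print(p_jelani_not_sayan / p_at_most_one)  # 5/23 ≈ 0.2174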
I have browsed their examples, and I have not seen even a single one where the model does more than rephrase the question into a 1:1 Python representation of the question itself.
None of the answers would pass even the simplest undergrad exam. They are literally of the form "how would you solve equation E?" "I would write a program that says sympy.solve(E)".
Well, they do say very clearly that they "solve" problems by program synthesis, and what they describe is perfectly legit program synthesis.
To clarify, program synthesis (or automatic programming) is the task of generating programs from specifications. There are two kinds of program synthesis: deductive program synthesis, from a complete specification of the target program; and inductive program synthesis, or program induction, from an incomplete specification (such as sets of program inputs and outputs, or traces). An example of deductive program synthesis is the generation of low-level code from a high-level language by a compiler.
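To illustrate the inductive kind, here is a toy sketch of my own (not from the paper): a brute-force program inductor that searches a tiny expression grammar for a program consistent with a handful of input-output examples:
from itertools import product

# A tiny expression grammar: programs are built from x, small constants,
# and three arithmetic operators, up to a fixed nesting depth.
LEAVES = ['x', '1', '2', '3']
OPS = ['+', '-', '*']

def expressions(depth):
    """Enumerate expression strings up to the given depth."""
    if depth == 0:
        yield from LEAVES
        return
    yield from expressions(depth - 1)
    for op in OPS:
        for left, right in product(expressions(depth - 1), repeat=2):
            yield f'({left} {op} {right})'

def induce(examples, max_depth=2):
    """Return the first expression consistent with all input-output pairs."""
    for expr in expressions(max_depth):
        if all(eval(expr, {'x': x}) == y for x, y in examples):
            return expr
    return None

# Incomplete specification: just three input-output pairs.
print(induce([(1, 2), (2, 4), (3, 6)]))  # finds '(x + x)'
The point of the sketch is that the input-output pairs underdetermine the target program; the synthesiser must generalise. A complete specification, by contrast, leaves nothing to generalise: the program is already fully determined, and only needs to be translated.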
What the paper describes is a kind of deductive program synthesis from a complete specification in natural language. I suspect the true contribution of the work is the demonstration of using natural language as a complete specification, where earlier work generally only demonstrated the use of natural language as an incomplete specification (for example, comments describing intent rather than implementation) and the combination of natural language with code, as in the original Codex work [Edit: actually, now that I look again, the Codex paper also has examples of comments that fully specify the target program, e.g. in Figure 2: https://arxiv.org/abs/2107.03374; so the work above is merely incremental].
On the other hand, it's clear to me that the training has made the model memorise answers, and all the work in prompt engineering described under "Workflow" serves to find the right prompts to retrieve the desired memorisations, much like one must fire just the right SQL query to get back the right data. Certainly interesting to see in action, and useful for everyday work, but far from "solving" anything in the grandiose way that it is announced by the authors (e.g. "These astounding results..." in the section "Conclusion", etc.).