But the correct answer isn't inside the model at all, in any of their examples. The correct answer is inside SymPy or NumPy, at least 99% of the time. That is, the model doesn't respond with a worked solution or with the answer itself: it responds with a Python program that poses the given question to SymPy or NumPy, and the authors then run that program and report the answer.
Here is a basic example:
MIT Course question: Solve each equation for x. ln(x² − 1) = 3
Model input: Using Sympy, solve Eq ln(x**2-1)=3 for x.
Model output:
from sympy import *
x = symbols('x')
solve(log(x**2-1) - 3, x)
As you can see, the model has simply translated a mechanized form of the original question into equivalent Python code. The model has no idea how to solve an equation: it delegates the actual solving to a symbolic equation solver.
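For the record (my own run, not from the paper), executing that snippet and printing the result hands back the two roots; all of the algebra happens inside SymPy:
from sympy import *

x = symbols('x')
# ln(x**2 - 1) = 3  =>  x**2 = 1 + e**3  =>  x = ±sqrt(1 + e**3)
print(solve(log(x**2 - 1) - 3, x))
# prints the two roots, ±sqrt(1 + exp(3))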
In other cases, they even "tidy" the original question to a representation of the solution. Here is their example E in Figure 2:
Original question:
> Outside of their humdrum duties as 6.042 TAs, Sayan is trying to learn to levitate using only intense concentration and Jelani is launching a “Nelson 2008” presidential campaign. Suppose that Sayan’s probability of levitating is 1/6, Jelani’s chance of becoming president is 1/4, and the success of one does not alter the other’s chances. If at most one of them succeeds, what is the probability that Jelani becomes the president of the United States?
"Tidyed question":
> Suppose that Sayan’s probability of succeeding is 1/6, and Jelani’s probability of succeeding is 1/4.
> Use numpy to find the probability at most one of Jelani and Sayan succeed.
> Use numpy to find the probability Jelani succeeds but Sayan does not succeed.
> Divide the former by the latter probability.
Model solution:
import numpy as np
# Probability of Sayan succeeding
p_sayan = 1/6
# Probability of Jelani succeeding
p_jelani = 1/4
# Probability of at most one of Jelani and Sayan succeeding
p_at_most_one = 1 - (p_sayan * p_jelani)
# Probability of Jelani succeeding but Sayan not succeeding
p_jelani_succeed_sayan_not = p_jelani * (1 - p_sayan)
# Probability of Jelani succeeding but Sayan not succeeding, divided by
# probability of at most one of Jelani and Sayan succeeding
p_jelani_succeed_sayan_not / p_at_most_one
# Probability of at least one of Jelani and Sayan succeeding
p_at_least_one = 1 - p_at_most_one
Tidying up the extra verbiage of the question is absolutely fair. But then, they also explain exactly how to compute the result using the data in the question; the model then generates code that perfectly matches the described algorithm. Again, it is not using even the tiniest bit of mathematical understanding.
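For comparison, here is the whole computation stripped down to its arithmetic (my own check, not from the paper); there is nothing in it beyond following the recipe spelled out in the tidied question:
# P(at most one succeeds) = 1 - P(both succeed)
p_at_most_one = 1 - (1/6) * (1/4)          # 23/24
# P(Jelani succeeds and Sayan does not)
p_jelani_not_sayan = (1/4) * (1 - 1/6)     # 5/24
# Conditional probability, as the tidied question instructs
print(p_jelani_not_sayan / p_at_most_one)  # 5/23 ≈ 0.2174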
I have browsed their examples, and I have not seen even a single one where the model does more than rephrase the question into a 1:1 Python representation of the question itself.
None of the answers would pass even the simplest undergrad exam. They are literally of the form "how would you solve equation E?" "I would write a program that says sympy.solve(E)".
Well, they do say very clearly that they "solve" problems by program synthesis, and what they describe is perfectly legit program synthesis.
To clarify, program synthesis (or automatic programming) is the task of generating programs from specifications. There are two kinds of program synthesis: deductive program synthesis, from a complete specification of the target program; and inductive program synthesis, or program induction, from an incomplete specification (such as sets of program inputs and outputs, or traces). An example of deductive program synthesis is the generation of low-level code from a high-level language by a compiler.
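To illustrate the inductive kind, here is a toy sketch of my own (not from the paper): a brute-force program inductor that searches a tiny expression grammar for a program consistent with a handful of input-output examples:
from itertools import product

# A tiny expression grammar: programs are built from x, small constants,
# and three arithmetic operators, up to a fixed nesting depth.
LEAVES = ['x', '1', '2', '3']
OPS = ['+', '-', '*']

def expressions(depth):
    """Enumerate expression strings up to the given depth."""
    if depth == 0:
        yield from LEAVES
        return
    yield from expressions(depth - 1)
    for op in OPS:
        for left, right in product(expressions(depth - 1), repeat=2):
            yield f'({left} {op} {right})'

def induce(examples, max_depth=2):
    """Return the first expression consistent with all input-output pairs."""
    for expr in expressions(max_depth):
        if all(eval(expr, {'x': x}) == y for x, y in examples):
            return expr
    return None

# Incomplete specification: just three input-output pairs.
print(induce([(1, 2), (2, 4), (3, 6)]))  # finds '(x + x)'
The point of the sketch is that the input-output pairs underdetermine the target program; the synthesiser must generalise. A complete specification, by contrast, leaves nothing to generalise: the program is already fully determined, and only needs to be translated.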
What the paper describes is a kind of deductive program synthesis from a complete specification in natural language. I suspect the true contribution of the work is the demonstration of using natural language as a complete specification, where earlier work generally only demonstrated the use of natural language as an incomplete specification (for example, comments describing intent rather than implementation) and the combination of natural language with code, as in the original Codex work [Edit: actually, now that I look again, the Codex paper also has examples of comments that fully specify the target program, e.g. in Figure 2: https://arxiv.org/abs/2107.03374; so the work above is merely incremental].
On the other hand, it's clear to me that the training has made the model memorise answers, and all the work in prompt engineering described under "Workflow" serves to find the right prompts to retrieve the desired memorisations, much like one must fire just the right SQL query to get back the right data. Certainly interesting to see in action, and useful for everyday work, but far from "solving" anything in the grandiose way that it is announced by the authors (e.g. "These astounding results..." in the section "Conclusion", etc.).