As noted in another comment, that is legit program synthesis from a complete
specification in natural language. The workflow described in the paper and
abstracted in your comment can be very useful, as long as the user has good confidence in the correctness of the results (or the ability to eyeball and correct them as needed).
The problem is that this approach relies on a large language model trained on a copy of the entire web (GPT-3, trained on Common Crawl plus other datasets), fine-tuned on GitHub (Codex) and then fine-tuned again on the kinds of problems that it is supposed to solve. That is an insane amount of resources to spend on simple programming problems with known solutions that can be implemented by hand at much lower cost and effort. In that sense it's a little bit disappointing: so much data, so much compute, and all you can do is tell a computer how to write Python?
> and then fine-tuned again on the kinds of problems that it is supposed to solve
Could you point me to where they claim to have fine-tuned Codex?
From what I can see they claim:
> We use OpenAI’s davinci-codex engine for all of our generations. We fix all of Codex’s hyperparameters to be the same for all experiments: top-p, which is the portion p of the token probability mass a language model samples from at each step, is set to 1, sampling temperature is set to 0 (i.e. argmax), and response length is set to 200 tokens.
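In other words, they appear to be calling the public Codex engine with greedy decoding, not a fine-tuned model. With the old (pre-1.0) openai Python client, that would look roughly like the sketch below; the prompt is a placeholder of my own, not from the paper:

    import openai  # pre-1.0 openai-python client, assumed here

    openai.api_key = "sk-..."  # placeholder API key

    # Greedy decoding: temperature=0 (argmax), top_p=1, 200-token responses,
    # matching the settings quoted above. The prompt is a placeholder.
    completion = openai.Completion.create(
        engine="davinci-codex",
        prompt="# Write a Python program that ...",
        temperature=0,
        top_p=1,
        max_tokens=200,
    )
    print(completion["choices"][0]["text"])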
BTW - OpenAI Davinci costs $0.06 per 1000 tokens. Codex is currently free in closed beta, but I'd guess the cost will be similar. I would happily pay even 100x that ($6) for correct solutions to advanced mathematical problems. The issue is that this paper's evaluation is ridiculous and Codex does not work anywhere near as well as they claim. It is pure hype by people who do not appear to be affiliated with OpenAI.
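For scale, here is the back-of-the-envelope arithmetic behind that $6 figure; the prompt size is my guess, the 200-token response length is from the paper's settings:

    # Rough per-call cost at davinci pricing (USD per 1000 tokens).
    price_per_1k = 0.06
    prompt_tokens = 800      # assumed size of problem statement + scaffolding
    response_tokens = 200    # fixed by the paper's settings
    cost_per_call = (prompt_tokens + response_tokens) / 1000 * price_per_1k
    print(round(cost_per_call, 3))        # ~0.06 USD per attempt
    print(round(100 * cost_per_call, 2))  # ~6 USD for 100 attempts per problem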
This is legit program synthesis for an iterative brute-force solution to the problem. But it is in no sense "solving university-level mathematical problems". It's not even figuring out that it can brute-force the original problem on its own - a human looked at the original problem and told the model how to brute-force it, and it did. It's cool that it did, but this achievement has nothing to do with the title and abstract of the paper.
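To make that concrete with a toy example that is not from the paper: if the original problem were, say, "find the smallest positive integer n for which n^2 + n + 41 is composite", the model isn't doing the mathematical work of choosing the search; a human reduces it to "write a loop that checks each n", and the resulting program is just:

    # Toy illustration, not from the paper: the human has already chosen the
    # brute-force strategy; the model only has to write the loop.
    def is_prime(k):
        if k < 2:
            return False
        return all(k % d for d in range(2, int(k ** 0.5) + 1))

    n = 1
    while is_prime(n * n + n + 41):
        n += 1
    print(n)  # 40, since 40^2 + 40 + 41 = 41^2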