As noted in another comment, that is legit program synthesis from a complete specification in natural language. The workflow described in the paper and abstracted in your comment can be very useful, as long as the user has good confidence in the correctness of the results (or the ability to eyeball and correct them as needed).

The problem is that this approach relies on a large language model trained on a copy of the entire web (GPT-3, trained on Common Crawl plus additional corpora), fine-tuned on GitHub code (Codex), and then fine-tuned again on the kinds of problems it is supposed to solve. That is an insane amount of resources to spend on simple programming problems with known solutions that could be implemented by hand at much lower cost and effort. In that sense it's a little bit disappointing: so much data, so much compute, and all you can do is tell the computer how to write Python?



> and then fine-tuned again on the kinds of problems that it is supposed to solve

Could you point me to where they claim to have fine-tuned Codex?

From what I can see they claim:

> We use OpenAI’s davinci-codex engine for all of our generations. We fix all of Codex’s hyperparameters to be the same for all experiments: top-p which is the portion p of the token probability mass a language model samples from at each step is set to 1, sampling temperature is set to 0 (i.e. argmax), and response length is set to 200 tokens.
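For reference, those settings map onto the old Completion endpoint roughly like this (a minimal sketch using the pre-1.0 openai Python client; the prompt and API key are placeholders, not anything from the paper):

    import openai  # pre-1.0 openai client, which exposed Completion.create

    openai.api_key = "sk-..."  # placeholder; Codex required closed-beta access

    response = openai.Completion.create(
        engine="davinci-codex",  # the engine named in the paper
        prompt="# Write a Python program that ...",  # placeholder prompt
        temperature=0,   # argmax decoding, as stated in the paper
        top_p=1,         # sample from the full token probability mass
        max_tokens=200,  # response length capped at 200 tokens
    )
    print(response.choices[0].text)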

BTW - OpenAI Davinci costs $0.06 per 1000 tokens. Codex is currently free in closed beta, but I guess the cost will be similar. I would be happy to pay even 100x ($6) for correct solutions to advanced mathematical problems. The issue is that this paper's evaluation is ridiculous and Codex does not work anywhere near as well as they claim. It is pure hype by people who do not appear to be affiliated with OpenAI.


>> Could you point me to where they claim to have fine-tuned Codex?

I can't find that in the paper so I must have imagined it.

>> Codex is currently free in closed beta, but I guess cost will be the same.

In my previous comment, I'm referring to the "amount of resources" needed to train GPT-3 and then fine-tune Codex, of course.


This is legit program synthesis for an iterative brute-force solution to the problem. But it is in no sense "solving university-level mathematical problems". It didn't even figure out that it could brute-force the original problem - a human looked at the original problem and told the model how to brute-force it, and it did. It's cool that it did, but this achievement has nothing to do with the title and abstract of the paper.
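To be concrete about what "iterative brute force" means here: the human picks the search strategy and the model only writes the loop. A made-up example (not one of the paper's problems) would be something like:

    # Hypothetical illustration: find the smallest positive integer n such that
    # n^2 + n + 41 is not prime. The human chose the brute-force strategy;
    # the model only has to fill in the loop.

    def is_prime(k: int) -> bool:
        if k < 2:
            return False
        return all(k % d for d in range(2, int(k ** 0.5) + 1))

    n = 1
    while is_prime(n * n + n + 41):
        n += 1
    print(n)  # 40, since 40^2 + 40 + 41 = 41^2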


I agree. The paper exaggerates. I was commenting on the workflow you described.



