
I thought LLaMA outscored GPT-3


GPT-3 is a very different model from GPT-3.5. My understanding is that they were comparing LLaMA's performance to benchmark scores published for the original GPT-3, which came out in 2020 and had not yet had instruction tuning, so was significantly harder to use.


I know; that's why I said GPT-3 (Davinci), not GPT-3.5 / ChatGPT.


text-davinci-002 and text-davinci-003 are actually classified as GPT-3.5 by OpenAI: https://platform.openai.com/docs/models/gpt-3-5

ChatGPT is GPT-3.5 Turbo.


Would you mind summarizing the difference between GPT-3.5 and GPT-3.5 Turbo? I'm not clear about that.


GPT-3.5 refers to the instruction-tuned modern GPT models, such as text-davinci-002 and text-davinci-003.

3.5 Turbo is the ChatGPT model: it's cheaper (1/10th the price), faster, and has a bunch of extra RLHF training to make it work well as a safe and usable chatbot.

https://openai.com/blog/introducing-chatgpt-and-whisper-apis introduced the turbo model.
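For concreteness, here's a rough sketch of how the two are called through the openai Python library (0.x client); the prompt text and parameter values are just placeholders. The GPT-3.5 completion models take a plain prompt, while gpt-3.5-turbo takes a list of role-tagged chat messages:

    import openai

    # GPT-3.5 (e.g. text-davinci-003): plain text-completion endpoint
    completion = openai.Completion.create(
        model="text-davinci-003",
        prompt="Summarize the difference between GPT-3 and GPT-3.5 in one sentence.",
        max_tokens=100,
    )
    print(completion.choices[0].text)

    # GPT-3.5 Turbo (the ChatGPT model): chat endpoint, takes role-tagged messages
    chat = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Summarize the difference between GPT-3 and GPT-3.5 in one sentence."}],
    )
    print(chat.choices[0].message.content)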


Hard to measure these days. The training sets are so large that they might contain leaks of the test sets. Take these numbers with a grain of salt.
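To make the "leaks of test sets" concern concrete: contamination checks along the lines of the GPT-3 paper look for long word n-gram overlaps between training documents and benchmark questions. A minimal sketch, assuming a 13-gram window, with both function names made up for illustration:

    def ngrams(text, n=13):
        # Lowercased word n-grams of a string
        words = text.lower().split()
        return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

    def is_contaminated(test_example, training_docs, n=13):
        # Flag the example if any of its n-grams also appears in a training document
        test_grams = ngrams(test_example, n)
        return any(test_grams & ngrams(doc, n) for doc in training_docs)

If a benchmark question trips a check like this, its score says more about memorization than capability.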


Or... it could be that the Chinchilla study has deficiencies in how it measures model capabilities? Either that or your explanation. Frankly, I don't think the 13B model is better than GPT-3 (text-davinci-001, which I think is not RLHF'd, though maybe better than the base model).


text-davinci-001 is currently classed as "GPT-3.5" by OpenAI, and it did indeed have RLHF in the form of instruction tuning: https://openai.com/research/instruction-following

MY MISTAKE: 002 and 003 are GPT-3.5, but 001 looks to have predated the InstructGPT work.



