> GPT-3.5 gave me a right-ish answer of 24.848 liters, but it did not realize the last lap needs to be completed once the leader finishes. GPT-4 gave me 28-29 liters as the answer, recognizing that a partial lap needs to be added due to race rules, and that it's good to have 1-2 liters of safety buffer.
I don't believe that for a second. If that's the answer it gave, it's cherry-picked and lucky. There are many examples where GPT-4 fails spectacularly at much simpler reasoning tasks.
I still think ChatGPT is amazing, but we shouldn't pretend it's something it isn't. I wouldn't trust GPT-4 to tell me how much fuel I should put in my car. Would you?
This seems needlessly flippant and dismissive, especially when you could just crack open ChatGPT to verify, assuming you have Plus or API access. I just did, and ChatGPT gave me a well-reasoned explanation that factored in the extra details about racing that the other commenters noted.
>There are many examples where GPT-4 fails spectacularly at much simpler reasoning tasks.
I posit the conversation would be more productive if you shared some of those examples, so we could all compare them to the rather impressive one the top comment shared.
>I wouldn't trust GPT-4 to tell me how much fuel I should put in my car. Would you?
Not if I were trying to win a race, but I can see how this particular example is a useful way to gauge how an LLM handles a task that looks at first like a simple math problem but requires some deeper insight to answer correctly.
> Not if I were trying to win a race, but I can see how this particular example is a useful way to gauge how an LLM handles a task that looks at first like a simple math problem but requires some deeper insight to answer correctly.
It's not just testing reasoning, though, it's also testing fairly niche knowledge. I think a better test of pure reasoning would include all the rules and tips like "it's good to have some buffer" in the prompt.
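To make the naive-vs-rule-aware gap concrete, here is a sketch of both calculations. Every number in it is made up for illustration; the thread never states the actual race parameters behind the 24.848 L and 28-29 L answers.

```python
import math

# Hypothetical race parameters -- none of these come from the thread.
race_minutes = 30      # length of the timed race
lap_seconds = 90       # average lap time
fuel_per_lap = 1.2     # litres burned per lap
buffer_litres = 1.5    # safety margin on top of the calculated need

# Naive reading: fuel for exactly the laps that fit in the race clock.
laps_naive = (race_minutes * 60) / lap_seconds
fuel_naive = laps_naive * fuel_per_lap

# Rule-aware reading: when the leader finishes, you still have to
# complete the lap you are on, so budget one extra lap plus a buffer.
laps_rule_aware = math.floor(laps_naive) + 1
fuel_rule_aware = laps_rule_aware * fuel_per_lap + buffer_litres
```

With these assumed numbers the naive estimate comes out a couple of litres short of the rule-aware one, which is the same shape of discrepancy the top comment describes between the two models' answers.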
At least debunk the example before you start talking about the shortcomings. Right now your comment feels misplaced as a reply to an example where the model actually shows a great deal of complex reasoning.
> GPT-3.5 gave me a right-ish answer of 24.848 liters, but it did not realize the last lap needs to be completed once the leader finishes. GPT-4 gave me 28-29 liters as the answer, recognizing that a partial lap needs to be added due to race rules, and that it's good to have 1-2 liters of safety buffer.
[0]: https://news.ycombinator.com/item?id=35893130