
When you say "it can't do logic", what do you mean? "Logic" can be as simple as A=A or A!=B, or as arbitrarily complex as you wish.

In my experience GPT-4 can solve novel logical puzzles, but it can be a bit clumsy with the context of more complex problems. What I mean is that it can often solve these problems with the right prompt, but you might need to ask it to think out loud and check its logic.




Not OP but here's an example of how GPT-4 can't deal with the goat/wolf/cabbage problem when things are switched up just a little.

https://amistrongeryet.substack.com/p/gpt-4-capabilities

Although it's interesting that if you use different nouns it does just fine: https://jbconsulting.substack.com/p/its-not-just-statistics-...
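For what it's worth, the classic version is trivial for a plain breadth-first search over states, which is a useful contrast: a solver only cares about the "can't be left alone together" constraints, not the nouns. A rough sketch in Python (the items and forbidden pairs below are just the standard ones; swap in any labels you like):

    from collections import deque

    ITEMS = frozenset({"wolf", "goat", "cabbage"})
    # Pairs that must never be left on a bank without the farmer.
    FORBIDDEN = [{"wolf", "goat"}, {"goat", "cabbage"}]

    def safe(bank):
        return not any(pair <= bank for pair in FORBIDDEN)

    def solve():
        start = ("L", ITEMS)              # (farmer's side, items on the left bank)
        goal = ("R", frozenset())
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            (farmer, left), path = queue.popleft()
            if (farmer, left) == goal:
                return path
            here = left if farmer == "L" else ITEMS - left
            for cargo in [None, *here]:   # cross alone or take one item along
                new_left = set(left)
                if cargo is not None:
                    if farmer == "L":
                        new_left.remove(cargo)
                    else:
                        new_left.add(cargo)
                new_left = frozenset(new_left)
                new_farmer = "R" if farmer == "L" else "L"
                unattended = new_left if new_farmer == "R" else ITEMS - new_left
                state = (new_farmer, new_left)
                if safe(unattended) and state not in seen:
                    seen.add(state)
                    queue.append((state, path + [f"cross to {new_farmer} with {cargo or 'nothing'}"]))

    for step in solve():
        print(step)

Changing the nouns only means editing ITEMS and FORBIDDEN; the search doesn't care what the words are, which is roughly the property the linked posts are probing for in GPT-4.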


I asked Bing a variant of the Wason selection task (a logic test/riddle). Instead of answering directly, it searched the web for "Wason selection task solution" (so it knew what the task was called; I didn't give it the name) and then provided its answer based on that search result. Except the task in the search result differed in the specifics (different colors), so it gave the wrong answer. It also insisted that its solution was right. Though maybe that's an issue with Microsoft's fine-tuning rather than with the base model itself.


I hadn't heard of that task, and it was interesting to see ChatGPT attempt the same problem. After a wrong answer, I gave it a leading question and received the following response.

> If you were to turn over the yellow card and find the number 7 on the other side, it would not disprove the statement "If a card has an odd number on one side, then the other side is purple." In fact, this discovery would not provide any evidence either for or against the statement.

> The statement specifically refers to cards with odd numbers on one side and their corresponding color on the other side. It does not make any claims about the colors of cards with even numbers. Therefore, even if the yellow card had an odd number like 7 on the other side, it would not contradict the statement.

It's interesting to see the model explain exactly what it would need to find, describe exactly what it could find, and then fail to make any connection between the two.
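For anyone who wants the connection spelled out, the task is small enough to brute-force. A quick sketch in Python, assuming the usual four-card setup (the visible faces 3 / 8 / purple / yellow are my guess at the variant; the logic is the same for any faces):

    NUMBERS = range(1, 9)
    COLORS = ["purple", "yellow"]

    def violates(number, color):
        # The rule: "if a card has an odd number on one side, the other side is purple".
        return number % 2 == 1 and color != "purple"

    visible = ["3", "8", "purple", "yellow"]   # assumed visible faces

    def must_flip(face):
        # A card must be flipped iff some possible hidden face would falsify the rule.
        if face.isdigit():
            return any(violates(int(face), color) for color in COLORS)
        return any(violates(number, face) for number in NUMBERS)

    for face in visible:
        print(face, "->", "flip" if must_flip(face) else "leave")
    # 3 -> flip, 8 -> leave, purple -> leave, yellow -> flip

In particular, a 7 behind the yellow card is precisely the counterexample the rule forbids, which is exactly the connection ChatGPT describes and then fails to make.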


Yes it's very fascinating! The language is so clear but the concepts are totally confused.

Does this mean real logical reasoning is very close, only some small improvements away, or does it mean we're just on the wrong track (to reach actual AGI)?


IMHO (and this is just my own uninformed view), this means that language models by themselves are insufficient for certain important tasks. It seems to be hard for a system to learn deductive reasoning purely from text prediction.

OTOH, who knows what would happen if you somehow managed to combine the generative capabilities of a language model with a proper inference engine, e.g. Wolfram|Alpha. Maybe it would bring us significantly closer to AGI, but maybe that way is also a dead end, because it's not guaranteed that those systems would work well together.


The biggest giveaway is that it doesn't have an internally consistent model of what it's trying to do.

For example, I once asked it to "prove that first order logic is not complete", a statement which is false, given that every logic textbook will give you a proof that first-order logic is complete. ChatGPT apparently knew this too, because it happily reproduced that result but somehow thought it was a valid answer to what I asked for (the valid answer would have been "I can't prove this because it's wrong").

I can't reproduce this exact result right now (in fact, when I tried, it just proved something slightly different, which was correct but not exactly what I asked for). But I was able to similarly confuse it by asking it to prove that the equation x^3+y^3=z^3 has solutions. That this is true is immediately apparent from x=y=z=0, but ChatGPT replies with:

> The equation x^3 + y^3 = z^3 represents a special case of Fermat's Last Theorem, which states that there are no solutions to the equation x^n + y^n = z^n for any integer values of n greater than 2.

> For the case of n = 3, the equation x^3 + y^3 = z^3 has infinitely many solutions. One such set of solutions is known as Pythagorean triples, where x, y, and z are positive integers.

> A Pythagorean triple is a set of three positive integers (x, y, z) that satisfy the equation x^2 + y^2 = z^2, which is equivalent to (x^2)^(3/2) + (y^2)^(3/2) = (z^2)^(3/2). By raising both sides to the power of 3/2, we get x^3 + y^3 = z^3.

> For example, the Pythagorean triple (3, 4, 5) satisfies 3^3 + 4^3 = 5^3 (27 + 64 = 125).

This answer is just confused on so many levels:

- It quotes back Fermat's Last Theorem at me (as indeed I hoped it would), but that theorem only applies to positive integer solutions and nowhere did I specify that constraint.

- If the theorem did apply, it would be a proof that such solutions don't exist. So ChatGPT has no internal understanding of how a theorem it quotes relates to a specific question; it just parrots off things that look vaguely similar to the input.

- Then it just tells me what Pythagorean triples are, which is hilarious, because those are the solutions to x^2+y^2=z^2, not what I asked about. It then tries to somehow transform Pythagorean triples into (non-integer) solutions of my equation (which doesn't work), and then doesn't even apply the transformation to its own example (and the calculation is just... wrong).

The problem IMO is not that ChatGPT gives a wrong answer, it's that its answer isn't even internally consistent.
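To make the inconsistency concrete, every factual claim in that answer can be checked in a couple of lines (just a quick sanity check, nothing here is specific to ChatGPT):

    # The question as asked: does x^3 + y^3 = z^3 have any solutions at all?
    x = y = z = 0
    print(x**3 + y**3 == z**3)     # True: (0, 0, 0) is a solution

    # ChatGPT's claim that (3, 4, 5) satisfies 3^3 + 4^3 = 5^3 "(27 + 64 = 125)":
    print(3**3 + 4**3, 5**3)       # 91 125  (27 + 64 is 91, not 125, and 91 != 125)
    print(3**2 + 4**2 == 5**2)     # True: the triple only works for squares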


Are you using Code Interpreter to get the answers, or is this just base GPT-4?


What do you mean? It's ChatGPT. Quite possibly GPT-4 performs a bit better, but the underlying principle is the same.


Aristotle defined logic in the Organon.

https://en.wikipedia.org/wiki/Organon


For the people downvoting: his work is literally where logic originates. Not only did he theorize about it, he also described the exact rules that define logic.

The very word "logic" has its roots in that exact era, as phrased at the time by the very people who came up with its ruleset in the first place.

You may define logic otherwise, but in the context of its historical origins other definitions are more or less irrelevant.



