
Interesting. Have you tried playing a full game like this, instead of a single move?

In any case, I don't think this is what people expect out of ChatGPT. Your approach is too "programmer-centric". I think people expect to tell ChatGPT the rules of the game in almost plain language, and then to be able to play a game of Tic Tac Toe with it, interacting as one would with a person. That means not asking it to write functions or reminding it of the state of the board at every step.

This doesn't work consistently for a well-known game like Tic Tac Toe, much less for an arbitrary game you make up.




> Interesting. Have you tried playing a full game like this, instead of a single move?

No, but it is correctly running the best-move functions, so by induction we can see it will successfully play a full game.

> I think people expect to tell ChatGPT the rules of the game in almost plain language, and then to be able to play a game of Tic Tac Toe with it, interacting as one would with a person.

This is an unreasonable expectation for a large language model.

When a person computes the sum of two large numbers, they do not use their language faculties. They probably need a pencil and pad so they can externalize the computational process. At the very least, they are performing calculations in their head in a manner very different from the cognitive abilities used when they catch a ball.

Try playing a game like Risk without a board or pieces, that is, without a concrete mechanism to maintain state.

This approach isn’t cheating, and an LLM acting as a translator is a key component. The fact that it can’t maintain state or do math very well doesn’t “prove that LLMs are useless bullshit generators, snicker snicker”; it just means you need to pair them with other existing tools that do the math and maintain the state… like JS interpreters.

One thing I think will improve: a larger-scale language model should need less solution-specific terminology in the prompt in order to reliably get the same results.

Also, translations are necessarily lossy and somewhat arbitrary, so these results need to be considered probabilistically as well. Meaning: generate 10 different thunks and have them vote on the answer they compute.
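Roughly like this, as a sketch in JavaScript (I'm assuming each generated thunk is a zero-argument function returning a JSON-serializable answer; the majorityVote helper name is just made up for illustration):

    // Run each independently generated thunk and take the most common answer.
    // Assumes each candidate is a zero-argument function returning a JSON-able value.
    function majorityVote(candidates) {
      const tally = new Map();
      for (const thunk of candidates) {
        let key;
        try {
          key = JSON.stringify(thunk()); // normalize so answers compare by value
        } catch (e) {
          continue; // a thunk that throws simply doesn't get a vote
        }
        if (key === undefined) continue; // ignore thunks that return nothing usable
        tally.set(key, (tally.get(key) || 0) + 1);
      }
      let best = null, bestCount = 0;
      for (const [key, count] of tally) {
        if (count > bestCount) { best = key; bestCount = count; }
      }
      return best === null ? null : JSON.parse(best);
    }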


> No, but it is correctly running the best-move functions, so by induction we can see it will successfully play a full game.

I'm not convinced induction applies. ChatGPT tends to "go astray" in conversations where it needs to maintain state; even with your patch for this (essentially reminding it what the state is at every prompt) I would test it just to make sure it can run a game through completion, make good moves all the way, and be able to tell when the game is over.

I can make ChatGPT do single "reasonable" moves, the problem surfaces during a full game.

> This is an unreasonable expectation for a large language model.

Yes, but enough people hold it anyway that it is a concern. And it's made worse because in some contexts ChatGPT fakes this quite effectively!


> I'm not convinced induction applies. ChatGPT tends to "go astray" in conversations where it needs to maintain state; even with your patch for this (essentially reminding it what the state is at every prompt) I would test it just to make sure it can run a game through completion, make good moves all the way, and be able to tell when the game is over.

You don't seem to understand what I am saying. ChatGPT cannot maintain state in a way that would be useful for playing a game. You have to use a program to interface with ChatGPT, e.g. via the API, and whatever program is calling ChatGPT maintains the state of the game and calls GPT iteratively.

So, by induction, once we know that the bestMove function is correct (which we have seen), we know it will work from the start of any game until the game is finished.

I am definitely not talking about firing up the ChatGPT web user interface and trying to get it to magically maintain state.
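Concretely, the driver looks something like this (just a sketch; I'm assuming a bestMove(board, player) signature returning a cell index 0-8 on a 9-element board, and the actual generated function may be shaped differently):

    // The driver program, not ChatGPT, owns the board. ChatGPT's only job was
    // to produce the bestMove function; the loop below just runs it each turn.
    function playFullGame(bestMove) {
      const board = Array(9).fill(null); // each cell is 'X', 'O', or null
      let player = 'X';
      while (true) {
        const move = bestMove(board, player);
        if (move == null || board[move] !== null) throw new Error('illegal move');
        board[move] = player;
        const w = winner(board);
        if (w) return w;                                  // 'X' or 'O' won
        if (board.every(c => c !== null)) return 'draw';  // board full
        player = player === 'X' ? 'O' : 'X';
      }
    }

    // Standard Tic Tac Toe win check over rows, columns, and diagonals.
    function winner(b) {
      const lines = [[0,1,2],[3,4,5],[6,7,8],[0,3,6],[1,4,7],[2,5,8],[0,4,8],[2,4,6]];
      for (const [a, m, c] of lines) {
        if (b[a] && b[a] === b[m] && b[a] === b[c]) return b[a];
      }
      return null;
    }

Whether bestMove is re-requested from the API each turn or generated once up front, the point is that the state lives in this driver, not in the model.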

> Yes, but enough people hold it anyway that it is a concern.

Some people hold this expectation because of a consistent barrage of straw man arguments, marketing hype, and fanboy gushing.

> And it's made worse because in some contexts ChatGPT fakes this quite effectively!

It turns out that a surprising number of computational tasks can be achieved by language models, but that is not because they are doing actual computation. They are not reliable computers at all. I don't know where this misconception came from, and from what I can tell this has been known for years. No one has ever hidden this fact, and solutions that offload the actual computation have been part of published research for many moons now.

The problem is that most people just want to read clickbait and emote to score fake internet points and they don't want to put in the effort to actually learn about new things.


We seem to be talking at cross purposes. I understand (at a very high level) what LLMs do, and I don't think they can do actual computation.

Why do you insist on explaining things I've already said I understand? I know ChatGPT is not good at maintaining state -- though it can fake it convincingly (which, understandably, seems to trip people up). I think it looks at your chat history within the session in order to generate the next response, which is why it can "degenerate" within a single session (but it's also how it can fake keeping state: by looking at the whole history before each reply).

I don't understand the rest of your answer. You seem to be really upset at "the people".

PS:

> So, by induction, once we know that the bestMove function is correct

"By induction", nope. Prove it. Run an actual full game instead of arguing with me. It will take you shorter to play the game than to debate with me.


What's the difference between keeping state and looking at the chat history?

Keeping state is something a human would have to do, because for a human, it would be very tedious and slow to re-read the history to recover context, relative to the timeliness expectation of the interlocutor.


> What's the difference between keeping state and looking at the chat history?

That's an excellent question. I don't know. Intuitively, looking at the chat history would seem a way to keep history, right?

However, in my tests trying to play Tic Tac Toe (informally, not using JavaScript functions as in the comment I was replying to), ChatGPT constantly failed. It claims to know the rules of Tic Tac Toe, yet it repeatedly forgot past board positions, making me think it's not capable of using the chat history to build a model of the game.


Like, we could both be thinking and talking about things like, “I wonder which programming languages are better or worse for these tasks? Is it harder to translate to NASM or ARM64? Or C? Or Lisp? Which Lisp performs better? What’s the relationship between training data and programming languages, and is this separate from an inherent complexity of a programming language? Can we use LLMs to make objective measurements about programming language complexity?”

I have done a little bit of testing, and LLMs are objectively worse at writing ASM than JavaScript. That makes sense: ASM is closer to the metal, and properly transcribing into functional ASM requires knowledge of the complexities of a specific CPU and the calling conventions of a specific OS, while JavaScript is closer to natural language, so there’s less “work” in the translation task.

But no, instead you want to prove to me that ChatGPT is some parlor trick…


> But no, instead you want to prove to me that ChatGPT is some parlor trick…

Excuse me, what?

I'm sorry, I've zero interest in discussing NASM or Lisp or whatnot. This was about the limitations of ChatGPT, not whatever strikes your fancy.



