Unless I misread this paper, their argument is entirely hypothetical. Meaning the LLM is still a black box and they can only hypothesise about what is going on internally by looking at the output(s) and guessing at what it would take to get there.
There's nothing wrong with a hypothesis or that process, but it means we still don't know whether models are doing this or not.
Maybe I mixed up that paper with another, but the one I meant to post shows that you can read something like a world model out of the activations of the layers.
There was a paper showing that a model trained on Othello moves builds an internal model of the board, models the skill level of its opponent, and more.
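For context, "reading a world model from the activations" is usually done with a probing classifier: you collect the hidden states of one layer while the model plays, then train a small probe to predict the board state from those activations. Rough sketch below of what that looks like (my own illustration with made-up names and dimensions, not the paper's actual code):

```python
import torch
import torch.nn as nn

HIDDEN_DIM = 512    # width of the layer being probed (assumed)
NUM_SQUARES = 64    # Othello board squares
NUM_STATES = 3      # empty / black / white per square

# Linear probe: maps one layer's activation vector to a state prediction per square
probe = nn.Linear(HIDDEN_DIM, NUM_SQUARES * NUM_STATES)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_probe(activations, board_labels, epochs=10):
    """activations: (N, HIDDEN_DIM) hidden states collected from one layer.
    board_labels: (N, NUM_SQUARES) integer state (0/1/2) of each square."""
    for _ in range(epochs):
        logits = probe(activations).view(-1, NUM_SQUARES, NUM_STATES)
        loss = loss_fn(logits.permute(0, 2, 1), board_labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return probe

# If the probe decodes the board well above chance, the layer's activations
# must carry board-state information -- that's the "world model" being read out.
```

The point being: this is evidence taken from inside the network, not just from guessing at outputs.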