Key to Dreamer’s success, says Hafner, is that it builds a model of its surroundings and uses this ‘world model’ to ‘imagine’ future scenarios and guide decision-making.
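Very roughly, "imagining" here means rolling a learned latent dynamics model forward without touching the environment and scoring the predicted outcomes. The sketch below is a toy illustration of that loop, not Dreamer's actual code: encode, dynamics, predict_reward and the candidate policies are all invented stand-ins for learned networks.

    import numpy as np

    rng = np.random.default_rng(0)

    # Invented stand-ins for the learned pieces of a world-model agent
    # (not Dreamer's real networks): encoder, latent dynamics, reward head.
    def encode(observation):            # observation -> latent state
        return np.tanh(observation @ rng.standard_normal((64, 16)))

    def dynamics(latent, action):       # predict the next latent, no env call
        return np.tanh(latent + 0.1 * action)

    def predict_reward(latent):
        return float(latent.sum())

    def imagined_return(latent, policy, horizon=15):
        """Roll the world model forward in latent space, summing predicted rewards."""
        total = 0.0
        for _ in range(horizon):
            action = policy(latent)
            latent = dynamics(latent, action)
            total += predict_reward(latent)
        return total

    # "Guide decision-making": imagine a few candidate behaviours and
    # prefer the one with the highest imagined return.
    obs = rng.standard_normal(64)
    z = encode(obs)
    candidates = [lambda latent, a=a: np.full(16, a) for a in (-1.0, 0.0, 1.0)]
    best = max(candidates, key=lambda pi: imagined_return(z, pi))

In the real agent the actor is trained on these imagined rollouts rather than by enumerating candidates at decision time, but the shape of the loop is the same.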
Can you look at the world model, like you can look at Waymo's world model? Or is it hidden inside weights?
Machine learning with world models is very interesting, and the people doing it don't seem to say much about what the models look like. The Google manipulation work talks endlessly about the natural language user interface, but when they get to motion planning, they don't say much.
Yes, you can decode the imagined scenarios into videos and look at them. It's quite helpful during development to see what the model gets right or wrong. See Fig. 3 in the paper: https://www.nature.com/articles/s41586-025-08744-2
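(For anyone wondering what "decode the imagined scenarios into videos" looks like mechanically: the world model is trained with a decoder that reconstructs observations from latent states, so an imagined latent trajectory can be mapped back into a frame sequence. A toy sketch, with decode as a made-up placeholder for that learned network:)

    import numpy as np

    def decode(latent):
        """Placeholder for the learned decoder: latent vector -> small RGB frame."""
        seed = abs(hash(latent.tobytes())) % (2**32)
        return np.random.default_rng(seed).integers(0, 256, (64, 64, 3), dtype=np.uint8)

    def rollout_to_frames(latents):
        return np.stack([decode(z) for z in latents])

    frames = rollout_to_frames([np.random.standard_normal(16) for _ in range(15)])
    print(frames.shape)   # (15, 64, 64, 3) -- inspect directly or write out as video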
So, prediction of future images from a series of images. That makes a lot of sense.
Here's the "full sized" image set.[1] The world model is low-rez images. That makes sense. Ask for too much detail and detail will be invented, which is not helpful.
I implemented an acoustic segmentation system on an FPGA recently. The whole world model was a long list of known events and states with feasible transitions, plus novel things not observed before. Basically a rather dumb state machine with a machine-learning part attached to the acoustic sensors. Of course, both parts could have been hidden behind weights, but the state machine was easily readable, and that was its biggest advantage.
I would say there was not much new in this. The key part of the project was the real-time approach: acquire samples, process them, find peaks, do FFTs, sum, multiply, divide, get a float number, turn on the proper LEDs. The data was moved in C code between DMA blocks written in VHDL. It was actually far from an optimized version, but it worked. The IP does not belong to me and I would like to avoid technical details. The project ended immediately when the company we worked for offered 25,000€ for all IP created during the project. A very bad joke. I am still confused, because there was massive potential in this cooperation for everybody involved.
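(For illustration only, here is a rough sketch of that split in general terms: an opaque classifier over FFT features feeding a human-readable transition table with a novelty flag. The event names, thresholds and features are invented, not taken from the project above.)

    import numpy as np

    # Readable part: which event may follow which state.
    FEASIBLE = {
        "idle":    {"idle", "impact"},
        "impact":  {"ringing", "idle"},
        "ringing": {"ringing", "idle"},
    }

    def classify(samples, rate=48_000):
        """Opaque part: label one window of audio from its spectrum."""
        spectrum = np.abs(np.fft.rfft(samples * np.hanning(len(samples))))
        if spectrum.max() < 1.0:
            return "idle"
        peak_hz = int(np.argmax(spectrum)) * rate / len(samples)
        return "impact" if peak_hz < 500 else "ringing"

    def step(state, samples):
        event = classify(samples)
        if event not in FEASIBLE[state]:
            return state, "novel"      # transition not observed before: flag it
        return event, event

    state = "idle"
    for _ in range(10):                # stand-in for the real-time acquisition loop
        window = np.random.standard_normal(1024)
        state, label = step(state, window)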
I’d say it’s more like Waymo’s world model. The main actor uses a latent-vector representation of the state of the game to make decisions. At train time, this latent vector is meant to compress a bunch of useful information about the game. So while you can’t really interpret the actual latent vector that represents the state, you do know it encodes at least the state of the game.
This world model stuff is only possible in environments that are sandboxed, i.e. where you can represent the state of the world and have a way of producing the next state given a current state and an action. Things like Atari games, robot simulations, etc.
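(Concretely, "sandboxed" here just means the environment exposes something like a pure step function, so the whole state can be copied and rolled forward at will. A toy sketch with invented names:)

    from dataclasses import dataclass, replace

    @dataclass(frozen=True)
    class State:          # the entire world state, copyable and inspectable
        x: float
        y: float

    def next_state(s: State, action: str) -> State:
        dx, dy = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}[action]
        return replace(s, x=s.x + dx, y=s.y + dy)

    # Because the step is self-contained, an agent (or a world model trained
    # against it) can branch and replay freely without touching anything real.
    print(next_state(next_state(State(0.0, 0.0), "up"), "right"))   # State(x=1.0, y=1.0)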
> Can you look at the world model, like you can look at Waymo's world model? Or is it hidden inside weights?
I imagine it's the latter, and in general, we're already dealing with plenty of models with world models hidden inside their weights. That's why I'm happy to see the direction Anthropic has been taking with their interpretability research over the years.
Their papers, as well as most discussions around them, focus on issues of alignment/control, safety, and generally killing the "stochastic parrot" meme and keeping it dead - but I think it'll be even more interesting to see attempts at mapping how those large models structure their world models. I believe there are scientific and philosophical discoveries to be made in answering why these structures look the way they do.
This was clearly the goal of the "Biology of LLMs" paper (and its ancillary papers), but I am not convinced.
They used a 'replacement model' that, by their own admission, could match the output of the LLM ~50% of the time, and the attribution of cognition-related labels to the model hinges entirely on the interpretation of the 'activations' seen in that replacement model.
So they created a much simpler model that sorta kinda does what the LLM does in some instances, contrived some examples, observed the replacement model, and labeled what it was doing very liberally.
Machine learning and the mathematics involved are quite interesting, but I don't see the need to apply neuroscience/psychology terms to them. They are fascinating on their own terms, and modelling language can clearly be quite powerful.
But thinking that they can follow instructions and reason is the source of much misdirection. The limits of this approach should make clear that feeding text to a text-continuation program and then parsing the generated text for commands and running those commands is a bad idea, because the tokens the model outputs are only statistically linked to the tokens fed into it. And as the model takes in more tokens from the wild, this can easily lead to situations that are very clearly an enormous risk. Pushing the idea that they are reasoning about the input is driving all sorts of applications that, if you saw them as statistical text-continuation programs, would clearly be a glaring risk.
Machine learning and LLMs are interesting technology that should be investigated and developed. Reasoning by induction that they are capable of more than modelling language is bad science and drives bad engineering.