Today's high-end LLMs can do a lot of unsupervised work. Debug iteration is at least junior level. Audio and visual output verification is still very weak (e.g. verifying web page layout and component reactivity). Once a visual model is good enough to look at the screen pixels and understand them, it will instantly replace junior devs. Currently, if you have text-only output, all the new LLMs can iterate flawlessly and solve problems with it. A new backend from scratch is completely doable with vibe coding now, with some exceptions around race conditions and legacy-code comprehension.
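A minimal sketch of the text-only iteration loop described above: run the tests, feed the failure text back to the model, apply its patch, repeat. `llm_complete` is a hypothetical stand-in for whatever model API is in use; the pytest flags are just one reasonable choice.

```python
import subprocess

def llm_complete(prompt: str) -> str:
    """Hypothetical stand-in for your model API (Claude, GPT, etc.)."""
    raise NotImplementedError

def debug_loop(source_file: str, max_iters: int = 5) -> bool:
    for _ in range(max_iters):
        # Run the test suite and capture everything as plain text.
        result = subprocess.run(
            ["pytest", "-x", "--tb=short"], capture_output=True, text=True
        )
        if result.returncode == 0:
            return True  # tests pass, we're done
        # Feed the failure text plus the current code back to the model.
        code = open(source_file).read()
        fixed = llm_complete(
            f"Tests failed:\n{result.stdout}\n\nCurrent code:\n{code}\n"
            "Return the corrected file, nothing else."
        )
        with open(source_file, "w") as f:
            f.write(fixed)
    return False
```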
> Once a visual model is good enough to look at the screen pixels and understand them, it will instantly replace junior devs
Curious if you gave Antigravity a try yet? It auto-launches a browser and you can watch it move the mouse and click around. It's able to review what it sees and iterate or report success according to your specs. It takes screen recordings and saves them as an artifact for you to verify.
I only tried some simple things with it so far but it worked well.
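Not Antigravity's internals, but the same screenshot-and-verify loop can be sketched with Playwright. `vision_model_check` is a hypothetical hook for whatever vision model does the judging, and the URL and selector are made up:

```python
from playwright.sync_api import sync_playwright

def vision_model_check(screenshot_path: str, spec: str) -> bool:
    """Hypothetical: send the screenshot to a vision model and ask
    whether it satisfies the spec."""
    raise NotImplementedError

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    # Record the session so there's an artifact to review afterwards.
    context = browser.new_context(record_video_dir="recordings/")
    page = context.new_page()
    page.goto("http://localhost:3000")   # assumed local dev server
    page.click("text=Submit")            # assumed UI element
    page.screenshot(path="after_click.png")
    ok = vision_model_check("after_click.png", "form shows a success banner")
    context.close()  # flushes the video file to disk
    browser.close()
```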
Good time to remind everybody that the Fed, the US central bank, prices gold at $42.22 for some reason, and nobody ever questions it. And all those dignified, educated, respectable board members and econ PhDs give speeches about a science-based market economy, maintaining credibility, and staying out of politics, all while caring about the wellbeing of households and families.
That's wildly misunderstanding things. The Fed doesn't trade, regulate, hold, or sell gold. The price you're quoting is for a historical artifact called a Gold Certificate. These have a value set by law (not by the Fed) of 42 and 2/9 dollars. They aren't sold anymore, but if you happen to have one, the Fed is required to buy it from you (IIRC) at that rate.
It's a little bit broader than this. At the Fed there is a so-called "statutory price for gold", and it's not limited to gold certificates: any gold the Fed held would be priced at $42 by law. The fact that they don't technically own any gold and work around the issue only makes it that much more amusing. It only serves to tell people they can fix prices and make outrageous course corrections overnight, and people will still argue "it's fine and it's legal" afterwards.
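For scale, a back-of-the-envelope using the Treasury's published gold-certificate figures (roughly $11.04B book value); the $2,000/oz market price is purely an assumed number for illustration:

```python
statutory_price = 42 + 2 / 9          # $42.2222... per troy ounce, fixed by law
book_value = 11_041_000_000           # Treasury gold certificate account, ~$11.04B
ounces = book_value / statutory_price # ~261.5 million troy ounces

assumed_market_price = 2000           # $/oz, illustrative assumption only
market_value = ounces * assumed_market_price
print(f"{ounces/1e6:.1f}M oz, book ${book_value/1e9:.2f}B, "
      f"market ~${market_value/1e9:.0f}B at ${assumed_market_price}/oz")
# ~261.5M oz, book $11.04B, market ~$523B at $2000/oz
```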
Again, that's all just conspiracy nonsense. Yes, there are old laws. No, they don't affect macroeconomic policy, and claiming they do is silly. This is of a piece with the Trillion Dollar Coin nonsense[1] being hawked in equally silly circles on the other side of the aisle.
Real world economic policy works by virtue of steady hands and rigorously applied norms, not goofball trickery around edge cases of ancient laws.
[1] The idea that the Treasury's statutory authority to mint coinage could be exploited to mint a single illiquid-by-virtue-of-size asset that could then be borrowed against without increasing the debt ceiling.
Yes, you are making my point exactly: I just gave you a link to the official Federal Reserve website that values gold at $42, and you still refer to it as a conspiracy theory. Surely they could have fixed this by now if it were a typo or a conspiracy.
The fact that they kept the price fixed should serve as a reminder that they can do this at any time. There is no conspiracy theory: they've literally done this, nobody challenged them, and no laws have changed since then (hence the "ancient law" that is somehow still in effect). The sophisticated economists who study how "real world economic policy works by virtue of steady hands and rigorously applied norms" can make another "ancient law" any time they want.
Citing the completely unused ancient law in support of a claim that the Fed is somehow "fixing prices", or "making outrageous course corrections overnight", or that they can "make another law any time they want", is the conspiracy. They aren't doing that. And they never have. And you know it, which is why you're making noise about ancient unused laws.
Well, please show me any mainstream finance media that questions it. The Fed has a monthly press conference; I don't think they've been asked about it even once in at least the last 10 years... Most of the gold buying now is supposedly from Asia.
I am fascinated your comment is buried down here with absolutely no discussion. Money printing is undoubtedly the biggest factor in this particular inflation spike, but something is quietly steering the narrative away from it almost everywhere.
While the question of alternative actions and outcomes is also valid, this is a literal 10-trillion-dollar question that nobody in a leadership position wants to ask or answer.
If AI is the cause, it will only stimulate investment in more AI and accelerate the layoffs, though. And because they don't have other tools, it looks like this is exactly what they are going to do. Investment tends to concentrate massively in the fastest-growing trend, which this time is just replacing workers with AI.
Which will deepen the hole, until we crest the AI peak of inflated expectations, bottom out in the trough of disillusionment, and start creeping up the slope of enlightenment. The next 12 months should bring worsening unemployment, more rate cuts, and higher AI stock prices. Once we bottom out in disillusionment (or companies run out of AI capital), the stocks will plummet again, AI companies will die off, unemployment will ease, and rates will return to previous levels.
That's the "exists-in-a-vacuum" picture anyway. Stimulus, tariffs, a new war, or some other bullshit will change the results.
They say "these results are completely general for any probability distribution with zero mean and a finite covariance matrix with rank much larger than the number of steps". It's not clear to me if that condition implies the number of steps is much lower than the dimensions of the random walk space or perhaps the probability distribution needs to be concentrated into a smaller number of dimensions to begin with? In which case the results is much less shocking.
The condition is the former. The probability distribution spans the full dimensionality of the space. Basically, the result will hold for an infinite number of dimensions and a finite number of steps. But it will also hold if you take both the number of steps and the dimensionality to infinity while holding the ratio N_steps / D constant with N_steps / D << 1.
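A quick numpy sanity check of that regime: with D much larger than the number of steps, the cross terms between steps are relatively tiny, so the squared displacement is almost exactly the sum of the squared step lengths.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 100, 100_000                 # N_steps / D << 1
steps = rng.standard_normal((N, D))

x = steps.sum(axis=0)               # endpoint of the walk
sq_disp = x @ x                     # |x_N|^2
sq_steps = (steps ** 2).sum()       # sum of |s_i|^2
cross = sq_disp - sq_steps          # 2 * sum over pairs of s_i . s_j

print(f"|x|^2 = {sq_disp:.0f}, sum |s_i|^2 = {sq_steps:.0f}, "
      f"relative cross term = {cross / sq_steps:.4f}")
# The relative cross term shrinks as D grows: successive steps are
# nearly orthogonal, so the walk behaves almost deterministically.
```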
That's the thing though - they're using logs. My theory is that LLMs are intrinsically quite good at that because they're good at sifting text.
Getting them to drive something like a debugger interface seems harder in my experience (although the ChatDBG people showed some success; my experiments did too, but it took the tweaks I described).
My experiments are with Claude Opus 4, in Claude Code, primarily.
He is not using models appropriate for this conclusion, nor is he using state-of-the-art models in this research; moreover, he doesn't have an expensive foundation model to build upon for 2D games. It's just a fun project.
A serious attempt at video/vision would involve some probabilistic latent space that can be noised in ways that make sense for games in general. I think veo3 proves that AI can generalize 2D and even 3D games: generating a video under prompt constraints is basically playing a game. I think you could prompt veo3 to play any game for a few seconds and it would generally make sense even though it is not fine-tuned.
Veo3's world model is still pretty limited. That becomes obvious very fast once you prompt for out-of-distribution video content (i.e., stuff you're unlikely to find on YouTube). It's extremely good at creating photorealistic surfaces and lighting. It even has a reasonably solid understanding of fluid dynamics when simulating water. But for complex human behaviour (certain motions in particular) it simply lacks the training data. That's not really a fault of the model, though, and I'm pretty sure there will be a way to overcome it as well, maybe some kind of physics-based simulation as supplemental training data.
What is the basis for it having a reasonable understanding of fluid dynamics? Why don’t you think it’s just regurgitating some water scenes derived from its training data, rather than generating actual fluid dynamics?
But even if you knew nothing about this topic, the observation that you couldn't store the necessary amount of video data in a model for it to simply regurgitate should give you a big clue as to what is happening.
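To make that storage argument concrete, here's a back-of-the-envelope with purely assumed numbers (clip counts, sizes, and parameter counts are illustrative, not veo3's actual figures):

```python
# All figures are illustrative assumptions, not veo3's actual numbers.
videos = 100_000_000                 # assumed training clips
mb_per_video = 50                    # assumed size of a short compressed clip
data_pb = videos * mb_per_video / 1e9          # petabytes of training video

params = 100_000_000_000             # assumed ~100B parameters
bytes_per_param = 2                  # fp16/bf16 weights
model_tb = params * bytes_per_param / 1e12     # terabytes of weights

print(f"training data ~{data_pb:.0f} PB vs model ~{model_tb:.1f} TB "
      f"-> ~{data_pb * 1000 / model_tb:.0f}x compression")
# The weights can't losslessly hold the corpus; whatever the model keeps
# must look more like compressed rules/statistics than stored footage.
```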
To confirm: hybrid approaches can demonstrate competence at newly-created video games within a short period of exposure, so long as similar game mechanics from other games were incorporated into their training set?
> I think veo3 proves that AI can generalize 2D and even 3D games: generating a video under prompt constraints is basically playing a game.
In the same way that keeping a dream journal is basically doing investigative journalism, or talking to yourself is equivalent to making new friends, maybe.
The difference is that while they may both produce similar, "plausible" output, one does so as a result of processes that exist in relation to an external reality.
> I think veo3 proves that AI can generalize 2D and even 3D games
It doesn't. And you said it yourself:
> generating a video under prompt constraints is basically playing a game.
No. It's neither generating a game (that people can play) nor is it playing a game (it's generating a video).
Since it's not a model of the world in any sense of the word, there are issues with even the most basic object permanence. E.g. here's veo3 generating a GTA-style video. Oh look, the car spins 360 and ends up on a completely different street than the one it was driving down previously: https://www.youtube.com/watch?v=ja2PVllZcsI
It is still doing a great job for a few frames, and you could keep it more anchored to the state of the game if you prompted it, much like you can prompt coding agents to keep a log of all decisions previously made. Permanence is excellent; it slips often, but mostly because it is not grounded to specific game state by the prompt or by a decision log.
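A sketch of that decision-log idea in code; `model_call` is a hypothetical text/video model API, and the prompt wording is just illustrative:

```python
decision_log: list[str] = []

def generate_next(model_call, game_state: str, action: str) -> str:
    """model_call is a hypothetical model API; the point is that the
    accumulated log re-anchors each generation to prior state."""
    prompt = (
        "You are simulating a game world. Stay consistent with this log:\n"
        + "\n".join(f"- {d}" for d in decision_log)
        + f"\nCurrent state: {game_state}\nPlayer action: {action}\n"
        "Describe the next state, then list any new persistent facts."
    )
    out = model_call(prompt)
    # Record a one-line summary so later generations stay grounded.
    decision_log.append(f"after '{action}': {out.splitlines()[0]}")
    return out
```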
> generating a video under prompt constraints is basically playing a game
Besides static puzzles (like a maze or jigsaw), I don't believe this analogy holds. A model working with prompt constraints that aren't evolving or being added to over the course of "navigating" the generation of its output means it needs to process zero new information that it didn't come up with itself. Playing a game is different from other generation because it's primarily about reacting to input whose precise timing/spatial details you didn't know in advance, while knowing they fall within a known set of higher-order rules. Obviously, the more finite/deterministic/predictably probabilistic the video game's solution space, the more it can be inferred from the initial state (i.e. the more it reduces to the same type of problem as generating a video from a prompt), which is why models are still able to play such video games. But as GP pointed out, the transfer is negative in such cases: the overarching rules are not predictable enough across disparate genres.
> I think you could prompt veo3 to play any game for a few seconds
I'm curious what your threshold for what constitutes "play any game" is in this claim? If I wrote a script that maps button combinations to average pixel color of a portion of the screen buffer, by what metric(s) would veo3 be "playing" the game more or better than that script "for a few seconds"?
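For concreteness, the trivial baseline described above might look like this; the dummy frame stands in for a real screen capture, and the region and button mapping are arbitrary:

```python
import numpy as np

BUTTONS = ["A", "B", "LEFT", "RIGHT"]  # arbitrary mapping

def pick_button(frame: np.ndarray) -> str:
    """Map the average brightness of a screen region to a button press."""
    region = frame[0:64, 0:64]     # arbitrary corner of the screen buffer
    brightness = region.mean()     # 0..255
    return BUTTONS[int(brightness) * len(BUTTONS) // 256]

# Dummy frame in place of a real screen capture.
frame = np.random.default_rng(1).integers(0, 256, size=(240, 320, 3))
print(pick_button(frame))  # "plays" the game, by the loosest possible metric
```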
It's not ideal, but you can prompt it with an image of a game frame, explain the objects and physics in text, and let it generate a few frames of gameplay as a substitute for controller input, as well as what it expects as the outcome. I am not talking about real interactive gameplay.
I am just saying we have proof that it can understand complex worlds and sets of rules, and then abide by them. It doesn't know how to use a controller and it doesn't know how to explore the game physics on its own, but those steps are much easier to implement based on how coding agents are able to iterate and explore solutions.
fair, and I edited my choice of words, but if you're reading that much aggression from my initial comment (which contains topical discussion) to say what you did, you must find the internet a far more savage place than it really is :/
A company can figure out the premium and just average it out across the pax who book through them. Further, they could risk-manage no-shows or other bad behaviour based on ratings and feedback. It just wastes everybody's time to go through intermediaries.
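As a back-of-the-envelope of what "average it out across the pax" means (all numbers assumed):

```python
# Illustrative numbers only.
bookings = 10_000
no_show_rate = 0.03          # assumed 3% of pax no-show
cost_per_no_show = 120.0     # assumed cost the company eats per no-show

expected_loss = bookings * no_show_rate * cost_per_no_show
premium_per_pax = expected_loss / bookings   # = no_show_rate * cost_per_no_show
print(f"${premium_per_pax:.2f} added per booking")  # $3.60

# Risk-managing on ratings: scale the flat premium by a per-pax multiplier.
def premium(rating: float) -> float:
    multiplier = 2.0 - rating / 5.0   # 5-star pax pay 1.0x, 0-star pay 2.0x
    return premium_per_pax * multiplier
```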
yeah this was a surprising result. of course, bear in mind that testing an LLM on SQL generation is pretty nuanced, so take everything with a grain of salt :)