Today's high-end LLMs can do a lot of unsupervised work. Debug iteration is at least junior level. Audio and visual output verification is still very weak (e.g. verifying web page layout and component reactivity). Once a visual model is good enough to look at the screen pixels and understand them, it will instantly replace junior devs. Currently, if you have text-only output, all the new LLMs can iterate flawlessly and solve problems with it. A new backend from scratch is completely doable with vibe coding now, with some exceptions around race conditions and legacy-code comprehension.
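A minimal sketch of the text-only iteration loop described above: run the tests, feed the failure text back to the model, apply its patch, repeat. `llm_complete` is a hypothetical stand-in for whatever model API is in use; the pytest flags are just one reasonable choice.

```python
import subprocess

def llm_complete(prompt: str) -> str:
    """Hypothetical stand-in for your model API (Claude, GPT, etc.)."""
    raise NotImplementedError

def debug_loop(source_file: str, max_iters: int = 5) -> bool:
    for _ in range(max_iters):
        # Run the test suite and capture everything as plain text.
        result = subprocess.run(
            ["pytest", "-x", "--tb=short"], capture_output=True, text=True
        )
        if result.returncode == 0:
            return True  # tests pass, we're done
        # Feed the failure text plus the current code back to the model.
        code = open(source_file).read()
        fixed = llm_complete(
            f"Tests failed:\n{result.stdout}\n\nCurrent code:\n{code}\n"
            "Return the corrected file, nothing else."
        )
        with open(source_file, "w") as f:
            f.write(fixed)
    return False
```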
> Once a visual model is good enough to look at the screen pixels and understand them, it will instantly replace junior devs
Curious if you gave Antigravity a try yet? It auto-launches a browser and you can watch it move the mouse and click around. It's able to review what it sees and iterate or report success according to your specs. It takes screen recordings and saves them as an artifact for you to verify.
I only tried some simple things with it so far but it worked well.
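Not Antigravity's internals, but the same screenshot-and-verify loop can be sketched with Playwright. `vision_model_check` is a hypothetical hook for whatever vision model does the judging, and the URL and selector are made up:

```python
from playwright.sync_api import sync_playwright

def vision_model_check(screenshot_path: str, spec: str) -> bool:
    """Hypothetical: send the screenshot to a vision model and ask
    whether it satisfies the spec."""
    raise NotImplementedError

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    # Record the session so there's an artifact to review afterwards.
    context = browser.new_context(record_video_dir="recordings/")
    page = context.new_page()
    page.goto("http://localhost:3000")   # assumed local dev server
    page.click("text=Submit")            # assumed UI element
    page.screenshot(path="after_click.png")
    ok = vision_model_check("after_click.png", "form shows a success banner")
    context.close()  # flushes the video file to disk
    browser.close()
```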
Good time to remind everybody that the Fed, the US central bank, prices gold at $42.22 for some reason, and nobody ever questions it. And all those dignified, educated, respectable board members and econ PhDs give speeches about a science-based market economy, maintaining credibility, and staying out of politics, all while caring about the wellbeing of households and families.
That's wildly misunderstanding things. The Fed doesn't trade, regulate, hold, or sell gold. The price you're quoting is for a historical artifact called a Gold Certificate. These have a value set by law (not by the Fed) of 42 and 2/9 dollars. They aren't sold anymore, but if you happen to have one, the Fed is required to buy it from you (IIRC) at that rate.
It's a little bit broader than this. At the Fed there is a so-called "statutory price for gold", and it's not limited to gold certificates: any gold the Fed held would be priced at $42 by law. The fact that they don't technically own any gold and work around the issue only makes it that much more amusing. It only serves to tell people they can fix prices and make outrageous course corrections overnight, and people will still argue "it's fine and it's legal" afterwards.
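For scale, a back-of-the-envelope using the Treasury's published gold-certificate figures (roughly $11.04B book value); the $2,000/oz market price is purely an assumed number for illustration:

```python
statutory_price = 42 + 2 / 9          # $42.2222... per troy ounce, fixed by law
book_value = 11_041_000_000           # Treasury gold certificate account, ~$11.04B
ounces = book_value / statutory_price # ~261.5 million troy ounces

assumed_market_price = 2000           # $/oz, illustrative assumption only
market_value = ounces * assumed_market_price
print(f"{ounces/1e6:.1f}M oz, book ${book_value/1e9:.2f}B, "
      f"market ~${market_value/1e9:.0f}B at ${assumed_market_price}/oz")
# ~261.5M oz, book $11.04B, market ~$523B at $2000/oz
```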
Again, that's all just conspiracy nonsense. Yes, there are old laws. No, they don't affect macroeconomic policy, and claiming they do is silly. This is of a piece with the Trillion Dollar Coin nonsense[1] being hawked in equally silly circles on the other side of the aisle.
Real world economic policy works by virtue of steady hands and rigorously applied norms, not goofball trickery around edge cases of ancient laws.
[1] The idea that the Treasury's statutory authority to mint coinage could be exploited to mint a single illiquid-by-virtue-of-size asset that could then be borrowed against without increasing the debt ceiling.
Yes, you are making my point exactly: I just gave you a link to the official Federal Reserve website that values gold at $42, and you still refer to it as a conspiracy theory. Surely they could have fixed this by now if it were a typo or a conspiracy.
The fact that they kept the price fixed should serve as a reminder that they can do this at any time. There is no conspiracy theory: they've literally done this, nobody challenged them, and no laws have changed since then (hence the "ancient law" that is somehow still in effect). The sophisticated economists who study how "real world economic policy works by virtue of steady hands and rigorously applied norms" can make another "ancient law" any time they want.
Citing the completely unused ancient law in support of a claim that the Fed is somehow "fixing prices", or "making outrageous course corrections overnight", or that they can "make another law any time they want", is the conspiracy. They aren't doing that. And they never have. And you know it, which is why you're making noise about ancient unused laws.
Well, please show me any mainstream finance media that questions it. The Fed has a monthly press conference; I don't think they've been asked about it even once in at least the last 10 years... Most of the gold buying now is supposedly from Asia.
I am fascinated your comment is buried down here with absolutely no discussion. Money printing is undoubtedly the biggest factor in this particular inflation spike, but something is quietly steering the narrative away from it almost everywhere.
While the question of alternative actions and outcomes is also valid, this is a literal 10-trillion-dollar question that nobody in a leadership position wants to ask or answer.
If AI is the cause, it will only stimulate investment in more AI and accelerate the layoffs, though. And because they don't have other tools, it looks like this is exactly what they are going to do. Investment tends to concentrate massively in the fastest-growing trend, which this time is just replacing workers with AI.
Which will deepen the hole, until we crest the AI peak of inflated expectations, bottom out in the trough of disillusionment, and start creeping up the slope of enlightenment. The next 12 months should bring worsening unemployment, more rate cuts, and higher AI stock prices. Once we bottom out in disillusionment (or companies run out of AI capital), the stocks will plummet again, AI companies will die off, unemployment will ease, and rates will return to previous levels.
That's the "exists-in-a-vacuum" picture anyway. Stimulus, tariffs, a new war, or some other bullshit will change the results.
They say "these results are completely general for any probability distribution with zero mean and a finite covariance matrix with rank much larger than the number of steps". It's not clear to me if that condition implies the number of steps is much lower than the dimensions of the random walk space or perhaps the probability distribution needs to be concentrated into a smaller number of dimensions to begin with? In which case the results is much less shocking.
The condition is the former. The probability distribution spans the full dimensionality of the space. Basically, the result will hold for an infinite number of dimensions and a finite number of steps. But it will also hold if you take both the number of steps and the dimensionality to infinity while holding the ratio N_steps / D constant with N_steps / D << 1.
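A quick numpy sanity check of that regime: with D much larger than the number of steps, the cross terms between steps are relatively tiny, so the squared displacement is almost exactly the sum of the squared step lengths.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 100, 100_000                 # N_steps / D << 1
steps = rng.standard_normal((N, D))

x = steps.sum(axis=0)               # endpoint of the walk
sq_disp = x @ x                     # |x_N|^2
sq_steps = (steps ** 2).sum()       # sum of |s_i|^2
cross = sq_disp - sq_steps          # 2 * sum over pairs of s_i . s_j

print(f"|x|^2 = {sq_disp:.0f}, sum |s_i|^2 = {sq_steps:.0f}, "
      f"relative cross term = {cross / sq_steps:.4f}")
# The relative cross term shrinks as D grows: successive steps are
# nearly orthogonal, so the walk behaves almost deterministically.
```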
That's the thing though - they're using logs. My theory is that LLMs are intrinsically quite good at that because they're good at sifting text.
Getting them to drive something like a debugger interface seems harder in my experience (although the ChatDBG people showed some success; my experiments did too, but it took the tweaks I described).
My experiments are with Claude Opus 4, in Claude Code, primarily.
He is not using models appropriate for this conclusion, nor is he using state-of-the-art models in this research; moreover, he doesn't have an expensive foundation model to build upon for 2D games. It's just a fun project.
A serious attempt at video/vision would involve some probabilistic latent space that can be noised in ways that make sense for games in general. I think veo3 proves that AI can generalize 2D and even 3D games: generating a video under prompt constraints is basically playing a game. I think you could prompt veo3 to play any game for a few seconds and it would generally make sense even though it is not fine-tuned.
Veo3's world model is still pretty limited. That becomes obvious very fast once you prompt for out-of-distribution video content (i.e., stuff you're unlikely to find on YouTube). It's extremely good at creating photorealistic surfaces and lighting. It even has a reasonably solid understanding of fluid dynamics when simulating water. But for complex human behaviour (certain motions in particular) it simply lacks the training data. That's not really a fault of the model, though, and I'm pretty sure there will be a way to overcome it as well, maybe some kind of physics-based simulation as supplemental training data.
What is the basis for it having a reasonable understanding of fluid dynamics? Why don’t you think it’s just regurgitating some water scenes derived from its training data, rather than generating actual fluid dynamics?
But even if you knew nothing about this topic, the observation that you couldn't store the necessary amount of video data in a model for it to simply regurgitate should give you a big clue as to what is happening.
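To make that storage argument concrete, here's a back-of-the-envelope with purely assumed numbers (clip counts, sizes, and parameter counts are illustrative, not veo3's actual figures):

```python
# All figures are illustrative assumptions, not veo3's actual numbers.
videos = 100_000_000                 # assumed training clips
mb_per_video = 50                    # assumed size of a short compressed clip
data_pb = videos * mb_per_video / 1e9          # petabytes of training video

params = 100_000_000_000             # assumed ~100B parameters
bytes_per_param = 2                  # fp16/bf16 weights
model_tb = params * bytes_per_param / 1e12     # terabytes of weights

print(f"training data ~{data_pb:.0f} PB vs model ~{model_tb:.1f} TB "
      f"-> ~{data_pb * 1000 / model_tb:.0f}x compression")
# The weights can't losslessly hold the corpus; whatever the model keeps
# must look more like compressed rules/statistics than stored footage.
```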
To confirm: hybrid approaches can demonstrate competence at newly-created video games within a short period of exposure, so long as similar game mechanics from other games were incorporated into their training set?
> I think veo3 proves that AI can generalize 2D and even 3D games: generating a video under prompt constraints is basically playing a game.
In the same way that keeping a dream journal is basically doing investigative journalism, or talking to yourself is equivalent to making new friends, maybe.
The difference is that while they may both produce similar, "plausible" output, one does so as a result of processes that exist in relation to an external reality.
> I think veo3 proves that AI can generalize 2D and even 3D games
It doesn't. And you said it yourself:
> generating a video under prompt constraints is basically playing a game.
No. It's neither generating a game (that people can play) nor is it playing a game (it's generating a video).
Since it's not a model of the world in any sense of the word, there are issues with even the most basic object permanence. E.g. here's veo3 generating a GTA-style video. Oh look, the car spins 360 and ends up on a completely different street than the one it was driving down previously: https://www.youtube.com/watch?v=ja2PVllZcsI
It is still doing a great job for a few frames, and you could keep it more anchored to the state of the game if you prompted it, much like you can prompt coding agents to keep a log of all decisions previously made. Permanence is excellent; it slips often, but mostly because it is not grounded to specific game state by the prompt or by a decision log.
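A sketch of that decision-log idea in code; `model_call` is a hypothetical text/video model API, and the prompt wording is just illustrative:

```python
decision_log: list[str] = []

def generate_next(model_call, game_state: str, action: str) -> str:
    """model_call is a hypothetical model API; the point is that the
    accumulated log re-anchors each generation to prior state."""
    prompt = (
        "You are simulating a game world. Stay consistent with this log:\n"
        + "\n".join(f"- {d}" for d in decision_log)
        + f"\nCurrent state: {game_state}\nPlayer action: {action}\n"
        "Describe the next state, then list any new persistent facts."
    )
    out = model_call(prompt)
    # Record a one-line summary so later generations stay grounded.
    decision_log.append(f"after '{action}': {out.splitlines()[0]}")
    return out
```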
> generating a video under prompt constraints is basically playing a game
Besides static puzzles (like a maze or jigsaw), I don't believe this analogy holds. A model working with prompt constraints that aren't evolving or being added to over the course of "navigating" the generation of its output means it needs to process zero new information that it didn't come up with itself. Playing a game is different from other generation because it's primarily about reacting to input whose precise timing/spatial details you didn't know in advance, while knowing they fall within a known set of higher-order rules. Obviously, the more finite/deterministic/predictably probabilistic the video game's solution space, the more it can be inferred from the initial state (i.e. the more it reduces to the same type of problem as generating a video from a prompt), which is why models are still able to play such video games. But as GP pointed out, the transfer is negative in such cases: the overarching rules are not predictable enough across disparate genres.
> I think you could prompt veo3 to play any game for a few seconds
I'm curious what your threshold for what constitutes "play any game" is in this claim? If I wrote a script that maps button combinations to average pixel color of a portion of the screen buffer, by what metric(s) would veo3 be "playing" the game more or better than that script "for a few seconds"?
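For concreteness, the trivial baseline described above might look like this; the dummy frame stands in for a real screen capture, and the region and button mapping are arbitrary:

```python
import numpy as np

BUTTONS = ["A", "B", "LEFT", "RIGHT"]  # arbitrary mapping

def pick_button(frame: np.ndarray) -> str:
    """Map the average brightness of a screen region to a button press."""
    region = frame[0:64, 0:64]     # arbitrary corner of the screen buffer
    brightness = region.mean()     # 0..255
    return BUTTONS[int(brightness) * len(BUTTONS) // 256]

# Dummy frame in place of a real screen capture.
frame = np.random.default_rng(1).integers(0, 256, size=(240, 320, 3))
print(pick_button(frame))  # "plays" the game, by the loosest possible metric
```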
It's not ideal, but you can prompt it with an image of a game frame, explain the objects and physics in text, and let it generate a few frames of gameplay as a substitute for controller input, as well as what it expects as the outcome. I am not talking about real interactive gameplay.
I am just saying we have proof that it can understand complex worlds and sets of rules, and then abide by them. It doesn't know how to use a controller and it doesn't know how to explore the game physics on its own, but those steps are much easier to implement based on how coding agents are able to iterate and explore solutions.
fair, and I edited my choice of words, but if you're reading that much aggression from my initial comment (which contains topical discussion) to say what you did, you must find the internet a far more savage place than it really is :/
A company can figure out the premium and just average it out across the pax who book through them. Further, they could risk-manage no-shows or other bad behaviour based on ratings and feedback. It just wastes everybody's time to go through intermediaries.
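As a back-of-the-envelope of what "average it out across the pax" means (all numbers assumed):

```python
# Illustrative numbers only.
bookings = 10_000
no_show_rate = 0.03          # assumed 3% of pax no-show
cost_per_no_show = 120.0     # assumed cost the company eats per no-show

expected_loss = bookings * no_show_rate * cost_per_no_show
premium_per_pax = expected_loss / bookings   # = no_show_rate * cost_per_no_show
print(f"${premium_per_pax:.2f} added per booking")  # $3.60

# Risk-managing on ratings: scale the flat premium by a per-pax multiplier.
def premium(rating: float) -> float:
    multiplier = 2.0 - rating / 5.0   # 5-star pax pay 1.0x, 0-star pay 2.0x
    return premium_per_pax * multiplier
```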
yeah this was a surprising result. of course, bear in mind that testing an LLM on SQL generation is pretty nuanced, so take everything with a grain of salt :)