Sort of, except I think the future of LLMs will be to have the LLM make 5 separate attempts at a fix in parallel, since LLM time is cheaper than human time... and once you introduce this aspect into the workflow, you'll want to spin up multiple containers, and the benefits of the terminal aren't as strong anymore.
I feel like the better approach would be to throw away PRs when they're bad, edit your prompt, and then let the agent try again using the new prompt. Throwing lots of wasted compute at a problem seems like a luxury take on coding agents, as these agents can be really expensive.
So the process becomes: Read PR -> Find fundamental issues -> Update prompt to guide agent better -> Re-run agent.
Then your job becomes proof-reading and editing specification documents for changes, reviewing the result of the agent trying to implement that spec, and then iterating on it until it is good enough. This comes from the belief that better, more expensive, agents will usually produce better code than 5 cheaper agents running in parallel with some LLM judge to choose between or combine their outputs.
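For concreteness, here's a rough sketch of the 5-attempts-plus-judge pattern being discussed; `run_agent` and `llm_judge` are made-up placeholders standing in for real agent and model calls, not any actual API:

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(prompt: str, attempt: int) -> str:
    """Placeholder for one containerized coding-agent run (hypothetical)."""
    return f"candidate PR #{attempt} for: {prompt}"

def llm_judge(prompt: str, candidates: list[str]) -> str:
    """Placeholder for an LLM call that ranks or merges candidate PRs (hypothetical)."""
    return candidates[0]

def best_of_n(prompt: str, n: int = 5) -> str:
    # Fan out n independent attempts (one container each), then let a judge pick.
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda i: run_agent(prompt, i), range(n)))
    return llm_judge(prompt, candidates)

print(best_of_n("add a 'forgot password' email flow"))
```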
Who or what will review the 5 PRs (including their updates to automated tests)? If it's just yet another agent, do we need 5 of these reviews for each PR too?
In the end, you either concede control over 'details' and just trust the output or you spend the effort and validate results manually. Not saying either is bad.
If you can define your problem well then you can write tests up front. An ML person would call tests a "verifier". Verifiers let you pump compute into finding solutions.
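To make that concrete, here's a toy sketch of the loop: a verifier (a trivial stand-in for a real test suite) turns solution-finding into a search you can throw an arbitrary compute budget at. Everything below is illustrative, not a real model call:

```python
import random

def generate(rng: random.Random) -> int:
    """Stand-in for sampling one candidate solution from a model."""
    return rng.randrange(10_000)

def verifier(candidate: int) -> bool:
    """Stand-in for the test suite: passes iff the candidate is 'correct'."""
    return candidate % 1000 == 0

def search(budget: int) -> int | None:
    # A bigger budget buys more samples, so pumping in more compute
    # directly buys better odds of finding a verified solution.
    rng = random.Random(0)
    for _ in range(budget):
        candidate = generate(rng)
        if verifier(candidate):
            return candidate
    return None

print(search(budget=10_000))
```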
I'm not sure we write good tests for this, because we assume some kind of logic is involved. If you set a human the task of writing a procedure to send a 'forgot password' email, I can be reasonably sure there's a limited number of things they would do with the provided email address, because it takes time and effort to do more than you should.
However with an LLM I'm not so sure. So how will you write a test to validate this is done but also guarantee it doesn't add the email to a blacklist? A whitelist? A list of admin emails? Or the tens of other things you can do with an email within your system?
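For what it's worth, the closest a conventional test gets is asserting the one intended effect plus a denylist of the side effects you thought of; all the names below (PasswordService, blacklist, admins) are invented for illustration:

```python
import unittest
from unittest.mock import MagicMock

# Hypothetical service with injected dependencies, for illustration only.
class PasswordService:
    def __init__(self, mailer, blacklist, admins):
        self.mailer, self.blacklist, self.admins = mailer, blacklist, admins

    def send_forgot_password(self, email: str) -> None:
        self.mailer.send(email, subject="Reset your password")

class ForgotPasswordTest(unittest.TestCase):
    def test_sends_email_and_nothing_else(self):
        mailer, blacklist, admins = MagicMock(), MagicMock(), MagicMock()
        svc = PasswordService(mailer, blacklist, admins)
        svc.send_forgot_password("user@example.com")

        mailer.send.assert_called_once()      # the intended effect happened
        blacklist.add.assert_not_called()     # side effects we thought to check...
        admins.add.assert_not_called()
        # ...but no test can enumerate every other thing the code could
        # have done with the address, which is exactly the worry above.

if __name__ == "__main__":
    unittest.main()
```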
They probably won't. But it doesn't matter. Ultimately, we'll all end up doing manual labor, because that is the only thing we can do that the machines aren't already doing better than us, or about to be doing better than us. Such is the natural order of things.
By manual labor I specifically mean the kind where you have to mix precision with power, on the fly, in arbitrary terrain, where each task is effectively one-off. So not even making things - everything made at scale will be done in automated factories/workshops. Think constructing and maintaining those factories, in the "crawling down tight pipes with a screwdriver in your teeth" sense.
And that's only mid-term; robotics may be lagging behind AI now, but it will eventually catch up.
Also, just because it passes a test doesn't mean it doesn't do wonky, non-performant stuff. Or worse, have side effects no one verified. As one example, the LLM output will often add new fields I didn't ask it to change.
Tipping has lost its meaning and it is simply a money grab these days in many establishments, as your experience demonstrates. Like tipping for food to go.
I only tip when I sit down and good service is actually provided.
That seems silly; it's not poisonous to talk about next token prediction if 90% of the training compute is still spent on training via next token prediction (as far as I am aware)
I don’t really think that it is. Evolution is a random search; training a neural network is done with a gradient. The former is dependent on rare (and unexpected) events occurring, while the latter is expected to converge in proportion to the volume of compute.
why do you think evolution is a random search?
I thought evolutionary pressures, and mechanisms like epigenetics, make it something other than a random search.
Evolution is a highly parallel descent down the gradient. The gradient is provided by the environment (which includes lifeforms too), parallelism is achieved through reproduction, and descent is achieved through death.
The difference is that in machine learning the changes between iterations are themselves caused by the gradient, in evolution they are entirely random.
Evolution randomly generates changes and if they offer a breeding advantage they’ll become accepted. Machine learning directs the change towards a goal.
Machine learning is directed change, evolution is accepted change.
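To make the distinction concrete, here's a toy comparison on a single parameter minimizing f(x) = (x - 3)^2: the gradient step computes its direction from the gradient itself, while the evolution step proposes a random change and merely keeps it if fitness improved:

```python
import random

def f(x: float) -> float:          # "fitness"/loss: lower is better
    return (x - 3.0) ** 2

def grad_f(x: float) -> float:     # derivative of f
    return 2.0 * (x - 3.0)

def gradient_step(x: float, lr: float = 0.1) -> float:
    # Directed change: the gradient decides which way to move.
    return x - lr * grad_f(x)

def evolution_step(x: float, rng: random.Random, sigma: float = 0.5) -> float:
    # Random change, accepted only if it happens to improve fitness.
    candidate = x + rng.gauss(0.0, sigma)
    return candidate if f(candidate) < f(x) else x

x_gd = x_ev = 10.0
rng = random.Random(0)
for _ in range(100):
    x_gd = gradient_step(x_gd)
    x_ev = evolution_step(x_ev, rng)
print(x_gd, x_ev)   # both approach 3.0; only the first ever used the gradient
```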
It's more efficient, but the end result is basically the same, especially considering that even if there's no noise in the optimization algorithm, there is still noise in the gradient information (consider some magical mechanism for adjusting the behaviour of an animal after it has died before reproducing: there are going to be a lot of nudges one way or another for things like 'take a step to the right to dodge that boulder that fell on you').
There's still a loss function, it's just an implicit, natural one, instead of artificially imposed (at least, until humans started doing selective breeding). The comparison isn't nonsense, but it's also not obvious that it's tremendously helpful (what parts and features of an LLM are analogous to what evolution figured out with single-celled organisms compared to multicellular life? I don't know if there's actually a correspondence there)
PSA: Most modern gyms have "autobelay" devices that let you climb on your own without a partner. This makes gym climbing a super fun and accessible exercise that anyone, even beginners, can do just by showing up to a gym at their convenience.
(If you're a beginner you should still take the 1 hour class first, and you will have to pass a belay test. And yes, if you can make the schedule work with a friend so you can belay each other, that's even more fun)
You still need to be careful. I'm an avid climber. Most autobelay accidents happen because people don't clip in properly. However, for me the auto belay cable broke after catching me, resulting in five minor spinal fractures.
So from my experience I would say at least Google the common auto belay manufacturers and only use gyms that have them. True Blue and Perfect Descent are the only auto belays I will touch now.
My understanding is that our local climbing gym sees most of its non-bouldering accidents from people not clipping into autobelays before they start climbing.
They lower the grade by about one level by pulling you up, at least up to 6a/6b on the French scale. At higher grades I can imagine they also interfere with careful balance and body weight shifting, training you away from actual skills; that's why I never saw them on anything harder than maybe 7b, and even there it was like 1 or 2 routes in the whole gym.
But for easy grades and beginners, if you lack a good partner for whatever reason, they are great IMHO.
The pull of an autobelay is negligible, surely. The cable is a bit annoying perhaps, but the real problem is that the wall is nearly vertical and completely flat. Super uninspiring in my opinion.
Most climbing gyms put auto belays only on flat or slabby ‘beginner’ areas of the walls because most people using auto belays can’t do much on harder stuff - and also it’s kind of convenient to have your partner ‘take’/hold you on steep stuff sometimes.
Having uncontrolled (but slow) descents onto people’s heads probably also doesn’t help.
I mean I can lift my entire bodyweight with just a couple of fingers, but that point aside, this isn't so strange. The bench presser stalls exactly when their muscles are just short of overcoming gravity; any extra force, even a couple of fingers, will add upward momentum. You're not often in this kind of stall condition when climbing; it is much more about leverage and transmitting force through the kinetic chain. Especially since we were discussing balance on typically lightly overhanging flat walls.
> How is AI going to make its own chips and energy?
Pay naive humans to take care of those things while it has to, then disassemble the atoms in their bodies into raw materials for robots/datacenters once that is no longer necessary.
I suppose there is an equilibrium, where sites that penalize these types of crawlers will also get less traffic from people reading AI citations, so for many sites the upsides of allowing it will be greater than the downsides.
As someone working in aviation safety, this is heartbreaking and awful to watch. The efforts of CAST and ASIAS in reducing aviation accidents have been very successful, but of course we still have so much to do.