I prefer 1 hour/1 day/etc, but yes, this is the only method that I’ve found to work. Be very clear about what result you’re trying to produce, spec out the idea in detail, break down the spec into logical steps, and use orders of magnitude to break down each step. There’s your estimate. If you can’t break it down enough to get into the 1 day/1 week range per step, you don’t actually have a plan and can’t produce a realistic estimate.
Given they heavily used LLMs for this optimization, makes you wonder why they didn’t use them to just port the C library to Rust entirely. I think the volume of library ports to more languages/the most performant languages is going to explode, especially given it’s a relatively deterministic effort so long as you have good tests and API contracts, etc.
The underlying C library interacts directly with the postgres query parser (therefore, Postgres source). So unless you rewrite postgres in Rust, you wouldn't be able to do that.
Well then why didn’t they just get the LLM to rewrite all of Postgres too /s
I agree that LLMs will make clients/interfaces in every language combination much more common, but I wonder the impact it’ll have on these big software projects if more people stop learning C.
I’m very bought in to the idea that raw coding is now a solved problem with the current models and agentic harnesses. Let alone what’s coming in the near term.
That being said, I think we’re in a weird phase right now where people’s obvious mental health issues are appearing as “hyper productivity” due to the use of these tools to absolutely spam out code that isn’t necessarily broadly coherent but is locally impressive. I’m watching multiple people, both publicly and privately, clearly breaking down mentally because of the “power” AI is bestowing on them. Their wires are completely crossed when it comes to the value of outputs vs outcomes, and they’re espousing generated nonsense as if it were thoughtful insight.
I'd agree, the code "isn’t necessarily broadly coherent but is locally impressive".
However, I've seen some totally successful, even award-winning, human-written projects where I could say the same.
Ages back, I heard a woodworking analogy:
LLM code is like MDF. Really useful for cheap furniture, massively cheaper than solid wood, but it would be a mistake to use it as a structural element in a house.
Now, I've never made anything more complex than furniture, so I don't know how well that fit the previous models let alone the current ones… but I've absolutely seen success coming out of bigger balls of mud than the balls of mud I got from letting Claude loose for a bit without oversight.
Still, just because you can get success even with sloppy code, doesn't mean I think this is true everywhere. It's not like the award was for industrial equipment or anything, the closest I've come to life-critical code is helping to find and schedule video calls with GPs.
This has also been an interesting social experiment in that we get to see what work people think is actually impressive vs trivial.
Folks who have spent years effectively snapping together other people’s APIs like LEGOs (and being well-compensated for it) are understandably blown away by the current state of AI. Compare that to someone writing embedded firmware for device microcontrollers, who would understandably be underwhelmed by the same.
The gap in reactions says more about the nature of the work than it does about the tools themselves.
>Compare that to someone writing embedded firmware for device microcontrollers, who would understandably be underwhelmed by the same.
One datum for you: I recently asked Claude to make a jerk-limited and jerk-derivative-limited motion planner and to use the existing trapezoidal planner as reference for fuzzy-testing various moves (to ensure total pulses sent was correct) and it totally worked. Only a few rounds of guidance to get it to where I wanted to commit it.
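The fuzz check itself is conceptually simple. Here’s a minimal sketch of the shape of it (hypothetical planner interfaces only, nothing from the actual firmware; a real jerk-limited planner would obviously also take jerk limits):

```python
import random

def total_pulses(planner, distance, v_max, a_max, dt=1e-3):
    """Integrate a planner's velocity profile and count the step pulses
    that would reach the motor driver for a single move."""
    steps = 0.0
    for v in planner(distance, v_max, a_max, dt):
        steps += v * dt
    return round(steps)

def fuzz_compare(trapezoidal, jerk_limited, iterations=1000):
    """Fuzz random moves and assert both planners command the same total
    travel, i.e. the same number of pulses, within quantization error."""
    for _ in range(iterations):
        distance = random.uniform(0.01, 500.0)  # mm
        v_max = random.uniform(1.0, 200.0)      # mm/s
        a_max = random.uniform(10.0, 5000.0)    # mm/s^2
        ref = total_pulses(trapezoidal, distance, v_max, a_max)
        new = total_pulses(jerk_limited, distance, v_max, a_max)
        assert abs(ref - new) <= 1, (distance, v_max, a_max, ref, new)
```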
My comment above I hope wasn't read to mean "LLMs are only good at web dev." Only that there are different capability magnitudes.
I often do experiments where I will clone one of our private repos, revert a commit, trash the .git path, and then see if any of the models/agents can re-apply the commit after N iterations. I record the pass@k score and compare between model generations over time.
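(For anyone unfamiliar: pass@k here is the usual unbiased estimator from the HumanEval paper. If an agent gets n attempts and c of them correctly re-apply the commit, it estimates the probability that at least one of k sampled attempts passes. My harness may bookkeep it slightly differently, but roughly:)

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn from n attempts of which c passed, is a pass."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 10 runs against the reverted commit, 3 re-applied it correctly
print(pass_at_k(n=10, c=3, k=1))  # 0.30
print(pass_at_k(n=10, c=3, k=5))  # ~0.92
```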
In one of those recent experiments, I saw gpt-oss-120b add API support to swap tx and rx IQ for digital spectral inversion at higher frequencies on our wireless devices. This is for a proprietary IC running a quantenna radio, the SDK of which is very likely not in-distribution. It was moderately impressive to me in part because just writing the IQ swap registers had a negative effect on performance, but the model found that swapping the order of the IQ imbalance coefficients fixed the performance degradation.
I wouldn't say this was the same level of "impressive" as what the hype demands, but I remain an enthusiastic user of AI tooling due to somewhat regular moments like that. Especially when it involves open weight models of a low-to-moderate param count. My original point though is that those moments are far more common in web dev than they are elsewhere currently.
EDIT: Forgot to add that the model also did some work that the original commit did not. It removed code paths that were clobbering the rx IQ swap register and instead changed it to explicitly initialize during baseband init so it would come up correct on boot.
Ah yes, the magic is more developed for commonly documented cases than niche stuff. 100%, and sorry I misinterpreted your post to mean that they are not useful for embedded rather than less capable for embedded. Also, your stuff is way deeper than anything I am doing (motion planning is pretty well discussed in the online literature).
This is not true. You can see people who are much older and built a lot of the "internet scale" equally excited about it, e.g. FreeBSD OG developers, Steve himself (who wrote Gas Town), etc.
In fact, I would say I've seen more people who are "OG Coders" excited (and in their >50s) than the mid generation.
I think you're shadow-boxing with a point I never made. I never said experienced devs are not or can not be excited about current AI capabilities.
Lots of experienced devs who work in more difficult domains are excited about AI. In fact, I am one of them (see one of my responses in this thread about gpt-oss being able to work on proprietary RF firmware in my company [1]).
But that in no way suggests that there isn't a gap in what impresses or surprises engineers across any set of domains. Antirez is probably one of the better, more reasoned examples of this.
The OED defines prejudice as a "preconceived opinion that is not based on reason or actual experience."
My day to day work involves: full stack web dev, distributed systems, embedded systems, and machine learning. In addition to using AI tooling for dev tasks, we also use agents in production for various workflows and we also train/finetune models (some LLMs, but also other types of neural networks for anomaly detection, fault localization, time series forecasting, etc). I am basing my original commentary in this thread on all of that cumulative experience.
It has been my observation over the last almost 30 years of being a professional SWE that full stack web dev has been much easier and simpler than the other domains I work in. And even further, I find that models are much better at that domain on average than the other domains, measured by pass@k scores on private evals representing each domain. Anecdotal experience also tends to match the evals.
This tracks with all the other information we have pertaining to benchmark saturation, the "we need harder evals" crowd has been ringing this bell for the last 8-12 months. Models are getting very good at the less complex tasks.
I don't believe it will remain that way forever, but at present it's far more common to see someone one-shot a full-stack web app from a single prompt than something like a kernel driver for a NIC. One class of devs is seeing a massive performance jump, another class is not.
I don't see how that can be perceived as prejudice, it just may be an opinion you don't agree with or an observation that doesn't match your own experience (both of which are totally valid and understandable).
If you give every idiot a worldwide heard voice, you will hear every idiot from the whole world. If you give every idiot a tool to make programs, you will see a lot of programs made by idiots.
Gas Town is ridiculous and I had to uninstall Beads after seeing it only confuse my agents, but he's not completely insane or a moron. There may be some kernels of good ideas inside of Gas Town which could be extracted out into a better system.
> Steve Yegge is not an idiot or a bad programmer.
I don't think he's an idiot; there are almost no actual idiots here on HN in my opinion, and they don't write such articles or make systems like Steve Yegge does. I'm only commenting about giving more tools to idiots. Even tools made by geniuses will give you idiotic results when used by actual idiots, but a lot of smart people want to lower barriers of entry so that idiots can use more tools. And there are a lot of idiots who were inactive just because they didn't have the tools. As the Polish essayist/futurist Stanisław Lem famously put it: "I didn't know there are so many idiots in this world until I got internet".
Even if I looked past the overwrought, self-indulgent Mad Max LARP (and the poor judgment evidenced by the prioritization of world-building minutia while the basic architecture is imploding), the cost of finding those kernels in a monstrosity of this size negates any ROI. 189k lines in four weeks will inevitably surface interesting pattern combinations — that's not merit, that's sample size. You might as well search the Library of Babel; at least the patterns are guaranteed to exist there.
The other problem with that reasoning is that whatever patterns ARE interesting are more likely to be new to AI-assisted coding generally – meaning a cleaner system built for the same use case will surface them without the archaeological dig, just by virtue of its builder having the skill to design it (and crucially, being more interested in designing it than in creating AI drawings of polecats in steampunk-adjacent garb).
I'm also a bit curious about at which point you start considering someone an idiot when they keep making objectively idiotic moves – the whimsical Disneyfied presentation, the "please don't download this" false modesty while keeping the repo public, the inexplicable code growth all come from the same place. They're not separate quirks: they're the same inability to edit, the same need for immediate audience validation, the same substitution of volume and narrative for actual engineering discipline. Someone who thinks "Polecats" and "Guzzoline" are good names for production abstractions is not suddenly going to develop the editorial rigor to scrap a codebase and rebuild.
Which is why it's worth remembering that Yegge's one successful shipped project was Grok, an internal tool used by Google engineers, so Yegge seems to have bought his own hype, missing how much of that project's success was likely subsidized by its user base comprising people skilled enough to route around its limitations.
These days he seems to be building for developers in general, but critically might be missing that actual developers immediately clock the project's ineptitude + Yegge's immature, narcissistic prioritization and peace the fuck out. The end result of this is filtering for the self-described vibe-coder types, people already Dunning-Krugered enough to believe you can prompt your way into a complete system without knowing how to reason about that system in order to guide the AI.
Which, fittingly, is how you end up with users who can't even follow "please don't download this yet".
Well put. I can't help thinking of this every time I see the 854594th "agent coordination framework" in GitHub. They all look strangely similar, are obviously themselves vibe-coded, and make no real effort to present any type of evidence that they can help development in any way.
You no longer have to be very specific about syntax. There's now an AI that can translate your idea into whatever language you want.
Previously, if you had an idea of what the program needed to do, you needed to learn a new language. This is so hard that we use language itself as a metaphor: It's hard to learn a new language, only a few people can translate from French to English, for example. Likewise, few people can translate English to Fortran.
Now, you can just think about your program in English, and so long as you actually know what you want, you can get a Fortran program.
The issue is now what it was originally for senior programmers: to decide what to make, not how to make it.
The hard part of software development is equivalent to the hard part of engineering:
Anyone can draw a sketch of what a house should look like. But designing a house that is safe, conforms to building regulations, and which wouldn't be uncomfortable to live in (for example, poor choice of heat insulation for the local climate) is the stuff people train on. Not the sketching part.
It's the same for software development. All we've done is replace FORTRAN / Javascript / whatever with a subset of a natural language. But we still need to thoroughly understand the problem and describe it to the LLM. Plus the way we format these markdown prompts, you're basically still programming. Albeit in a less strict syntax and the "compiler" is non-deterministic.
This is why I get so miffed by comments about AI replacing programmers. That's not what's happening. Programming is just shifting to a language that looks more like Jira tickets than source code. And the orgs that think they can replace developers with AI (and I don't for one second believe many of the technology leaders think this, but some smaller orgs likely do) are heading for a very unpleasant realisation soon.
I will caveat this by saying: there are far too many naff developers out there that genuinely aren't any better than an LLM. And maybe what we need is more regulation around software development, just like there is in proper engineering professions.
> Anyone can draw a sketch of what a house should look like. But designing a house that is safe, conforms to building regulations, and which wouldn't be uncomfortable to live in...
And now we have AIs that can take your sketch on paper and add all these complex and technical things by themselves. That's the point.
> Programming is just shifting to a language that looks more like Jira tickets than source code.
Sure, but now I need to be fluent in prompt-lang and the underlying programming language if you want me to be confident in the output (and you probably do, right?)
No, you have to be fluent in the domain. That is ultimately where the program is acting. You can be confident it works if it passes domain level tests.
You save all the time that was wasted forcing the language into the shape you intended. A lot of trivial little things ate up time, until AI came along. The big things, well, you still need to understand them.
> You can be confident it works if it passes domain level tests.
This is generally true for things you run locally on your machine IF your domain isn't super heavy on external dependencies or data dependencies that cause edge cases and cause explosions in test cases. But again, easier to inspect/be sure of those things locally for single-player utilities.
Generally much less true for anything that touches the internet and deals with money and/or long-term persistent storage of other people's data. If you aren't fluent in that world, you'll end up running software built on old versions of third-party code, iterating with changes that have to be increasingly broad in scope, validated against a set of test cases that is almost certainly not as creative as a real attacker.
Personally I would love to see stuff move back to local user machines vs the Google-et-al-owned online world. But I don't think "cheap freeware" was the missing ingredient that prevented the corporate consolidation. And so people/companies who want to play in that massively-online world (where the money is) are still going to have to know the broader technical domain of operating online services safely and securely, which touches deep into the code.
So I, personally, don't have to be confident in one-off or utility scripts for manual tasks or ops that I write, because I can be confident in the domain of their behavior since I'm intimately familiar with the surrounding systems. Saves me a TON of time. Time I can devote to the important-to-get-correct code. But what about the next generation? Not familiar with the surrounding systems, so not even aware of what the domains they need to know (or not know) in depth are? (Maybe they'll pay us a bunch of money to help clean up a mess, which is a classic post-just-build-shit-fast successful startup story.)
You can get some of the way writing prompts with very little effort. But you almost always hit problems after a while. And once you do, it feels almost impossible to recover without restarting from a new context. And that can sometimes be a painful step.
But learning to write effective prompts will get you a lot further, a lot quicker, and with less friction.
So there’s definitely an element of learning a “prompt-lang” to effective use of LLMs.
> Sure, but now I need to be fluent in prompt-lang and the underlying programming language if you want me to be confident in the output (and you probably do, right?)
Using a formal language makes the problem space unambiguous. That is just as much a benefit as it is a barrier to entry. Once you learn this formal language, the ability to read code and see the surface area of the problem is absolutely empowering. Using english to express this is an exercise in frustration (or, occasionally, genius—but genius is not necessary with the formal language).
Again, I don't think most people are prepared to articulate what behavior they want. Fortran (and any other formal language) used to force this, but now you just kind of jerk off on the keyboard or into the microphone and expect mind-reading.
Reactionarily? Sure. Maybe AI has some role to play there. Maybe you can ask the chatbot to modify settings.
I am no fan of chatbots. But i do have empathy for the people responsible for them when their users start complaining that programs don't do what they want, despite the chatbots delivering precisely the code demanded.
There is a lot of research on how words/language influence what we think, and even what we can observe, like the Sapir-Whorf hypothesis. If in a language there is one word for 2 different colors, speakers of it are unable to see the difference between the colors.
I have a suspicion that extensive use of LLMs can result in damage to your brain. That's why we are seeing so many mental health issues surfacing, and why we are getting a bunch of blog posts about "an agentic coding psychosis".
It could be that llms go from bicycles for the brain to smoking for the brain, once we figure out the long term effects of it.
> If in a language there is one word for 2 different colors, speakers of it are unable to see the difference between the colors.
That is quite untrue. It is true that people may be slightly slower or less accurate in distinguishing colors that are within a labeled category than those that cross a category boundary, but that's far from saying they can't perceive the difference at all. The latter would imply that, for instance, English speakers cannot distinguish shades of blue or green.
The point I was trying to make is that the way our brain works is deeply connected to language and words, including how fast and how accurately you perceive colors [0][1]. And interacting with an LLM could have unexpected side effects on it, because we were never before exposed to "statistically generated language" in such amounts.
> If in a language there is one word for 2 different colors, speakers of it are unable to see the difference between the colors.
Perhaps you mean to say that speakers are unable to name the difference between the colours?
I can easily see differences between (for example) different shades of red. But I can't name them other than "shade of red".
I do happen to subscribe to the Sapir-Whorf hypothesis, in the sense that I think the language you think in constrains your thoughts - but I don't think it is strong enough to prevent you from being able to see different colours.
> if you show them two colors and ask them if they are different, they will tell you no
The experiments I've seen seem to interrogate what the culture means by colour (versus shade, et cetera) more than what the person is seeing.
If you show me sky blue and Navy blue and ask me if they're the same colour, I'll say yes. If you ask someone in a different context if Russian violet and Midnight blue are the same colour, I could see them saying yes, too. That doesn't mean they literally can't see the difference. Just that their ontology maps the words blue and violet to sets of colours differently.
If you asked me if a fire engine and a ripe strawberry are the same color I would say yes. Obviously, they are both red. If you held them next to each other I would still be able to tell you they are obviously different shades of red. But in my head they are both mapped to the red "embedding". I imagine that's the exact same thing that happens to blue and green in cultures that don't have a word for green.
If on the other hand you work with colors a lot you develop a finer mapping. If your first instinct when asked for the name of that wall over there is to say it's sage instead of green, then you would never say that a strawberry and a fire engine have the same color. You might even question the validity of the question, since fire engines have all kinds of different colors (neon red being a trend lately)
> in my head they are both mapped to the red "embedding"
Sure. That's the point. These studies are a study of language per se, not of how language influences perception to a meaningful degree. Sapir-Whorf is a cool hypothesis. But it isn't true for humans.
(Out of curiosity, what is "embedding" doing that "word" does not?)
Word would imply that this only happens when I translate my thoughts to a chosen human language (or articulate thoughts in a language). I chose embedding because I think this happens much earlier in the pipeline: the information of the exact shade is discarded before the scene is committed to memory and before most conscious reasoning. I see this as something happening at the interface of the vision system, not the speech center.
Which is kind of Sapir-Whorf, just not the extreme version of "we literally can't see or reason about the difference", more "differences we don't care about get lost in processing". Which you can kind of conceptualize as the brain choosing a different encoding, or embedding space (even though obviously such a thing does not exist in the literal sense in our brains)
Edit: in a way, I would claim Sapir-Whorf is mistaking correlation for causation: it's not that the words we know are the reason for how we can think, it's that what differences we care about cause both the ways we think and the words we use
> the information of the exact shade is discarded before the scene is committed to memory and before most conscious reasoning
I'm curious if we have any evidence for this. A lot of visual processing happens in the retina. To my knowledge, the retina has no awareness of words. I'd also assume that the visual cortex comes before anything to do with language, though that's just an assumption.
> it's not that the words we know are the reason for how we can think, it's that what differences we care about cause both the ways we think and the words we use
This is fair. Though for something like colour, a far-older system in our brains than language, I'd be sceptical of the latter controlling the former.
The ability for us to look at a gradient of color and differentiate between shades even without distinct names for them seems to disprove this on its face.
Unless the question is literally the equivalent of someone showing you a swatch of crimson and a swatch of scarlet and being asked if both are red, in which case, well yeah sure.
Sort of. At least some degree of relativism exists, though how much is debated. Would you ever talk about the sea having the same color as wine? But that's exactly what Homer called it.
This is still quite clearly something different than being unable to see the different colors, though.
Their mental model, sure. The way they convey it to others, sure.
But you can easily distinguish between two colors side by side that are even closer in appearance than wine and the sea, even if you only know one name for them. We can differentiate between colors before we even know the words for them when we're young, too.
I’ve been using Agent OS (https://buildermethods.com/agent-os) with Claude Code’s $100 plan and it is pretty darn great. I usually ideate with Claude and ChatGPT back and forth on an idea to get to a PRD for the whole product/project idea, with a high-level roadmap, then go through Agent OS to bootstrap the core artifacts, and then it’s just a loop of shaping specs, breaking into tasks, implementing, and manually testing it out.
Using the “standards” concept to produce skills in CC seems to help a lot. I’m currently working on a Mac SwiftUI app (a language I’ve never built anything in) and it’s progressing nicely, has good test coverage, and I haven’t looked at a line of code. I found a couple of SwiftUI skills repos online, had Claude adapt them to the Agent OS approach, and then hit the ground running.
Also, it basically functions like the really popular Ralph Wiggum concept everyone is raving about lately. Implementation basically just runs on its own with a bunch of parallel agents; sometimes I have to nudge the model after my smoke testing to clean up some stuff. But overall, it just works and is immensely productive. And this Agent OS thing adds just enough structure to tame complexity and variability.
I highly recommend it and have no other connection to it. I have some thoughts on some enhancements to it I’ll probably issue a few PRs for or fork the repo and implement.
A lot of “traditional” industries still run on TUIs or even AS/400 systems, such as freight/trucking companies, banking, medical, travel (Sabre, etc). And many companies in those spaces who appear to have migrated away from those systems are likely just using a modern GUI interface that still uses those old systems behind the scenes.
It’s generally because 1) they’re actually reliable, 2) they contain custom business logic and rules that are not documented well (or at all) and are hard to replicate, and 3) the people who set up all of these things have retired and no one is left who understands any of it.
I feel like a useful tool someone should build, now that LLMs are so capable, is some sort of automated walkthrough of a pull request, where it steps a reviewer through the overview and the why of the changes first, then step by step through each change, with commentary along the way generated by the LLM (and hopefully reviewed by the pull requester for accuracy). Then the reviewer can leave comments along the way.
I’ve always found the cognitive overhead of going through a pull request challenging, seems like the paradigm can shift now with all of the new tooling available to all of us.
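The plumbing for something like this is small; here’s a rough sketch of the loop I have in mind (the summarize_hunk function is a placeholder for whatever LLM call you’d plug in, not a real API):

```python
import subprocess

def file_diffs(base: str, head: str):
    """Split `git diff` output into per-file chunks for stepwise review."""
    diff = subprocess.run(
        ["git", "diff", f"{base}..{head}"],
        capture_output=True, text=True, check=True,
    ).stdout
    for chunk in diff.split("diff --git ")[1:]:
        yield "diff --git " + chunk

def summarize_hunk(chunk: str) -> str:
    # Placeholder: send the chunk to your LLM of choice and ask it to
    # explain what changed and why; the PR author should review this
    # commentary for accuracy before the reviewer ever sees it.
    return "(LLM commentary would go here)"

def walkthrough(base: str, head: str):
    """Step the reviewer through the diff one file at a time."""
    for chunk in file_diffs(base, head):
        print(summarize_hunk(chunk))
        print(chunk)
        input("Enter for the next change, Ctrl-C to stop... ")
```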
It seems like maybe you should just talk to your colleague who wrote the code. It would be far more efficient than having them create and review a whole review experience on top of the work that they did, and then have you try to understand a walkthrough document (that might have hallucinated explanations that the requester didn’t quite catch) plus the underlying code, plus still have to talk to them to ask clarifying questions anyway.
This sounds like you're making an argument against pull request descriptions in general. Like you just want to see code changes and have a meeting about it.
I didn’t say anything about pull request descriptions.
Most of the time, I imagine that people on a team know what each other is generally doing, and that everyone is broadly familiar with the codebase. So, if you really need a ground up walkthrough of a pull request, then that is a time to talk to your colleague, because they’ve either done something brilliant or something weird. That colleague has asked you to engage with their code by tagging you in the review. If you can’t ask a question like “Alice, what am I looking at here?”, then there’s a problem.
It also seems weird to me that you seem to need to set a meeting with someone to talk about a pull review. Do you not just regularly talk to people on your team about what you guys are doing? Like, you just randomly see pull request review notifications pop up and that’s your only interaction with your team? It’s not like we talk about every detail of everything, but if something came up in a pull request that I wasn’t sure about, it would just come up in conversation.
> I didn’t say anything about pull request descriptions.
But that's what we're talking about. This is what you criticized:
> automated walkthrough of a pull request, where it steps a reviewer through the overview and the why of the changes first, then step by step through each change, with commentary along the way generated by the LLM (and hopefully reviewed by the pull requester for accuracy)
This is effectively what a PR description should do. It should explain what changes are being proposed and why. And having comments alongside the code changes only enhances that description IMO.
> a ground up walkthrough
Nobody said anything about a ground up walkthrough. It's walking through the changes, not the entire codebase.
> It also seems weird to me that you seem to need to set a meeting with someone to talk about a pull review
Again, I never said anything about scheduling a meeting. An ad-hoc discussion is a meeting too.
And sure, if I have a question after reading the PR then I'll ask the author. But it's certainly not the first thing I want to do.
There’s no way a PR description should be expected to have a step by step description of what the change is doing, along with a commentary. That’s what I mean by “ground up”: explaining every line of code with its thinking is something that maybe you’d do to teach coding or something.
If a dev has to take time even to review and edit an LLM generated version of this for every pull review (and it will require time to do this so that it doesn’t waste the reviewer’s time with wild goose chases due to faulty interpretation), and you are then going to have to wade through that doc in addition to reading the code, you could save everyone a lot of time and just talk to each other when you have questions.
I'm not looking at it like a line by line explanation of the code. Think of it like commit messages, but better. Better because:
1. commit messages are usually not very informative and the context is implicit.
2. commit messages are coupled to time rather than to the final PR changes. The PR changes are what really matters to the reviewer, not a log of what the author did to get there (especially if things are changing back and forth).
Sure, and PR descriptions and comments are a way to do that (async).
But IME it helps to start with a good description of what the changes are rather than just having a pile of code changes dumped on you where you then have to reverse engineer the intent.
It seems draining to me to have to start from nothing and interrogate someone for every PR that comes in.
> You can also talk to people without scheduling a meeting
I’ll update this comment in the morning when I’m not tired and I actually have something on GitHub, but I’ve been vibe coding a Rails engine (don’t worry, I know how to build things in Rails, I’m just time-poor with a job and kids) that is a sophisticated RSS feed subscription engine. It has niceties like adaptive polling based on posting frequency, user-managed auto-scraping capabilities if the feed doesn’t provide the full content, and all sorts of other nice features. I have a couple of other project ideas that are built on a foundation of ingesting RSS content at scale, so I’ve been meaning to build this for years, and AI coding tools finally make it possible; otherwise it would have taken me months.
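A rough sketch of the adaptive-polling heuristic I mean (illustrative only, not the engine’s actual code): poll a feed roughly as often as it actually posts, clamped to sane bounds.

```python
from datetime import datetime, timedelta

def next_poll_interval(post_times: list[datetime],
                       minimum: timedelta = timedelta(minutes=15),
                       maximum: timedelta = timedelta(days=1)) -> timedelta:
    """Take the median gap between recent entries and clamp it, so busy
    feeds get polled often and dormant ones back off."""
    if len(post_times) < 2:
        return maximum  # nothing to learn from yet, back off
    times = sorted(post_times)
    gaps = sorted(b - a for a, b in zip(times, times[1:]))
    median_gap = gaps[len(gaps) // 2]
    return max(minimum, min(median_gap, maximum))
```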
It won't. For now. But this is a long game. Google has apparently reduced the amount of contributions to AOSP, and it would not be surprising if they went fully closed source in the near future. That would be the end of all ROMs.
It works with everything that runs in a browser - as long as the dev app runs on localhost and the codebase is local as well, we can integrate with it!