I found myself agreeing with quite a lot of this article.
I'm a pretty huge proponent of AI-assisted development, but I've never found those 10x claims convincing. I've estimated that LLMs make me 2-5x more productive on the parts of my job which involve typing code into a computer, which is itself a small portion of what I do as a software engineer.
That's not too far from this article's assumptions. From the article:
> I wouldn't be surprised to learn AI helps many engineers do certain tasks 20-50% faster, but the nature of software bottlenecks mean this doesn't translate to a 20% productivity increase and certainly not a 10x increase.
I think that's an under-estimation - I suspect engineers that really know how to use this stuff effectively will get more than a 0.2x increase - but I do think all of the other stuff involved in building software makes the 10x thing unrealistic in most cases.
Yeah. I just need to babysit it too much. Take Copilot: it gives good suggestions and sometimes blows me away with a block of code which is exactly what I'd type. But actively letting it code (at least with gpt4.1 or gpt4o) just doesn't work well enough for me. Half of the time it doesn't even compile, and after fixing that it's just not really working correctly either. I'd expect it to work like a very junior programmer, but it works like a very drunk senior programmer that isn't listening to you very well at all.
Yes yes yes, we're all aware that these are word predictors and don't actually know anything or reason. But these random dice are somehow able to give reasonable, seemingly well-educated answers a majority of the time, and the fact that these programs don't technically know anything isn't going to slow the train down any.
I just don't get why people say they don't reason. It's crazy talk. The KV cache is effectively a unidirectional Turing machine, so it should be possible to encode "reasoning" in there. And evidence shows that LLMs occasionally do some light reasoning. Just because they're not great at it (hard to train for, I suppose) doesn't mean they don't do it at all.
Would I be crazy to say that the difference between reasoning and computation is sentience? This is an impulse with no justification but it rings true to me.
Taking a pragmatic approach, I would say that if the AI accomplishes something that, for humans, requires reasoning, then we should say that the AI is reasoning. That way we can have rational discussions about what the AI can actually do, without diverting into endless discussions about philosophy.
Suppose A solves a problem and writes the solution down. B reads the answer and repeats it. Is B reasoning, when asked the same question? What about one that sounds similar?
The crux of the problem is "what is reasoning?" Of course it's easy enough to call the outputs "equivalent enough" and then use that to say the processes are therefore also "equivalent enough."
I'm not saying it's enough for the outputs to be "equivalent enough."
I am saying that if the outputs and inputs are equivalent, then that's enough to call it the same thing. It might be different internally, but that doesn't really matter for practical purposes.
In my experience PhDs are not 10x as productive. Quite the opposite, actually. Too much theory and not much practicality. The only two developers my company has fired for (basically) incompetence were PhDs in Computer Science. They couldn't deliver practical, real code.
"Ketamine has been found to increase dopaminergic neurotransmission in the brain"
This property is likely an important driver of ketamine abuse and of it being rather strongly 'moreish', as well as of the subjective experience of strong expectation during a 'trip': i.e. the tendency to develop redose loops approaching unconsciousness in a chase to 'get the message from the goddess' or whatever, which seems just out of reach (because it's actually a feeling of expectation, not a partially installed divine T3 rig).
The “multiple PhDs” thing is interesting. The point of a PhD is to master both a very specific subject and the research skills needed to advance the frontier of knowledge in that area. There’s also plenty of secondary issues, like figuring out the politics of academia and publishing enough to establish a reputation.
I don’t think models are doing that. They certainly can retrieve a huge amount of information that would otherwise only be available to specialists such as people with PhDs… but I’m not convinced the models have the same level of understanding as a human PhD.
It’s easy to test though- the models simply have to write and defend a dissertation!
Totally disagree. The current state of coding AIs is “a level 2 product manager who is a world class biker balancing on a unicycle trying to explain a concept in French to a Spanish genius who is only 4 years old.” I’m not going to explain what I mean, but if you’ve used Qwen Code you understand.
Qwen Code is really not representative of the state of the art, though. With the right prompt I have no problem getting Claude to output me a complete codebase (e.g. a non-trivial library interfacing with multiple hardware devices) with the specs I want, in modern C++ that builds, runs, and has documentation and unit tests sourced from data sheets and manufacturer specs from the get-go.
Assuming there aren't tricky concurrency issues and the documentation makes sense (you know which registers to set to configure and otherwise work the device), device drivers are the easiest thing in the world to code.
There's the old trope that systems programmers are smarter than applications programmers, but SWE-Bench puts the lie to that. Sure, SWE-Bench problems are all in the language of software; applications programmers take badly specified tickets in the language of product managers, testers, and end users and have to turn them into the language of SWE-Bench to get things done. I am not that impressed with 65% performance on SWE-Bench because those are not the kind of tickets I have to resolve at work; rather, at work, if I want to use AI to help maintain a large codebase, I need to break the work down into that kind of ticket.
> device drivers are the easiest thing in the world to code.
Except the documentation lies, and in reality your vendor shipped you a part with timing that is slightly out of sync with what the doc says, and after 3 months of debugging, including using an oscilloscope, you figure out WTF is going on. You report back to your supplier, and after two weeks of them not saying anything they finally reply that the timings you have reverse engineered are indeed the correct timings, sorry for any misunderstanding with the documentation.
As an applications engineer, my computer doesn't lie to me, and memory generally stays at the value I set it to unless I did something really wrong.
Backend services are the easiest thing in the world to write. I am 90% sure that all the bullshit around infra is just artificial job security, and I say this as someone who primarily does backend work nowadays.
I'm not sure if this counts as systems or application engineering, but if you think your computer doesn't lie to you, try writing an nginx config. Those things aren't evaluated at /all/ the way they look like they are.
At no point have any of my nginx files ever flipped their own bits.
Are they a constant source of low level annoyance? Sure. But I've never had to look at a bus timing diagram to understand how to use one, nor worried about an nginx file being rotated 90 degrees and wired up wrong!
To some extent, for sure. The fact that electronics engineers that have picked up a bit of software write a large fraction of the world's device drivers does point to it not being the most challenging of software tasks, but on the other hand the real 'systems engineering' is writing the code that lets those engineers do so successfully, which I think is quite an impressive feat.
I was joking! Claude Code is still the best afaik, though I'd compare it more to "sending a 1440p HDR fax of your user story to a 4-armed mime whose mind is then read by an Aztec psychic who has taken just the right amount of NyQuil."
Probably the saddest comment I've read all day. Crafting software line-by-line is the best part of programming (maybe when dealing with hardware devices you can instead rely on auto-generated code from the register/memory region descriptions).
How long would that be economically viable when a sufficient number of people can generate high-quality code in 1/10th the time? (Obviously, it will always be possible as a hobby.)
> But actively letting it code (at least with gpt4.1 or gpt4o)
It's funny, GitHub Copilot puts these models in the 'bargain bin' (they are free in 'ask' mode, whereas the other models count against your monthly limit of premium requests) and it's pretty clear why: they seem downright nerfed. They're tolerable for basic questions, but you wouldn't use them if price weren't a concern.
Brandwise, I don't think it does OpenAI any favors to have their models be priced as 'worthless' compared to the other models on premium request limits.
With something like Devin, where it integrates directly with your repo and generates documentation based on your project(s), it's much more productive to use as an agent. I can delegate like 4-5 small tasks that would normally take me a full day or two (or three) of context switching and mental preparation, and knock them out in less than a day because it did 50-80% of the work, leaving only a few fixes or small pivot for me to wrap them up.
This alone is where I get a lot of my value. Otherwise, I'm using Cursor to actively solve smaller problems in whatever files I'm currently focused on. Being able to refactor things with only a couple sentences is remarkably fast.
The more you know about your language's features (and their precise names), and about higher-level programming patterns, the better time you'll have with LLMs, because it matches up with real documentation and examples with more precision.
> Being able to refactor things with only a couple sentences is remarkably fast.
I'm curious, this is js/ts? Asking because depending on the lang, good old machine refactoring is either amazeballs (Java + IDE) or non-existent (Haskell).
I don't do js/ts, so I don't know what the state of machine refactoring is in VS Code... but if it's as good as Java's, then "a couple of sentences" is quite slow compared to a keystroke or a quick dialog box with completion of symbol names.
I'm using TypeScript. In my case, these refactors are usually small, only spanning up to 5 files depending on how interdependent things are. The benefit of an agent is its ability to find and detect related side effects caused by the refactor (broken type-safety, broken translation strings, etc.) and to handle related renames, like an actual UI string that's tied to the naming of what I'm working on when my changes happen to include a rename.
It's not always right, but I find it helpful when it finds related changes that I should be making anyway, but may have overlooked.
Another example: selecting a block that I need to wrap (or unwrap) with tedious syntax, say I need to memoize a value with a React `useMemo` hook. I can select the value, open Quick Chat, type "memoize this", and within milliseconds it's correctly wrapped and saved me lots of fiddling on the keyboard. Scale this to hundreds of changes like these over a week, it adds up to valuable time-savings.
Even more powerful: selecting 5, 10, 20 separate values and typing: "memoize all of these" and watching it blast through each one in record time with pinpoint accuracy.
IntelliJ has keyboard shortcuts for all of these. I think how impressed you are by AI depends a lot on the quality of the tooling you were previously working with.
Work is. I actually don't have access to our billing, so I couldn't tell you exactly, but it depends on how many ACUs (Agent Compute Units) you've used.
We use a Team plan ($500 /mo), which includes 250 ACUs per month. Each bug or small task consumes anywhere between 1-3 ACUs, and fewer units are consumed if you're more precise with your prompt upfront. A larger prompt will usually use fewer ACUs because follow-up prompts cause Devin to run more checks to validate its work. Since it can run scripts, compilers, linters, etc. in its own VM -- all of that contributes to usage. It can also run E2E tests in a browser instance, and validate UI changes visually.
They recommend most tasks should stay under 5 ACUs before it becomes inefficient. I've managed to give it some fairly complex tasks while staying under that threshold.
>I'd expect it to work like a very junior programmer, but it works like a very drunk senior programmer that isn't listening to you very well at all.
Best analogy I've ever heard and it's completely accurate. Now, back to work debugging and finishing a vibe coded application I'm being paid to work on.
I think there are two factors to this: 1. what to code (longer, more specific prompts are better but take longer to write), and 2. how to code it (specify languages, libraries, APIs, etc.). And if you're trying to write code that uses a newer version of a library that works differently from what's most commonly documented, it's a long uphill battle of constantly reminding the LLM of the new changes.
If you're not specific enough, it will definitely spit out a half-baked pseudocode file where it expects you to fill in the rest. If you don't specify certain libraries, it'll use whatever is featured in the most blogspam. And if you're in an ecosystem that isn't publicly well-documented, it's near useless.
Two other observations I've found working with ChatGPT and Copilot:
First, until I can re-learn boundaries, they are a fiasco for work-life balance. It's way too easy to have a "hmm what if X" thought late at night or first thing in the morning, pop off a quick ticket from my phone, assign to Copilot, and then twenty minutes later I'm lying in bed reviewing a PR instead of having a shower, a proper breakfast, and fully entering into work headspace.
And on a similar thread, Copilot's willingness to tolerate infinite bikeshedding and refactoring is a hazard for actually getting stuff merged. Unlike a human colleague who loses patience after a round or two of review, Copilot is happy to keep changing things up and endlessly iterating on minutiae. Copilot code reviews are exhausting to read through because it's just so much text, so much back and forth, every little change with big explanations, acknowledgments, replies, etc.
I've found this with Claude Code too. It has nonstop energy (until you run out of tokens) and is always a little too eager to make random edits, which means it's somehow very tiring to use even though you're not doing anything.
But it is the most productive intern I've ever pair programmed with. The real ones hallucinate about as often too.
If I want to throw a shuriken that obeys some artificial, magical Magnus-like force, as in the movie Wanted, both ChatGPT and Claude let me down using pygame. And what if I wanted C-level performance, or wanted to use Zig? Burp.
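Roughly what I was hoping it would produce - a hand-rolled sketch, with the spin and Magnus coefficients entirely made up, so treat it as the idea rather than tuned game code:

    # A "shuriken" whose path curves under a made-up Magnus-style force.
    # SPIN and MAGNUS_COEFF are arbitrary; tune them for the Wanted-style curve.
    import pygame

    WIDTH, HEIGHT = 800, 600
    MAGNUS_COEFF = 0.004   # strength of the spin-induced sideways force (invented)
    SPIN = 12.0            # positive = counter-clockwise spin (invented)

    def magnus_force(vx, vy):
        # 2D Magnus effect: force perpendicular to velocity, proportional to spin.
        return (-MAGNUS_COEFF * SPIN * vy, MAGNUS_COEFF * SPIN * vx)

    def main():
        pygame.init()
        screen = pygame.display.set_mode((WIDTH, HEIGHT))
        clock = pygame.time.Clock()
        x, y = 50.0, HEIGHT / 2
        vx, vy = 6.0, -2.0
        running = True
        while running:
            for event in pygame.event.get():
                if event.type == pygame.QUIT:
                    running = False
            fx, fy = magnus_force(vx, vy)
            vx, vy = vx + fx, vy + fy   # curve the velocity
            x, y = x + vx, y + vy       # integrate position
            screen.fill((20, 20, 20))
            pygame.draw.circle(screen, (220, 220, 80), (int(x), int(y)), 8)
            pygame.display.flip()
            clock.tick(60)
        pygame.quit()

    if __name__ == "__main__":
        main()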
It works like the average Microsoft employee, like some doped version of an orange-wig wearer who gets votes because his daddies kept the population as dumb as it gets after the dotcom-to-Facebook era. In essence, the ones to be disappointed by are the Chan-Zuckerbergs of our time. There was a chance, but there also was what they were primed for.
What does it really mean to know something or understand something? I think AI knows a great deal (associating facts with symbols), confabulates at times when it doesn't know (which is dishonestly called hallucination, implying a conscious agent misperceiving, which AI is not), and understands almost nothing.
The best way to think of chatbot "AI" is as the compendium of human intelligence as recorded in books and online media available to it. It is not intelligent at all on its own, and its judgement can't be better than its human sources because it has no biological drive to synthesize and excel. It's best to think of AI as a librarian of human knowledge, or an interactive Wikipedia, which is designed to seem like an intelligent agent but is actually not.
One cannot learn everything from books and in any case many books contradict each other so every developer is a variation based on what they have read and experienced and thought along the way. How can that get summed up into one thing? It might not even be useful to do that.
I suspect that some researchers with a very different approach will come up with a neural network that learns and works more like a human in future though. Not the current LLMS but something with a much more efficient learning mechanism that doesn't require a nuclear power station to train.
What is baffling to me is how otherwise intelligent people don't really understand what human intelligence and learning are about. They are about a biological organism following its replication algorithm. Why should a computer program learn and work like a biological organism if it is in an entirely different environment with entirely different drives?
Intelligence is not some universal abstract thing achievable after a certain computational threshold is reached. Rather, it's a quality of the behavior patterns of specific biological organisms following their drives.
...because so far only our attempts to copy nature have proven successful...in that we have judged the result "intelligent".
There's a long history in AI where neural nets were written off as useless (Minsky was the famous destroyer of the idea, I think) and yet in the end they blew away the alternatives completely.
We have something now that's useful in that it is able to glom a huge amount of knowledge but the cost of doing so it tremendous and therefore in many ways it's still ridiculously inferior to nature because it's only a partial copy.
A lot of science fiction has assumed that robots, for example, would automatically be superior to humans - but are robots self-repairing or self replicating? I was reading recently about how the reasons why many developers like python are the reasons why it can never be made fast. In other words you cannot have everything - all features come at a cost. We will probably have less human and more human AIs because they will offer us different trade offs.
To date, I've not been able to effectively use Copilot in any projects.
The suggestions were always unusably bad. The /fix were always obviously and straight up false unless it was a super silly issue.
Claude Code with Opus model on the other hand was mind-blowing to me and made me change my mind on almost everything wrt my opinion of LLMs for coding.
You still need to grow the skill of how to build the context and formulate the prompt, but the built-in execution loop is a complete game changer, and I didn't realize that until I actually used it effectively on a toy project myself.
MCP in particular was another thing I always thought was massively over hyped, until I actually started to use some in the same toy project.
Frankly, the building blocks already exist at this point to make the vast majority of jobs redundant (and I'm thinking about all grunt-work office jobs, not coding in particular). The tooling still needs to be created, so I'm not seeing a short-term realization (<2 yrs), but medium term (5+ yrs)?
You should expect most companies to let people go in staggering numbers, with only small numbers of highly skilled people left to administer the agents
> You should expect most companies to let people go in staggering numbers, with only small numbers of highly skilled people left to administer the agents
I don't buy that. The linked article makes a solid argument for why that's not likely to happen: agentic loop coding tools like Claude Code can speed up the "writing code and getting it working" piece, but the software development lifecycle has so much other work before you get to the "and now we let Claude Code go brrrrrrr" phase.
These are exactly the people that are going to stay, medium term.
Let's explore a fictional example that somewhat resembles my current day job, and, I suspect, a lot of other people's.
A micro-service architecture: each team administers 5-10 services, and the whole application (which is itself only a small part of the platform) is developed by maybe 100-200 devs. So something like ~200 microservices.
The application architects are gonna be completely safe in their jobs. And so are the lead devs in each team - at least from my perspective. Anyone else? I suspect that in 5 yrs the MBAs will not see their value anymore. That's the vast majority of all devs, so it's likely going to cost 50% of the devs their jobs. And middle management will be slimmed down just as quickly, because you suddenly need a lot fewer managers.
Let's take this to the extreme - why would the company exist in the first place? The customers of said company pay them because they don't do the service themselves - but in the future, when it's laughably easy to vibe code anything your heart desires, their customers will just build the service themselves that they used to outsource!
tl;dr: in the future when vibe coding works 100% of the time, logically the only companies that will exist are the ones that have processes that AI can’t do, because all the other parts of the supply chain can all be done in-house
That scenario is a lot further out compared to what I was talking about.
It's conceivable that that's going to happen, eventually. But that'd likely require models a lot more advanced than what we have now.
The agent approach, with lead devs administering and merging the code the agents made, is feasible with today's models. The missing part is the tooling around the models and the development practices that standardize this workflow.
That's what I'd expect to take around 5 yrs to settle.
Thanks for this perspective, but I am a bit confused by some of your takes: you used "Claude Code with Opus model" in "the same toy project" with great success, which led you to conclude that this will "make a vast majority of all jobs redundant".
Toy project viability does not connect with making people redundant in the process (ever, really) — at least not for me. Care to elaborate on where you draw the optimism from?
I cannot use it on my production code base. I'm working for a company that requires the devs to code from virtual workplaces, which is a fancy term for virtual machines running in the Azure cloud. These are completely locked down, and anything but Copilot is forbidden from use, enforced via firewall and process monitoring. I can still use Sonnet 3.7 through that, but that's a far cry from my experience on my personal time with Claude Code.
I called it a toy project because I'm not earning money with it - hence it's a toy.
It does have medium complexity with roughly 100k loc though.
And I think I need to repeat myself, because you seem to be reading something into my comment that I didn't say: saying the building blocks exist doesn't mean that today's tooling is sufficient for this to play out today.
I did not miss the time horizon: this is why I put a remark of "ever, really".
"Toy project" is usually used in a different context (demonstrate something without really doing something useful): yours sounds more like a "hobby project".
That's a good point. I've actually implemented the same project over 20 times at this point.
At the heart is my hobby of reading web and light novels. I've been implementing various versions of a scraper and ePub reader for over 15 years now, ever since I started working as a programmer.
I've been reimplementing it over the years with the primary goal of growing my experience/ability. In the beginning it was a plain Django app, but it grew from that into various languages such as Elixir, Java (multiple times with different architecture approaches), native Android, and JS/TS frontend and sometimes backend - React, Angular, tRPC, Svelte, TanStack, and more.
So I know exactly how to implement it, as I've gone through a lot of versions of the same functionality.
And the last version I implemented (tanstack) was in July, via Claude Code and got to feature parity (and more) within roughly 3 weeks.
And I might add: I'm not positive about this development either, whatsoever. I'm just expecting this to happen, to the detriment of our collective futures (as programmers)
> You should expect most companies to let people go in staggering numbers, with only small numbers of highly skilled people left to administer the agents
I'm gonna pivot to building bomb shelters maybe
Or stockpiling munitions to sell during the troubles
Maybe some kind of protest support saas. Molotov deliveries as a service, you still have to light them and throw them but I guarantee next day delivery and they will be ready to deploy into any data center you want to burn down
What Im trying to say is "companies letting people go in staggering numbers" is a societal failure state not an ideal
I find it so weird how many engineers seem positively giddy to get replaced by a chatbot that functionally cannot do the job. I'll help your molotovs-as-a-service startup, free guillotine with every 6th order.
So what happens when someone calls in and the "AI" answers (because the receptionist has been fired and replaced by "AI"), and the caller asks to access some company record that should be private? Will the LLM always deny the request? Hint: no, not always.
There are so many flaws in your plan, I have no doubt that "AI" will ruin some companies that try to replace humans with a "tin can". LLMs are being inserted loosey-goosey into too many places by people that don't really understand the liability problems it creates. Because the LLM doesn't think, it doesn't have a job to protect, it doesn't have a family to feed. It can be gamed. It simply won't care.
The flaws in "AI" are already pretty obvious to anyone paying attention. It will only get more obvious the more LLMs get pushed into places they really do not belong.
The human receptionist can use critical thinking, and self preservation to prevent a bad outcome. The LLM can not. When a person causes a problem, they can be fired, and learn from the event. The LLM will not learn from it. And who is responsible then? The company providing the LLM? The more LLM use becomes pervasive, the taller the house of cards gets.
> until I actually started to use some in the same toy project
That's the key right there. Try to use it in a project that handles PII, needs data to be exact, or has many dependencies/libraries and needs to not break for critical business functions.
(1) for my day job, it doesn't make me super productive with creation, but it does help with discovery, learning, getting myself unstuck, and writing tedious code.
(2) however, the biggest unlock is it makes working on side projects __immensely__ easier. Before AI I was always too tired to spend significant time on side projects. Now, I can see my ideas come to life (albeit with shittier code), with much less mental effort. I also get to improve my AI engineering skills without the constraint of deadlines, data privacy, tool constraints etc..
2 heavily resonates with me. Simon Willison made the point early on that AI makes him more ambitious with his side projects, and I heavily agree. Suddenly lots of things that seemed more or less unfeasible are now not only doable, but can actually meet or exceed your own assumptions for them.
Being able to sit down after a long day of work and ask an AI model to fix some bug or implement a feature while you relax and _not_ type code is a major boon. It is able to immediately get context and be productive even when you are not.
Funny. This is exactly how I use it too. I love to make a ui change prompt and switch to the browser and watch hot reload incrementally make the changes I assume will happen.
> (1) for my day job, it doesn't make me super productive with creation, but it does help with discovery, learning, getting myself unstuck, and writing tedious code
I hear this take a lot but does it really make that much of an improvement over what we already had with search engines, online documentation and online Q&A sites?
It is the best version of fuzzy search I have ever seen: the ultimate "tip of my tongue" assistant. I can ask super vague things like "Hey, I remember seeing a tool that allows you to put actual code in your files to do codegen, what could it be?" and it instantly gives me a list of possible answers, including the thing I'm looking for: Cog.
I know that a whole bunch of people will respond with the exact set of words that will make it show up right away on Google, but that's not the point: I couldn't remember what language it used, or any other detail beyond what I wrote and that it had been shared on Hacker News at some point, and the first couple Google searches returned a million other similar but incorrect things. With an LLM I found it right away.
The training cutoff comes into play here a bit, but 95% of the time I'm fuzzy searching like that I'm happy with projects that have been around for a few years and hence are both more mature and happen to fall into the training data.
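(For anyone who hasn't seen it: Cog's whole trick is Python embedded between comment markers that regenerates a block in place. From memory, and worth checking against the docs, it looks roughly like this:)

    # constants.py -- run `cog -r constants.py` to regenerate the block below.
    # [[[cog
    # import cog
    # for name in ["red", "green", "blue"]:
    #     cog.outl(f'{name.upper()} = "{name}"')
    # ]]]
    RED = "red"
    GREEN = "green"
    BLUE = "blue"
    # [[[end]]]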
Me, typing into a search engine, a few years ago: "Postgres CTE tutorial"
Me, typing into any AI engine, in 2025: "Here is my schema and query; optimize the query using CTEs and anything else you think might improve performance and readability"
And nowadays if you type that into a search engine you may be overwhelmed with ads or articles of varying quality that you'll need to read and deeply understand to adapt to your use-case.
I didn't say that. When you're trying to get a job done, it's time consuming to sift through a long tutorial online because a big part of that time is spent determining whether its garbage and whether its solving the exact problem that you need to solve. IME the LLM helps with both of those problems.
Those things don't really help with getting unstuck, especially if the reason you are stuck is that there's tedious code that you anticipate writing and don't want to deal with.
Exactly. My two worst roadblocks are the beginning of a new feature when I procrastinate way too much (I'm a bit afraid of choosing a design/architecture and committing to it) and towards the end when I have to fix small regressions and write tests, and I procrastinate because I just don't want to. AI solved the second roadblock 100% of the time, and help with design decisions enough to be useful (Claude4 at least). The code in the middle is a plus, but tbh I often do it myself (unless it's frontend code).
> does it really make that much of an improvement over what we already had with search engines, online documentation and online Q&A sites?
This can't be a serious question? 5 minutes of testing will prove to you that it's not just better, it's a totally new paradigm. I'm relatively skeptical of AI as a general purpose tool, but in terms of learning and asking questions on well documented areas like programming language spec, APIs etc it's not even close. Google is dead to me in this use case.
Yes. It's so dramatically better it's not even funny. It's not that information doesn't exist out there, it's more that an LLM can give it to you in a few seconds and it's tailored to your specific situation. The second part is especially helpful if the internet answer is 95% correct but is missing something specific to you that ends up taking you 20 minutes to figure out.
Yeah, I strongly disagree. I want to spend time figuring out the things that are important to me and my career. I couldn't care less about the one regex I write every year, especially when I've learned and forgotten the syntax more times than I can count.
You asked whether it's really better than "what we already had with search engines, online documentation and online Q&A sites".
How have you found it not to be significantly better for those purposes?
The "not good enough for you to trust" is a strange claim. No matter what source of info you use, outside of official documentation, you have to assess its quality and correctness. LLM output is no different.
In my experience a lot of our "google engineers" now do both. We tend to preach that they go to the documentation first, since that will almost always lead to actual understanding of what they are working on. Eventually most of them pick up that habit, and in my experience they never really go back to being "google engineers" after that... Where the AI helps with this is that it can search documentation rather well. We do a lot of work with Azure, and while the Microsoft documentation is certainly extensive, it can be rather hard to find exactly what you're looking for. LLMs can usually find a lot of related pages, and then you can figure out which are relevant more easily than you can with Google/Ecosia/DDG. I haven't used Kagi, so maybe that works better?
As far as writing "tedious" code goes, I think the AI agents are great. Where I have personally found a huge advantage is in keeping documentation up to date. I'm not sure if it's because I have ADHD or because my workload is basically enough for 3 people, but this is an area I struggle with. In the past, I've often let the code be its own documentation, because that would be better than having outdated/wrong documentation. With AI agents, I find that I can have good documentation that I don't need to worry about beyond the keep/discard step of reviewing the agent's output.
I also rarely write SQL, Bicep, YAML configs and similar these days, because it's so easy to determine if the AI agent got it wrong. This requires that you're an expert on infrastructure as code and SQL, but if you are, the AI agents are really fast. I think this is one of the areas where they are 10x at times. I recently wrote an ingress for an FTP pod (don't ask), and writing all those ports for passive mode would've taken me a while.
There are a lot of risks involved. If you can't spot errors or outdated functionality quickly, then I would highly recommend you don't do this. Bicep LLM output is often not up to date, and since the docs are excellent, what I do in those situations is copy/paste what I need. Then I let the AI agent update things like parameters, which certainly isn't 10x but is still faster than I can do it.
Similarly, it's rather good at writing and maintaining automated tests. I wouldn't recommend this unless you're actively dealing with corrupted states directly in your code. But we do fail-fast programming/design by contract, so the tests are really just an extra precaution and compliance thing, meaning that they aren't as vital as they would be for more implicit ways of dealing with error handling.
I don't think AIs are good at helping you with learning or getting unstuck. I guess it depends on how you would normally deal with it. If the alternative is "google programming", I imagine it's sort of similar and probably more effective. It's probably also more dangerous: at least we've found that our engineers are more likely to trust the LLM than a Medium article or a Stack Overflow thread.
If my work involves doing a bit of tooling and improving the testing and documentation for it, I find myself having much less resistance, and I'm rather happy to hand it off to an AI agent.
I haven't begun doing side projects or projects for self, yet. But I did go down the road of finding out what would be needed to do something I wished existed. It was much easier to explore and understand the components and I might have a decent chance at a prototype.
The alternative to this would have been to ask people around me or formulate extensively researched questions for online forums, where I'd expect to get half-cryptic answers (and a jibe at my ignorance every now and then) at a pace where it would take years before I had something ready.
I see the point for AI as a prototyping and brainstorming tool. But I doubt we are at a point where I would be comfortable pushing changes to a production environment without giving 3x the effort in reviewing. Since there's a chance of the system hallucinating, I have a genuine fear that it would seem accurate, but what it would do is something really really stupid.
#2 is the reason I keep paying for Claude Code Pro.
For $20 a month I can get my stupid tool and utility ideas from "it would be cool if I could..." to actual "works well enough for me" tools in an evening - while I watch my shows at the same time.
After a day at work I don't have the energy to start digging through, say, OpenWeather's latest 3.0 API and its nuances and how I can refactor my old code to use the new API.
Claude did it in maybe one episode of What We Do in the Shadows :D I have a hook that makes my computer beep when Claude is done or pauses for a question, so I can get back, check what it did and poke it forward.
#2 I expect to wind up as a huge win professionally as well. It lowers the investment for creating an MVP or experimental/exploratory project from weeks to hours or days. That ability to try things that might have been judged too risky for a team previously will be pretty amazing.
I do also believe that those who are often looked at or referred to as 10x engineers will maybe only see a marginal productivity increase.
The smartest programmer I know is so impressive mainly for two reasons: first, he seems to have just an otherworldly memory and seems to kind of have absolutely every little feature and detail of the programming languages he uses memorized. Second, his real power is really in cognitive ability, or the ability to always quickly and creatively come up with the smartest and most efficient yet elegant and clean solution to any given problem. Of course somewhat opinionated but in a good way. Funnily he often wouldn't know the academic/common name for some algorithm he arrived at but it just happened to be what made sense to him and he arrived at it independently. Like a talented musician with perfect pitch who can't read notation or doesn't know theory yet is 10x more talented than someone who has studied it all.
When I pair program with him, it's evident that the current iteration of AI tools is not as quick or as sharp. You could arrive at similar solutions but you would have to iterate for a very long time. It would actually slow that person down significantly.
However, there is such a big spectrum of ability in this field that I could actually see this increasing for example my productivity by 10x. My background/profession is not in software engineering but when I do it in my free time the perfectionist tendencies make me work very slowly. So for me these AI tools are actually cool for generating the first crappy proof of concepts for my side projects/ideas, just to get something working quickly.
I like the quip that AI raises the floor not the ceiling. I think it helps the bottom 20% perform more like the middle 50% but doesn't do much for people at the top.
Maybe to get an impression that they'd be performing like them - but not actually performing.
It helps me being lazy because I have a rough expectation of what the outcome should be - and I can directly spot any corner cases or other issues the AI proposed solution has, and can either prompt it to fix that, or (more often) fix those parts myself.
The bottom 20% may not have enough skill to spot that, and they'll produce superficially working code that'll then break in interesting ways. If you're in an organization that tolerates copy-and-pasting from Stack Overflow, that might be good enough - otherwise the result is not only useless, but because it provides the illusion of a complete solution, you're also closing off the path of training junior developers.
Pretty much all the AI-attributed firings were doing just that: getting rid of the juniors. That'll catch up with us in a decade or so. I shouldn't complain, though - that's probably a nice earnings boost just before retirement for me.
I randomly stumbled across Tekwetu, who's made a pretty good step-by-step example of coding with Claude Code, using MCPs, etc.[1]. None of the upsell or gushing. It's a pretty simple app with a backend, with a slightly complicated storage mechanism.
I was watching to learn how other devs are using Claude Code, as my first attempt I pretty quickly ran into a huge mess and was specifically looking for how to debug better with MCP.
The most striking thing is she keeps on having to stop it doing really stupid things. She slightly glosses over those points a little bit by saying things like "I roughly know what this should look like, and that's not quite right" or "I know that's the old way of installing TailwindCSS, I'll just show you how to install Context7", etc.
But in each 10 minute episodes (which have time skips while CC thinks) it happens at least twice. She has to bring her senior dev skills in, and it's only due to her skill that she can spot the problem in seconds flat.
And after watching much of it (though I skipped a few episodes at the end), I'm pretty certain I could have coded the same app quicker than she did without agentic AI, just using the old chat-window AIs to bash out the React boilerplate and help me quickly scan the documentation for the offline support. The initial estimate of 18 days the AI came up with in the plan phase would only hold true if you had to do it "properly".
It's worth a watch if you're not doing agentic coding yet. There were points I was impressed with what she got it to do. The TDD section was quite impressive in many ways, though it immediately tried to cheat and she had to tell it to do it properly.
Since then I've also provided enough glue that it can interact with the Arch Linux installer in a VM (or actual hardware, via serial port) - with sometimes hilarious results, but at least some LLMs do manage to install Arch with some guidance.
Somewhat amusingly, some LLMs have a tendency to just press on with it (even when it fails), with rare hallucinations - while others directly start lying and only pretend they logged in.
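The glue itself is nothing special - conceptually just a read/decide/type loop over the serial console. A stripped-down sketch of the idea (pyserial here purely for illustration, not necessarily what my setup uses):

    # Minimal idea of the "glue": read console output, hand it to the model,
    # then type back whatever command it decides on.
    import serial  # pip install pyserial

    def read_console(port, idle_timeout=2.0):
        """Collect output until the console goes quiet for a bit."""
        port.timeout = idle_timeout
        chunks = []
        while True:
            data = port.read(4096)
            if not data:
                break
            chunks.append(data.decode(errors="replace"))
        return "".join(chunks)

    def step(port, ask_llm):
        """One turn: show the model the screen, run whatever it answers with."""
        screen = read_console(port)
        command = ask_llm(screen)            # ask_llm wraps whatever model API you use
        port.write(command.encode() + b"\n")

    if __name__ == "__main__":
        with serial.Serial("/dev/ttyS0", 115200) as port:  # e.g. the VM's serial console
            while True:
                step(port, ask_llm=lambda screen: input(screen + "\n> "))  # human stand-in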
maybe, but I find that it makes it much faster to do things that _I already know how to do_, and can only slowly, ploddingly get me to places that I don't already have a strong mental model for, as I have to discover mistakes the hard way
I've only used Copilot, but this is just about exactly right. (I've only used it for Python.)
If I'm writing a series of very similar test cases, it's great for spamming them out quickly, but I still need to make sure they're actually right. It's easier to spot errors because I didn't type them out.
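To give a concrete flavor of the "very similar test cases" pattern (the function and values here are purely illustrative, not from a real project):

    # The near-mechanical fan-out Copilot is good at filling in once one case exists.
    import pytest

    from myapp.utils import parse_duration  # hypothetical function under test

    @pytest.mark.parametrize(
        ("text", "expected_seconds"),
        [
            ("90s", 90),
            ("2m", 120),
            ("1h30m", 5400),
            ("1d", 86400),
        ],
    )
    def test_parse_duration(text, expected_seconds):
        assert parse_duration(text) == expected_seconds

    def test_parse_duration_rejects_garbage():
        with pytest.raises(ValueError):
            parse_duration("soon")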
It's also decent for writing various bits of boilerplate for list / dict comprehensions, log messages (although they're usually half wrong, but close enough to what I was thinking), time formatting, that kind of thing. All very standard stuff that I've done a million times but I may be a little rusty on. Basically StackOverflow question fodder.
But for anything complex and domain-specific, it's more wrong than it's right.
things backed by Claude Sonnet can get a little further out than Copilot can, and when it’s in agent mode _sometimes_ it will do things like read the library source code to understand the API, or google for the docs
but the principle is the same: if the human isn’t doing theory-building, then no one is
Exactly. I'm in a situation right now where I've inherited a bunch of systems without enough documentation, and nobody knows how some things work. Sure, we've got features to build - but one of the most important things I can possibly do is make sure someone knows how stuff works, and write it down.
I think it's more effective at lowering the floor. The amount of people that can't code at all but can now slap something together makes it a huge step forward. Albeit one that mostly steps on a pile of dogshit after it hits any sort of production reality.
It's like WordPress all over again, but with people even less able to code. There are going to be vast amounts of opportunities for people to get into the industry via this route, but it's not going to be a very nice route for many of them. Lots of people who understand software even less than the c-suite holding the purse strings.
AI is strong in different places, and if it keeps on being strong in certain ways then people very soon won't be able to keep up. For example, extreme horizontal knowledge and the ability to digest new information almost instantly. That's not something anyone can do. We don't try to compete against computers on raw calculation, and soon we won't compete on this one either. We simply won't even think to compare.
People keep focusing on general-intelligence-style capabilities, but that is the holy grail. The world could go through multiple revolutions before finding that holy grail, but even before then everything would have changed beyond recognition.
So write an integration over the API docs I just copy-pasted.
Thanks for the comment Simon! This is honestly the first one I've read where it feels like someone actually read the article. I'm totally open to the idea that some people, especially those working on the languages/tools that LLMs are good at, are indeed getting a 2x improvement in certain parts of their job.
Something I have realized about Hacker News is that most of the comments on any given article are from people who are responding to the headline without actually clicking through and reading it!
This is particularly true for headlines like this one which stand alone as statements.
Perhaps that's my fault for making the title almost clickbaity. My goal was to get people who felt anxious about AI turning them into dinosaurs to not feel like they are missing some secret sauce, so hopefully the reach this is getting contributes to that.
Again, appreciate your thoughts, I have a huge amount of respect for your work. I hope you have a good one!
I think even claims of 2-5x are highly suspect. It would imply that if your team is using AI then all else equal they accomplish 2-5 times as much in a quarter. I don't know about you but I'm certainly not seeing this and most people on my team use AI.
[And to those saying we're using it wrong... well I can't argue with something that's not falsifiable]
My company is all in on LLMs, and honestly the improvement seems to be like 0.9x to 1.2x depending on the project. None of them are moving at breakneck speed, and many projects are just as bogged down by complexity as ever. Pretty big (3000+) company with a large, mature codebase in multiple languages. For god knows how much money spent on it.
Once a company gets to a certain size, they no longer optimise for product development, instead they optimise for risk mitigation. A lot of processes will be put in place with the sole purpose of slowing down code development.
10x sounds nice which is probably why it stuck, but it came from actual research which found the difference was larger than 10x - but also they were measuring between best and worst, not best and average as it's used nowadays.
All of this is hard to quantify. How much better than the average engineer is John Carmack, or Rob Pike or Linus? I consider myself average-ish and I don't think there's any world in which I could do what those guys did no matter how much time you gave me (especially without the hindsight knowledge of the creations). So I'd say they're all infinitely better than me.
I guess that makes Newton a 10x scientist. Really puts in perspective how utterly unrealistic it is to be looking to hire exclusively 10x programmers - the true 10x'ers are legends, not just regular devs who type a bit faster.
It would be more sensible if the "10x" moniker was dropped altogether, and we just went back to calling these people what they've always been: "geniuses". Then there might be more realistic expectations of only finding them among 1% of the population.
And how much better are they than your average engineer when plopped into a mediocre organization where they aren’t the political and technical top dog? I would guess they would all quit within a week.
Good engineers don't stay in mediocre organizations, mediocre ones do. Do you think these "top dogs" were at the top of their game from day one? They all learned, just like everyone else; talent just gave them a higher ceiling.
And they get promoted too. Multiple times I've seen people get promoted for decisions that doom the company years later. Considering all the various departments and people that go into supporting these net negative engineers, more people are net negative than they think.
It highly depends on the circumstances. In over 30 years in the industry I met 3 people that were many times more productive than everyone else around them, even more than 10 times. What does this translate to? Well, there are some extraordinary people around, very rare and you cannot count on finding some and, when you find them, it is almost impossible to retain them because management and HR never agree to pay them enough to stay around.
He doesn't believe there are hundreds of Fabrice Bellard clones who think working at your company wouldn't be a waste of their time. The myth might be that thinking about 10X is useful in any sense. You can't plan around one gracing you with their presence and you won't be able to retain them when they do.
Thinking about it personally, a 10X label means I'm supposedly the smartest person in the room and that I'm earning 1/10th what I should be. Both of those are huge negatives.
I’ve found I do get small bursts of 10x productivity when trying to prototype an idea - much of the research on frameworks and such just goes away. Of course that’s usually followed by struggling to make a seemingly small change for an hour or two. It seems like the 10x number is just classic engineers underestimating tasks - making estimates based on peak productivity that never materializes.
I have found for myself it helps motivate me, resulting in net productivity gain from that alone. Even when it generates bad ideas, it can get me out of a rut and give me a bias towards action. It also keeps me from procrastinating on icky legacy codebases.
> engineers that really know how to use this stuff effectively
I guess this is still the "caveat" that can keep the hype hopes going. But I've found at a team velocity level, with our teams, where everyone is actively using agentic coding like Claude Code on the daily, we actually didn't see an increase in team velocity yet.
I'm curious to hear anecdotal from other teams, has your team seen velocity increase since it adopted agentic AI?
Same here. I have a colleague who is completely enamored with these agents. He uses them for everything he can, not just coding: commit messages, opening PRs, Linear tickets, etc. But the productivity gain is just not there. He's about as fast, or rather as slow, as he was before. And to a degree I think this goes for the whole team.
It's the oxymoron of AI: more code, more documentation, more text, more of everything generated than ever, but the effect is that this means more complexity, more PRs to review, more bugs, more stuff to know and understand... We are all still learning how to use these agents effectively. And a particular developer's tendencies get multiplied, as does everything else with GenAI. Was he a bit sloppy before, not covering various edge cases and using quick-and-dirty shortcuts? Then this remains true for the code he produces using agents.
And to those who claim that "by using more agents I will gain 10x productivity", I say please read a certain book about how just adding developers to a project makes it even more delayed. The resemblance of the team/project leadership -> developers dynamic is truly uncanny.
I agree. I'm a big fan/proponent of AI assisted development (though nowhere near your amount of experience with it). And I think that 2x-10x speed up can be true, depending on what you mean exactly and what your task is exactly.
This article thinks that most people who say 10x productivity are claiming 10x speedup on end-to-end delivering features. If that's indeed what someone is saying, they're most of the time quite simply wrong (or lying).
But I think some people (like me) aren't claiming that. Of course the end to end product process includes a lot more work than just the pure coding aspect, and indeed none of those other parts are getting a 10x speedup right now.
That said, there are a few cases where this 10x end-to-end is possible. E.g. when working alone, especially on new things but not only - you're skipping a lot of this overhead. That's why smaller teams, even solo teams, are suddenly super interesting - because they are getting a bigger speedup comparatively speaking, and possibly enough of one to be able to rival larger teams.
Programmers are notoriously bad about making estimates. Sure it sped something up 10x, but did you consider those 10 tries using AI that didn't pan out? You're not even breaking even, you are losing time.
My experience with GenAI is that it's a significant improvement to Stack Overflow, and generally as capable as someone hired right out of college.
If I'm using it to remember the syntax or library for something I used to know how to do, it's great.
If I'm using it to explore something I haven't done before, it makes me faster, but sometimes it lies to me. Which was also true of Stack Overflow.
But when I ask it to do something fairly complex on its own, it usually tips over. I've tried a bunch of tests with a bunch of models, and it never quite gets it right. Sometimes it's minor stuff that I can fix if I bang on it long enough, and sometimes it's a steaming pile that I end up tossing in the garbage.
For example, I've asked it to code me a web-based calculator, or a 3D model of the solar system using WebGL, and none of the models I've tried have been able to do either.
I wonder if a better metric would be developer happiness? Instead of being 2x or 5x more productive, what if we looked at what a developer enjoyed doing and figured out how to use AI for everything else?
> I've estimated that LLMs make me 2-5x more productive on the parts of my job which involve typing code into a computer, which is itself a small portion of what I do as a software engineer.
I think that the key realization is that there are tasks where LLMs excel and might even buy you 10x productivity, whereas some tasks their contribution might even be net negative.
LLMs are largely excellent at writing and refactoring unit tests, mainly because the context is very limited (i.e., write a method in a class that calls this specific method of this specific class in a specific way and check the output) and the output is very repetitive (i.e., isolated methods in standalone classes that aren't called from anywhere else). They also seem helpful when prompted to add logging. LLMs are also effective at creating greenfield projects, serving as glorified template engines. But when lightly pressed on specific tasks like implementing a cross-domain feature... their output starts to be, at best, a big ball of mud.
What will happen is over time this will become the new baseline for developing software.
It will mean we can deliver software faster. Maybe more so than other advances, but it won't fundamentally change the fact that software takes real effort and that effort will not go away, since that effort is much more than just coding this or that function.
I could create a huge list of things that have made developing and deploying quality software easier: linters, static type checkers, code formatters, hot reload, intelligent code completion, distributed version control (i.e., Git), unit testing frameworks, inference schema tools, code from schema, etc. I'm sure others can add dozens of items to that list. And yet there seems to be an unending amount of software to be built, limited only by the people available to build it and an organization's funding to hire those people.
In my personal work, I've found AI-assisted development to make me faster (not sure I have a good estimate for how much faster.) What I've also found is that it makes it much easier to tackle novel problems within an existing solution base. And I believe this is likely to be a big part of the dev productivity gain.
Just an example, lets say we want to use the strangler pattern as part of our modernization approach for a legacy enterprise app that has seen better days. Unless you have some senior devs who are both experienced with that pattern AND experienced with your code base, it can take a lot of trial and error to figure out how to make it work. (As you said, most of our work isn't actually typing code.)
This is where an AI/LLM tool can go to work on understanding the code base and the pattern to create a reference implementation approach and tests. That can save a team of devs many weeks of trial & error (and stress), not to mention provide guidance on where they will run into roadblocks deep in the code base.
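To make that concrete, here's a rough sketch of the kind of reference implementation an LLM might propose for the strangler pattern; every name below (BillingFacade, LegacyBillingService, and so on) is hypothetical and just stands in for whatever seam exists in the real legacy app:

```python
# Hedged sketch of a strangler-fig seam: one facade routes each call to either
# the legacy implementation or its replacement, so the old app can be retired
# feature by feature. All names here are hypothetical.

class LegacyBillingService:
    def create_invoice(self, order_id: str) -> str:
        return f"legacy-invoice-{order_id}"


class NewBillingService:
    def create_invoice(self, order_id: str) -> str:
        return f"new-invoice-{order_id}"


class BillingFacade:
    """Single entry point; the routing flag is the 'strangler' seam."""

    def __init__(self, use_new_billing: bool = False):
        self._legacy = LegacyBillingService()
        self._new = NewBillingService()
        self._use_new_billing = use_new_billing

    def create_invoice(self, order_id: str) -> str:
        impl = self._new if self._use_new_billing else self._legacy
        return impl.create_invoice(order_id)


if __name__ == "__main__":
    # Flip the flag per feature as the new service proves itself in production.
    print(BillingFacade(use_new_billing=False).create_invoice("A-123"))
    print(BillingFacade(use_new_billing=True).create_invoice("A-123"))
```

The value isn't the code itself so much as having a worked example, in your own domain's terms, to argue about before committing a team to the migration.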
And, in my opinion, this is where a huge portion of the AI-assisted dev savings will come from - not so much writing the code (although that's helpful) but helping devs get to the details of a solution much faster.
Googling has always gotten us to generic references; AI gets us those references fitted to our solution.
The other thing is that I don't believe software developers actually "do their best" when writing the code itself, that is, optimize the speed of writing code. Nor do they need to; they know that writing the code isn't what takes up the time. Waiting for CI, code review, and that iteration cycle is.
And does an AI agent doing a code review actually reduce that time too? I have doubts. Caveat, I haven't seen it in practice yet.
If 10x could be believed, we're long enough into having AI coding assistants that any company that had gone all in would be head and shoulders above its competitors by now.
And we're not seeing that at all. Take the companies whose software I use that did announce big AI initiatives 6 months ago: if they really had gotten a 10x productivity gain, that'd be 60 months' (5 years') worth of "productivity". And yet somehow all of their software has gotten worse.
> I've estimated that LLMs make me 2-5x more productive on the parts of my job which involve typing code into a computer, which is itself a small portion of that I do as a software engineer.
This feels exactly right and is what I’ve thought since this all began.
But it also makes me think maybe there are those whom A.I. helps 10x, but more because code input actually is a very large part of their job. Some coders aren’t doing much design or engineering, just assembly.
Yeah, I hadn't thought about that. If you really are a programmer who gets all of their work assigned to them as detailed specifications maybe you are seeing a 10x boost.
I don't think I've encountered a programmer like that in my own career, but I guess they might exist somewhere!
Personally I've found it's very good at writing support tools / shell scripts. I mostly use it to parse the output of other tools that don't have machine-readable output yet.
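For instance (my example, not anything specific from above), this is the sort of throwaway parser I'd happily hand off: turning the human-oriented output of df -h into structured records. The column names are assumptions, and GNU vs. BSD df print different columns, so treat it as a sketch:

```python
# Sketch of a small support script: parse `df -h` output into dicts.
# Assumes the six-column GNU coreutils layout; BSD/macOS df prints extra columns.
import subprocess

COLUMNS = ["filesystem", "size", "used", "avail", "use_pct", "mount"]

def parse_df_h() -> list[dict]:
    out = subprocess.run(["df", "-h"], capture_output=True, text=True, check=True).stdout
    rows = []
    for line in out.strip().splitlines()[1:]:        # skip the header line
        fields = line.split(None, len(COLUMNS) - 1)  # keep spaces in mount paths
        rows.append(dict(zip(COLUMNS, fields)))
    return rows

if __name__ == "__main__":
    for row in parse_df_h():
        print(row)
```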
Claude Code (which is apparently the best in general) isn't very good at reviewing existing large projects IME, because it doesn't want to load a lot of text into its context. If you ask it to review an existing project it'll search for keywords instead of just loading an entire file.
That and it really wants to please you, so if you imply you own a project it'll be a lot more positive than it may deserve.
I've basically come to the same 2x to 5x conclusion as you. Problem is that "5x productivity" is really only a small portion of my actual job.
The hardest part of my job is actually understanding the problem space and making sure we're applying the correct solution. Actual coding is probably about 30% of my job.
That means I'm only looking at something like a 30% overall productivity gain from being 5x as effective at coding.
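That back-of-the-envelope math checks out (assuming coding is exactly 30% of the job and gets a uniform 5x speedup):

```python
# If coding is 30% of the job and becomes 5x faster, the other 70% is unchanged.
coding_share = 0.30
coding_speedup = 5.0

new_total_time = (1 - coding_share) + coding_share / coding_speedup  # 0.76
overall_speedup = 1 / new_total_time                                 # ~1.32x
print(f"overall speedup: {overall_speedup:.2f}x")  # roughly a 30% gain
```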
The thing that I keep wondering about: If the coding part is 2-5x more productive for you, but the stuff around the coding doesn't change... at some point, it'll have to, right? The cost/benefit of a lot of practices (this article talks about code review, which is a big one) changes a lot if coding becomes significantly easier relative to other tasks.
Yes, absolutely. Code used to be more expensive to write, which meant that a lot of features weren't sensible to build - the incremental value they provided wasn't worth the implementation effort.
Now when I'm designing software there are all sorts of things where I'm much less likely to think "nah, that will take too long to type the code for".
Speed of shipping software and pace of writing code are different things. Shipping software like iOS has a <50% programming component, so Amdahl's law caps the end-to-end improvement rather low, assuming other parts of the process stay the same.
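For reference, that cap follows directly from Amdahl's law: if programming is a fraction p of the end-to-end process and gets a speedup s, then

```latex
\text{overall speedup} = \frac{1}{(1 - p) + \frac{p}{s}} \le \frac{1}{1 - p}
```

so with p < 0.5 the ceiling stays below 2x no matter how fast the coding itself gets.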
At first I thought becoming “10x” meant outputting 10x as much code.
Now that I’m using Claude more as an expensive rubber duck, I’m hoping that spending more time getting the fundamentals right will lead to a large improvement in outcomes in the long run.
It lets me try things I couldn't commit the time to in the past, like quickly cobbling together a keystroke macro. I can also put together the outline of a plan in a few minutes. So much more can be 'touched' upon usefully.
I completely agree. I saw the claims about a 30% increase in dev productivity a while ago and thought: how is that possible, when most of my job consists of meetings, SARs, threat modeling, etc.?
Looking forward to those 20x most productive days out of an LLM. And what are those most productive days? The ones when you can simplify and delete hundreds of lines of code... :-)
I don't doubt that some people are mistaken or dishonest in their self-reports as the article asserts, but my personal experience at least is a firm counterexample.
I've been heavily leaning on AI for an engagement that would otherwise have been impossible for me to deliver to the same parameters and under the same constraints. Without AI, I simply wouldn't have been able to fit the project into my schedule, and would have turned it down. Instead, not only did I accept and fit it into my schedule, I was able to deliver on all stretch goals, put in much more polish and automated testing than originally planned, and accommodate a reasonable amount of scope creep. With AI, I'm now finding myself evaluating other projects to fit into my schedule going forward that I couldn't have considered otherwise.
I'm not going to specifically claim that I'm an "AI 10x engineer", because I don't have hard metrics to back that up, but I'd guesstimate that I've experienced a ballpark 10x speedup for the first 80% of the project and maybe 3 - 5x+ thereafter depending on the specific task. That being said, there was one instance where I realized halfway through typing a short prompt that it would have been faster to make those particular changes by hand, so I also understand where some people's skepticism is coming from if their impression is shaped by experiences like that.
I believe the discrepancy we're seeing across the industry is that prompt-based engineering and traditional software engineering are overlapping but distinct skill sets. Speaking for myself, prompt-based engineering has come naturally due to strong written communication skills (e.g. experience drafting/editing/reviewing legal docs), strong code review skills (e.g. participating in security audits), and otherwise being what I'd describe as a strong "jack of all trades, master of some" in software development across the stack. On the other hand, for example, I could easily see someone who's super 1337 at programming high-performance algorithms and mid at most everything else finding that AI insufficiently enhances their core competency while also being difficult to effectively manage for anything outside of that.
As to how I actually approach this:
* Gemini Pro is essentially my senior engineer. I use Gemini to perform codebase-wide analyses, write documentation, and prepare detailed sprint plans with granular todo lists. Particularly for early stages of the project or major new features, I'll spend several hours at a time meta-prompting and meta-meta-prompting with Gemini just to get a collection of prompts, documents, and JSON todo lists that encapsulate all of my technical requirements and feedback loops (a speculative sketch of such a todo list follows this list). This is actually harder than manual programming because I don't get the "break" of performing all the trivial and boilerplate parts of coding; my prompts here are much more information-dense than code.
* Claude Sonnet is my coding agent. For Gemini-assisted sprints, I'll fire Claude off with a series of pre-programmed prompts and let it run for hours overnight. For smaller things, I'll pair program with Claude directly and multitask while it codes, or if I really need a break I'll take breaks in between prompting.
* More recently, Grok 4 through the Grok chat service is my Stack Overflow. I can't rave enough about it. Asking it questions and/or pasting in code diffs for feedback gets incredible results. Sometimes I'll just act as a middleman pasting things back and forth between Grok and Claude/Gemini while multitasking on other things, and find that they've collaboratively resolved the issue. Occasionally, I've landed on the correct solution on my own within the 2 - 3 minutes it took for Grok to respond, but even then the second opinion was useful validation. o3 is good at this too, but Grok 4 has been on another level in my experience; its information is usually up to date, and its answers are usually either correct or at least on the right track.
* I've heard from other comments here (possibly from you, Simon, though I'm not sure) that o3 is great at calling out anti-patterns in Claude output, e.g. its obnoxious tendency to default to keeping old internal APIs and marking them as "legacy" or "for backwards compatibility" instead of just removing them and fixing the resulting build errors. I'll be giving this a shot during tech debt cleanup.
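For what it's worth, here's a purely speculative sketch of what a machine-readable todo list handed to a coding agent might look like; the actual format isn't shown above, so every field name below is invented:

```python
# Invented example of a JSON sprint todo list for a coding agent; treat the
# structure and field names as a guess, not a documented format.
import json

sprint_todo = {
    "sprint": "auth-refactor",
    "tasks": [
        {
            "id": 1,
            "summary": "Extract session handling into a SessionStore class",
            "acceptance": [
                "all existing auth tests pass",
                "no direct cookie access outside SessionStore",
            ],
            "verify_with": "pytest tests/auth -q",
        },
        {
            "id": 2,
            "summary": "Add structured logging to login/logout paths",
            "acceptance": ["log lines include a request_id field"],
            "verify_with": "pytest tests/logging -q",
        },
    ],
}

print(json.dumps(sprint_todo, indent=2))
```

The point of a structure like this is that each task carries its own acceptance criteria and a verification command, which gives the agent a feedback loop it can run unattended.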
As you can see, my process is very different from vibe coding. Vibe coding is fine for prototyping, or for non-engineers with no other options, but it's not how I would advise anyone to build a serious product for critical use cases.
One neat thing I was able to do, with a couple days' notice, was add a script to generate a super polished product walkthrough slide deck with a total of like 80 pages of screenshots and captions covering different user stories, with each story having its own zoomed out overview of a diagram of thumbnails linking to the actual slides. It looked way better than any other product overview deck I've put together by hand in the past, with the bonus that we've regenerated it on demand any time an up-to-date deck showing the latest iteration of the product was needed. This honestly could be a pretty useful product in itself. Without AI, we would've been stuck putting together a much worse deck by hand, and it would've gotten stale immediately. (I've been in the position of having to give disclaimers about product materials being outdated when sharing them, and it's not fun.)
Anyway, I don't know if any of this will convince anyone to take my word for it, but hopefully some of my techniques can at least be helpful to someone. The only real metric I have to share offhand is that the project has over 4000 (largely non-trivial) commits made substantially solo across 2.5 months on a part-time schedule juggled with other commitments, two vacations, and time spent on aspects of the engagement other than development. I realize that's a bit vague, but I promise that it's a fairly complex project which I feel pretty confident I wouldn't have been capable of delivering in the same form on the same schedule without AI. The founders and other stakeholders have been extremely satisfied with the end result. I'd post it here for you all to judge, but unfortunately it's currently in a soft launch status that we don't want a lot of attention on just yet.
There was a YC video just a few months ago where a bunch of jerkoffs sat in a circle and talked about engineers being 10 to 100x as effective as before. I'm sure Google will bring it up.
You asked who’s making these silly claims, I provided one example of YC partners doing it. Not sure who got triggered or what advertising you are talking about, but there you go.
What you showed me is the equivalent of somebody saying, “People are claiming my life would be better if I drank more Pepsi” and the underlying evidence turned out to be a Pepsi commercial.