This is interesting to hear, but I don't understand how this workflow actually works.
I don't need 10 parallel agents making 50-100 PRs a week, I need 1 agent that successfully solves the most important problem.
I don't understand how you can generate requirements quickly enough to have 10 parallel agents chewing away at meaningful work. I don't understand how you can have any meaningful supervisory role over 10 things at once given the limits of human working memory.
It's like someone is claiming they unlocked ultimate productivity by washing dishes, in parallel with doing laundry, and cleaning their house.
Likely I am missing something. This is just my gut reaction as someone who has definitely not mastered using agents. Would love to hear from anyone that has a similar workflow where there is high parallelism.
My initial response to reading this post was "wow, I think I'd rather just write the code".
I also remain a bit skeptical because, if all of this really worked (and I mean over a long time and scaling to meet a range of business requirements), even if it's not how I personally want to write code, shouldn't we be seeing a ton of 1 person startups?
I see Bay area startups pushing 996 and requiring living in the Bay area because of the importance of working in an office to reduce communication hurdles. But if I can really 10x my current productivity, I can get the power of a seed-stage startup with even less communication overhead (I could also get by with much less capital). Imagine being able to hire 10 reliable junior-mid engineers who unquestioningly followed your instructions and didn't need to sleep. This is what I keep being told we have for $200/month. Forget not needing engineers, why do we need angel investors or even early stage VC? A single smart engineer should be able, if all the claims I'm hearing are true, to easily accomplish in months what used to take years.
But I keep seeing products shipped at the same speed but with a $200 per month per user overhead. Honestly I would love to be wrong on this because that would be incredibly cool. But unfortunately I'm not seeing it yet.
> shouldn't we be seeing a ton of 1 person startups?
Here's the dirty secret: 1-person, AI-coding-enabled startups don't want their customers to know that they are 1-engineer AI-coding startups, so they do not expose or share that info. There is still a lot of negative sentiment associated with this.
I know 3 such founders; none would advertise to their customers the extent of their AI usage. There is also a consideration that if they advertise their 1-eng status and success, it might attract other competitors, or the customers might think they can do it themselves (maybe possible, but not for 95% of them, since some tech know-how is still required), or customers would see it as a business risk.
All 3 have blown me away with what they are doing. All 3 have real, paying customers. (They occasionally reach out for some higher order architecture questions)
As of the middle of the year, there was no increase in publicly available indicators of new startups at all [0]. No change in the trend in Steam releases, domain name registrations, app store releases, etc. People might be able to keep the fact that they're a one-person team that built the app with AI secret, but they wouldn't be able to keep the fact that they made an app secret. Unless someone has evidence that's changed dramatically in the last six months, I have to conclude that the reason we aren't seeing a wave of AI-enabled SaaS startups isn't that they're keeping the fact that they're solo operations with AI a secret, but rather that no such wave actually exists.
Can't speak for anyone else, but I personally know 3.
2 of the 3 existed as entities for more than a year already, but pivoted at least once (both were VC-funded but are now doing something very different from what they started with when I first met the founders) and ultimately let go of their offshore and contract engineers once AI became good enough some time early last year. The founders basically realized that the quality of the code was as good or better than what they were getting from their engineers while reducing the turnaround time; now they can go from talking to customers to having a working prototype in the same day instead of waiting 24h+ for an offshore team. The other one started in November of 2024 and found traction around March.
So two companies went from multi-person teams to 1 person teams and 1 team was a 1 person eng team from the get-go (with a business-oriented partner).
I'd also point out that 2025 was a particularly volatile year because of shifts in the political and economic environment (including very high interest rates) so I wouldn't take your stat at face value without considering external factors that might affect the total number of net new business registrations.
It still remains true that building a product is not the same thing as building a business. It may be that we'll see fewer SaaS startups as companies find that they can just in-house software instead of buying. Who knows? The startup I'm at canceled one of our subscriptions because we built an in-house replacement; it's now cheap enough and easy enough that we could.
> Can't speak for anyone else, but I personally know 3.
I'm not saying your three friends/acquaintances don't exist, I'm saying the evidence suggests they aren't representative of a trend. This is consistent with the other evidence we have (e.g. studies which show that LLMs produce at best relatively modest gains in productivity, not enough for a one-person team to do the work of even two people).
> I'd also point out that 2025 was a particularly volatile year because of shifts in the political and economic environment so I wouldn't take your stat at face value without considering external factors that might affect the total number of net new business registrations.
Sure, it's always possible that without LLMs there would have been a significant contraction in these metrics. The issue is exactly that though: you can always make that argument. In other words, you've rendered your claims unfalsifiable.
I'm not saying it's evidence for some larger trend; I'm presenting the reason why single-person teams might not advertise that they are single-person, and noting that these teams are not necessarily starting as single-person teams, but sometimes collapsing down to them.
Maybe you don't mean to, but when you present an anecdote you're implying evidence of some trend, otherwise it's just a pointless statement. And unless a multi-person team is collapsing down into multiple single person teams, there's no increase in productivity and we're actually in a worse position as a whole.
Except in context that was very much what was suggested. The implication of the comment I replied to is that there actually are "a ton of 1 person startups" (and by implication, that LLMs do enable the massive increases in productivity that their proponents like to claim), but that they just keep that fact quiet.
This matches what I’ve been seeing as well. Small teams can move surprisingly fast now, but the bottleneck usually shifts from engineering to distribution and positioning.
We’ve found that building the product got easier, but turning it into a sustainable business still required just as much manual effort around sales, onboarding, and retention.
You're moving the goalposts; building the product never equaled writing some code, it's always involved all of the efforts you reference. The expectation is that you've optimized the code generation and shifted the bottleneck, but are overall more productive (i.e. the cycle is shorter). If you're not iterating faster then there'd be no productivity gain.
Those companies weren’t multi-person teams. They were one-person teams with contract work. Maybe you know the details of the kind of money they were paid or how involved they were with the work, but that could mean so many things.
I’d have to say when I hire someone in Fiverr to make a logo for my app I’m not suddenly a multi-person team. If I use AI to make my logo instead of paying a human $50 to make one I didn’t exactly experience a productivity revolution.
The other thought that popped into my head is that offshore contractors have access to AI, too. So shouldn’t we see their output go up and prices go down? Again we have another facet of this lack of market indicators.
Because it's better (versus the engineers they were able to hire).
> If I hire two sandwich artists for 6 months but nobody buys my sandwiches
Pivot is strong and in both cases where they went n -> 1, the pivots were dramatic. One went from building a (credit) card switching SDK to building a legal assistant AI. One went from building a fin-tech compliance product to a CRM for managing collections.
Because they went back to the drawing board, they ended up letting go of their teams and started using AI to build MVPs and then found that they could ship faster and better.
This now seems like even less useful information than before.
They literally changed to doing an entirely different business.
This would be like saying I hired two sandwich artists, but sandwiches don’t sell well, so I fired my sandwich artists and now I run a coffee shop on my own.
I don't think cheaper/easier software development can be the limiting success factor for many startups. Success is more about the skills and business aptitude of the founder(s), which is why VCs invest more in people than ideas, and don't seem to flinch when founders pivot to something completely different.
I could see AI coding leading to more attempted startups, and more people shipping initial products and attempting to get traction with them, but whether they do get traction and achieve PMF, and are able to actually grow it into a business is going to come down to the startup expertise of the founders, not how quickly/cheaply the code of the product was written.
I expect you see the world this way because you are a software developer. People who know how to sell and understand the problems to solve do not routinely understand how to build software to solve those problems so they can sell them to customers. Now that the bar for building software is lowering, the world of building a startup is changing. A relative newcomer to software is able to ship a medium-complexity vibe-coded app to a few test customers and kick off revenues.
I agree that the bar for building software has dropped significantly, but I think the harder part still shows up right after the first few customers.
Shipping something workable is easier now, but understanding which problems are actually worth solving — and getting consistent feedback early — still seems to be the main separator between hobby projects and real businesses.
I totally concur. That said, technology is evolving fast, and I think it's clear that the bar for solving those problems with non-technical people will drop dramatically in the next 12 months.
But eventually people will catch on that you can basically create a working product alone with the help of AI.
My prediction is that this will lead to a margin free-fall for many software products where the main moat is the software itself. And a lot of SaaS companies will also become redundant when the AI can code up a tailored solution in an hour for free.
I think so too. But in the meantime there is a quiet goldrush for people who spot niches where they can extract decent (or a lot of) value right now, and for long enough to be worthwhile. If they can get enough scale that thinner margins still make for a worthwhile business when the market catches up, great. If they can't, then by staying lean they might still make off with a decent ROI.
But that is also a reason to be cautious of chasing capital and think hard about whether you can spend it sensibly fast enough to improve your own ROI...
E.g. I have a project right now where I won't consider taking VC cash because I don't think I can spend it fast enough to buy me enough additional leverage to make enough additional money to compensate for the dilution and the other usual shenanigans before I expect margins will be squeezed out of the niche in question. It also means I don't think the opportunity will ever scale above a certain level, but that's fine - it'll be a quick attempt at grabbing what profit I can.
Also, while we of course shouldn't diminish the potential moat created by understanding the product in favour of only valuing the tech, we need to also consider that AIs are a levelling factor there too. Claude knows (I've verified what it's said) more about the niche I'm vaguely talking about than I do - it knows pricing, it knows positioning/marketing, it knows conventions and requirements of the niche, and while I'm sure I could have found all of it myself starting from scratch too, it short-circuited an enormous amount of effort to get an infodump that let me know precisely what to look for to verify it. A lot of tech companies will find the institutional knowledge they thought would shore up their moat is worth a lot less than they thought.
> A lot of tech companies will find the institutional knowledge they thought would shore up their moat is worth a lot less than they thought.
I totally agree. I think going forward the primary value of SaaS will be the embedded domain expertise in a pre-built product. The comparison of Asana versus Notion comes to mind for project management. Asana forces abstractions of good project management upon you, whereas Notion lets you build it yourself. I think this principle will scale to all software in the future, where the only real value of software becomes exported maintenance obligations and a predetermined domain abstraction.
But as you mentioned, I think companies will rapidly find that their own specific abstraction is worth a lot less than they believed.
SaaS is extremely vulnerable: companies will be able to modify open source tools to do exactly what they need, and agents will make managing those services easier. This will lead to downward pressure on SaaS prices, and cause them to become more like cloud data management platforms that let customers build on top of them rather than one-size-fits-all apps.
I agree with this completely. I foresee an era of enterprise-level 'template' SaaS products that are expected to be tinkered with and highly customized. I think products like Notion that have an incredibly robust customizability and integration layer are going to thrive, where every single company can use a template engine to build extremely customized applications - and the barrier to building on top of these will essentially become the rate of human speech.
I predict that the commercial market for a lot of software will evaporate as people find that getting AI to whip up a custom solution that fits their unique problem space like a glove is actually cheaper and simpler than trying to make COTS software do the job. We're not quite there yet, but maybe in a few years.
Yes/no. Regardless of the code complexity reduction there is still architecture, planning and implementation. Could someone come by and clone my work afterwards? Absolutely. Will they retain customers with only a little understanding of the product or model? Questionable.
Sure, but there's a whole lot of businesses already using custom solutions made with excel/access/etc that are held together with duct tape and chicken wire, so I think the adventurous spirit necessary is there.
There have always been hundreds or thousands of companies that want software engineers but simply don't have the revenue to support them. My first dev job was a small private company in exactly this spot. They basically paid me my salary for six months to figure out WordPress and PHP on the job having only ever done very basic programming stuff on my own in high school ~6 years prior.
The median dev salary across the entire US is something like $130k/yr. There are huge numbers of new or self-taught software devs in low cost of living areas of the country making $50-60k/yr.
In the same pattern there are a lot of businesses where these solutions are not efficient and they MOVED from them to expensive commercial software. It's actually an antipattern to build a bunch of in-house, Excel-based solutions - with AI or not - for these companies.
You are discounting sales, marketing, and branding. Take drop shipping for example: anyone can do this, but the successful ones are those that know how to brand and market the product well.
Not to mention having the right mindset for startups and building a business.
The code and product is maybe only 20% of the story.
I'm not. That edge eventually converges to 0 when you have 10+ competitors that offer the same for 10x less money.
If you don't have some kind of cult following like Apple eventually you'll get margin-squeezed till death and all that marketing, sales, etc. will get cut down to stay afloat.
Of course all of the above is just my theory how this will play out in the long run, I'm no oracle by any means.
Not likely because there is still a lower bound. These 1 person startups are winning partially because they are already 10x cheaper than the incumbents.
But beyond that, it's not likely that there are 10x the number of people who know the domain and have the right mindset plus appetite for risk.
I'm not entirely disagreeing. There are limits there that means we can't assume the margins will go to their theoretical minimum. But you're also in part assuming the models don't increasingly know the domain or know how to research the domain and compile the information for you.
They'll be squeezing margins out of a lot more than just the tech.
Dismissing Apple, their products, and their customers as a cult is at best jealousy, and blatantly wrong regardless. Lots of competition has been trying to out-Apple them for decades with no luck, and it's not because an iPhone customer is stupid & brain-washed.
Perhaps for extremely basic products. Most non-engineers can barely write and untangle their messy thoughts and you think they can just build a spec for an AI to build a product? Hopefully I'm wrong, but I doubt it.
This is what gets me... Even at companies with relatively small engineering teams compared to company size, actually getting coherent requirements and buy-in from every stakeholder on a single direction was enough work that we didn't really struggle with getting things done.
Sure, there was some lead, but not nearly enough to 2x the team's productivity, let alone 10x.
Even when presented with something, there was still lead time turning that into something actually actionable as edge cases were sussed out.
I am one of those founders who does not want their customers to know. I have one specific very large customer that is quite an old school company. My software has become pretty pivotal for some of their workflows and if they knew it was one guy on his laptop keeping things afloat with the help of a mysterious AI I am pretty sure they'd reconsider our contract.
Most startup -> enterprise deals are like this in nature. Enterprise buyers are already wary of small startups (for various reasons). A 1 person startup? Wouldn't even get a meeting with the buyers in many cases even if your software was 10x cheaper and exactly solved the business problem.
I worked for a public healthcare enterprise early in my career and I made a joke to one of the VPs once about how it seemed like the real career success would be finding one of our pain points as a patient or employee, leaving to start a company that solves that, and selling it back to us. He laughed and said several people had done that, but you better take a half dozen executives with you or you'd never get the first meeting no matter how good the product was.
> you better take a half dozen executives with you or you'd never get the first meeting no matter how good the product was
I spent ~16 years of my career in life sciences and this is also my experience. There's no way you get into an enterprise account with a pharma as a startup without a lot of deep connections; the life sciences space is very high in regulatory requirements and risk, and the risk/reward ratio with startups simply isn't worth it.
In my specific space, clinical trials can run for years. A company that might fold if they run out of runway? Non-starter. I was a member of a small company that did make this work and it required that we put our code in escrow with a large multi-national IT company that owned the support contract (customer paid us for licensing, paid multi-national IT company for support, our source code went into escrow).
The key (based on my exp with these 3) is the composition of the team.
At least 1 person on the team needs to have domain experience and if solo, that solo founder needs to have domain experience and good connections or the wherewithal to get the first handful of paying customers via cold calling, cold emails. The main challenge remains sales, marketing, and branding. There are free CRMs and anyone can build a CRM. Why do some CRMs succeed while others fail? Branding, marketing, awareness.
So I don't see it as "there will just be 10x more competitors" because I've built enough stuff that I failed to sell and used enough shitty software to know that the software itself is rarely the reason why people buy X over Y. It's because they didn't even know Y existed.
My biggest question now is - since now anyone can build a SaaS, and since everything is now optimized not for "employment" but for "enterprise" (running your own business), just how many 1-2 person companies can we build? I mean, how many genuinely sellable ideas are there? Can we as a society have hundreds of thousands of small software enterprises (and not a few hundred employing thousands)?
I would love to start my own SaaS company, even if it generates $1000 a month I will be elated. And I have 20+ years of experience programming and in FinTech, but what do I build? Not to mention, without sales & marketing nothing will really work.
Two of the startups are led by non-technical founders who have strong industry-specific experience (legal and finance). The third has a partner who has industry experience (is the ICP).
So you definitely still need strong sales and marketing and a deep understanding of a business domain.
1 person and AI is not sufficient to create a business.
So true, as a mere software developer on a payroll: I might spend 10 minutes doing a task with AI rather than an hour (w/o AI), but trust me - I am going to keep 50 minutes to myself, not deliver 5 more tasks )))) And when I work on my hobby project - having one AI agent crawling around my codebase is like watching a baby in a glassware shop. 10 babies? no thanks!
Same. I am doing this as Claude knocked out two annoying yak shaving tasks I did not really want to do. Required careful review and tweaking.
Claiming that you now have 10 AI minions just wrecking your codebase sounds like showboating. I do not pity the people who will inherit those codebases later.
Disclaimer: not an """AI""" enthusiast. I think it takes away the joy of coding, which makes me sad.
With that out of the way, I don't think there will be "people inheriting codebases" for much longer, at least not in the vast majority of business-related software needs. People will still be useful insofar as you need someone responsible and able to be sued for contract breach, failures and whatnot, but we'll see more and more agents inheriting previous agents' codebases. And on the other hand, "small software" that caters to particular customized workflows can be produced entirely by LLMs.
I can totally relate how some of us would want to be off raising goats, planting watermelons or whatever.
> I might spend 10 minutes doing a task with AI rather than an hour (w/o AI), but trust me - I am going to keep 50 minutes to myself, not deliver 5 more tasks
It's wild that you just outright admitted this. Seems like your employer would do best to let you go and find someone that can use tools to increase their productivity.
Show me the incentive, I'll show you the outcome. More than once I've had my hand slapped professionally for taking ownership of something my immediate superiors wanted to micromanage. Fine, here I was trying to take something off their plate that was in my wheelhouse, but if that's where they want to draw the line I guess I'll just give less of a shit.
If you actively deny your employees ownership, then the relationship becomes purely transactional.
It's also possible OP is just a bad employee, but I've met far more demoralized good employees than malicious bad ones over the course of my career.
A lot of orgs are bad about giving credit to employees for productivity, what's the point of working 4x harder if it'll just result in a few % point difference in yearly raise, and you're still going to have to job hop to get a respectable pay bump? Might as well work less and spend time polishing your resume/side projects to make yourself as employable as possible. This is 100% the fault of poor incentives on the part of employers.
> you're still going to have to job hop to get a respectable pay bump
This doesn't exist in a vacuum. I do tasks now for future interviews.
> Might as well work less and spend time polishing your resume/side projects to make yourself as employable as possible.
I don't know what jobs you're applying to, but unless your side project is successful, nobody cares. What they do care about is what you did at your last employer.
> This is 100% the fault of poor incentives on the part of employers.
The people who have your mindset are the people perpetually stuck at poor employers.
> shouldn't we be seeing a ton of 1 person startups?
After months of hearing that people are producing software in months that would normally take years, the best examples of vibe coded software I've seen look like they would normally take months, not years. If you don't care how they're built or how long it took (which a user generally doesn't), much of the remaining shine comes off.
If I'm wrong, I'd love to see it. A genuinely big piece of software produced entirely (or near entirely?) with AI that would've normally taken talented engineers years to build.
Do you have any idea of the man-hours it took to build those large projects you are speaking of? Let's take Linux for example. Suppose for the sake of argument that Claude Code with Opus 4.5 is as smart as an average person (AGI), but with the added benefit that it can work 24/7. Suppose now I have millions of dollars to burn and am running 1000 such instances on max plans. Now say I had started running these agents the day Claude Opus 4.5 was released and prompted them to create a commercial-grade multi-platform OS of the caliber of Linux.
An estimate of the Linux kernel is 100 million man-hours of work. Divide by 1000. We'd expect to have a functioning OS like Linux by 2058 from these calculations.
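A rough version of that arithmetic in Python. Both inputs are assumptions (the oft-quoted ~100 million man-hour figure, and how many productive hours a single agent contributes per year), so the exact end date swings widely, but the order of magnitude is the point:

    # Back-of-the-envelope: how long would 1000 agents need to reproduce a
    # Linux-class OS? The effort estimate and hours-per-year are assumptions.
    TOTAL_EFFORT_HOURS = 100_000_000   # assumed rebuild effort for a Linux-class OS
    AGENTS = 1_000                     # parallel instances in the thought experiment

    for label, hours_per_year in [("around-the-clock (24/7)", 24 * 365),
                                  ("human-equivalent (40 h/week)", 40 * 50)]:
        years = TOTAL_EFFORT_HOURS / AGENTS / hours_per_year
        print(f"{label}: ~{years:.0f} years")

    # Either way the answer is years to decades, not the two months
    # Opus 4.5 has been available -- which is the point.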
How long has Claude been out? 2 months.
Linux is valuable because very difficult bugs got fixed over time by talented programmers - bugs which would cause terrible security problems like external attacks, corrupted databases, and many more.
All difficult problems are solved, by solving simple problems first and combining the simple solutions to solve more difficult problems etc etc.
Claude can do that, but you seriously overestimate its capabilities by a factor of a thousand or a million.
Code that works but is buggy is not what Linux is.
Linux is 34 years old; most large software projects are not. Also, you're using a specific version of Claude, and sure, maybe this time is different (and every other time I've heard that over the past 5 years, it wasn't). I don't buy it, but let's go along with it. Going off that, we have the equivalent of 2 years of development time according to what's being promised. Have you seen any software projects come out of Claude 4.5 Opus that you'd guess to have been a 2-year project? If so, please do share.
I’m building an ERP system; I’ve already been at it for 3 years (full time, but half the system is already in production with two tenants, so not all of my time is spent on completing the product; this revenue completely sustains the project). AI is now speeding this up tremendously. Maybe 2x velocity, which is a game changer but more realistic than what you hear. The post-AI features are just as good and stable as pre-AI ones, why wouldn’t they be? I’m not going to put “slop” into my product, it’s all vetted by me. I do anticipate that when the complexity is built out and there are fewer new features and more maintaining/improving, the productivity will be immense.
I'm not discounting your experience, but purely from an experiment-design standpoint, you don't have any sort of pre/post-AI control. You've spent 3 years becoming a subject-matter expert who's building software in your domain; I'm not surprised AI in its current form is helpful. A more valuable comparison would be something like: if you kept going without AI, how long would it take someone with similar domain experience who's just starting their solution to catch up using AI?
I do stuff in my free time now that would have been a full time job a year ago. Accomplishing in months what would have taken years. (And doing in days what would have taken weeks.) I'm talking about actually built-out products with a decent amount of code and features, not basic prototypes. I feel like the vibe is "put up or shut up", so check out my bio for one example.
I think your logic goes wrong because you assume that more productivity implies less desire for engineers. But now engineers are maybe 2x or 5x more productive than before. So that makes them more attractive to hire than before. It's not like there was some fixed pool of work to be done and you just had to hire enough to exhaust the pool. It's like if new pickaxes were invented that let your gold miners dig 5x more gold. You'd see an explosion in gold miners, not a reduction. For another example, I spend all my free time coding now because I can do so much now. I get so much more result for the same effort, that it makes sense to put more effort in.
First thing I got was “browser not supported” on mobile. Then I visited the website on desktop and tested languages I’m fluent in and found immediate problems with all of them.
The voices in Portuguese are particularly inexcusable, using the Portuguese flag with Brazilian voices; the accents are nothing alike and it’s not uncommon for native speakers of one to have difficulty understanding the other in verbal communication.
The knowledge assessments were subpar and didn’t seem to do anything; the words it tested almost all started with “a” and several are just the masculine/feminine variants. Then, even after I confirmed I knew every word, it still showed me some of those in the learning process, including incredibly basic ones like “I”, or “the”.
The website is something, and I very much appreciate you appear to be trying to build a service which respects the user, but I wouldn’t in good conscience recommend it to anyone. It feels like you have a particular disdain for Duolingo-style apps (I don’t blame you!) but there is so much more out there to explore in language learning.
Haha, thanks for checking it out! I really appreciate the feedback.
> First thing I got was “browser not supported” on mobile.
Yeah, I use some APIs that were only implemented in Safari on iOS 26. Kind of annoying but I use Android so I didn't realize until too late. I should fix it, but it's not a priority given the numerous other things that need improvement (as you noticed!)
> The voices in Portuguese are particularly inexcusable, using the Portuguese flag with Brazilian voices; the accents are nothing alike and it’s not uncommon for native speakers of one to have difficulty understanding the other in verbal communication.
That's good feedback, thanks! I only added Portuguese this weekend (https://github.com/yaptown/yap/pull/73) so it's definitely still very alpha (as noted on the website :P )
> The knowledge assessments were subpar and didn’t seem to do anything; the words it tested almost all started with “a” and several are just the masculine/feminine variants.
Thanks, will fix this tonight. The placement test was just added last week (https://github.com/yaptown/yap/pull/72) so there are still some kinks to work out.
> Then, even after I confirmed I knew every word, it still showed me some of those in the learning process, including incredibly basic ones like “I”, or “the”.
Yeah, the logic doesn't really work for people who already know every word. It tries to show words in the following order (descending): probability_of_knowledge * ln(frequency). But if you already know every word, probability_of_knowledge is the same for every word and the ln(frequency) is the only one remaining, meaning you just get the most common words. I'll add a warning to the site for people who are too advanced for the app's dictionary size – as you pointed out, it's not a good UX.
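Roughly, the ordering is something like this (a minimal sketch with made-up words and numbers, not the app's actual code):

    import math

    def review_order(words):
        # words: dicts with "text", "frequency" (corpus count) and "p_known"
        # (estimated probability the learner already knows the word)
        return sorted(words,
                      key=lambda w: w["p_known"] * math.log(w["frequency"]),
                      reverse=True)  # descending score, as described above

    vocab = [
        {"text": "the",   "frequency": 50000, "p_known": 0.99},
        {"text": "house", "frequency": 4000,  "p_known": 0.80},
        {"text": "glove", "frequency": 300,   "p_known": 0.20},
    ]
    print([w["text"] for w in review_order(vocab)])

    # If p_known is identical for every word (the learner confirmed knowing all
    # of them), the score reduces to ln(frequency) alone, so the most common
    # words like "I" and "the" surface first -- the degenerate case described.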
> there is so much more out there to explore in language learning
There is! I usually recommend pimsleur to people. My hope is just for my app to be a useful supplement.
I "just" created a real-time strategy game before christmas because I could have Claude writing all the code and test it itself. It wrote the spec too, by me telling it to plan out a game "a bit like X but with A, B, C features instead".
It works. It's playable. I might put it online some-time when I get a chance.
[EDIT: My involvement apart from the code-skimming mentioned below was mostly play-testing after Claude had "play-tested", and giving it feedback on what to add or change]
My best estimate from having written much simpler games before was that it churned out many months worth of working code in days. I've not written a line of it - just skimmed some code and told it to make a few architectural refactors.
> It's not like there was some fixed pool of work to be done and you just had to hire enough to exhaust the pool.
In my opinion you are failing to consider other bottlenecks, a la the theory of constraints.
An analogy: Imagine you have a widget factory that requires 3 machines, executed in sequence, to produce one widget.
Now imagine one of those machines gets 2x-5x more efficient. What will you do? Buy more of the faster machines? Of course not! Maybe you'll scale up by buying more of the slower machines (which are now your bottleneck) so they can match the output of the faster one, but that's only if you can acquire the raw material inputs fast enough to make use of them, and also that you can sell the output fast enough to not end up with a massive unsold inventory.
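The analogy in a few lines of Python, with invented numbers:

    # Throughput of a serial line is capped by its slowest stage, so speeding up
    # a non-bottleneck stage doesn't raise output.
    stages = {"requirements": 5, "coding": 10, "testing_and_feedback": 6}  # widgets/hour

    def line_throughput(stages):
        return min(stages.values())

    print(line_throughput(stages))   # 5 -- limited by requirements
    stages["coding"] *= 5            # the machine that got 2x-5x more efficient
    print(line_throughput(stages))   # still 5 -- the bottleneck didn't move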
Bringing this back to software engineering: there are other processes in the software development lifecycle besides writing code -- namely gathering requirements, testing with users (getting feedback), and deployment / operations. And human coordination across these processes is hard, and hard to scale with agents.
These other aspects are much harder to scale (for now, at least) with agents. This is the core reason why agentic development will lead to fewer developers -- because you just don't need as many developers to deliver the same amount of development velocity.
The same logic explains (at least in part) why US companies don't simply continue hiring more and more outsourced developers. At a certain point, more raw development velocity isn't helpful because you're limited by other constraints.
On the other hand, agentic development DOES mean a boon to solo developers, who can MUCH more easily scale just themselves. It's much easier to coordinate between the product team, the development team, the ops team, and the customer support team when all the teams are in the same person's head.
Right but then you expect way more productivity from those teams. I'm wondering where that is.
I find when I'm in a domain I'm not an expert in I am way more productive with the AI tools. With no knowledge of Java or Spring I was able to have AI build out a server in like 10 minutes, when it would have taken me hours to figure out the docs and deployment etc. But like, if I knew Java and Spring I could have built that same thing in 10 minutes anyways. That's not nothing, but also not generalisable to all of software development, not even close. Plus you miss out on actually learning the thing.
I mean at work people are slowed down by management and getting alignment is even slower than before. As PMs and execs keep asking more to be done in the same-ish time, we are getting slow cooked.
Extra productivity at work is not being used at fixing bugs as well.
Yeah work, despite management's best intentions, is really failing AI by being that much relatively slower than engineering potential now. It's a bummer.
> I think your logic goes wrong because you assume that more productivity implies less desire for engineers.
Yes, this is the central fallacy. The reality is, we've been massively bottlenecked on software productivity ever since the concept of software existed. Only a tiny tiny fraction of all the software that could usefully be written has been. The limitation has always been the pool of developers that could do the work and the friction in getting those people to be able to do it.
What it's confounded by, however, is the short-term effect, which I think is absolutely drying up the market for new junior software devs. It's going to take a while for this to work through.
"Built out products" like you're earning money on this? Having actual users, working through edge cases, browser quirks, race conditions, marketing, communication - the real battle testing 5% that's actually 95% of the work that in my view is impossible for the LLM? Because yeah the easy part is to create a big boilerplate app and have it sit somewhere with 2 users.
The hard part is day to day operations for years with thousands of edge cases, actual human feedback and errors, knocking on 1000 doors etc.
Otherwise you're just doing slot machine coding on crack, where you work and work and work on some amazing thing, then it goes nowhere - and now you haven't even learned anything because you didn't code so the sideproject isn't even education anymore.
> "Built out products" like you're earning money on this?
No, I'm not interested in monetizing stuff, I make enough money from $dayjob.
> Having actual users, working through edge cases, browser quirks, race conditions, marketing, communication - the real battle testing 5% that's actually 95% of the work that in my view is impossible for the LLM?
Yes, all of those. Obviously an LLM won't make a tiktok ad for me, but it can help with all the other stuff. For example, you mentioned browser quirks. I ran into a bug in safari's OPFS implementation that an LLM was able to help me track down and work around. I also ran into the chrome issue where backdrop effects don't work if any of the element's parents have nonzero transparency, and claude helped me find all the cases where that happened and fix them. Both of these are from working on the app in my bio. It's a language app too, so however many edge cases you think there are, there's more :D
I don't want to give the impression that it was not a lot of work. It was an enormous amount of work. It's just that each step is significantly faster now.
> and now you haven't even learned anything because you didn't code so the sideproject isn't even education anymore.
I read every line. You could pull up the github right now and point to any line of code and I could tell you what it does and why it's there and what will break if you remove or change it.
> What's the point of such a project?
I originally made it because I wanted a tool to help me learn French. It has succeeded in helping me enormously, to the point where I can have short conversations with my French family members now. Others seem to find it useful too.
And to push this example further, if you can hire 10 developers each commanding 10 reliable junior-mid developers you have a team of 100, which is probably more than enough to build basically any software project in existence. WhatsApp was built with way less than that.
They are absolutely crushing it. I know of a one-man shop that just got notice they were selected for an eight-figure revenue contract. They would NEVER go public with their head count or their product being built by AI.
> shouldn't we be seeing a ton of 1 person startups
Oh, man, they're just waiting for their poster boy to show up. Once first unicorn "built by a single person" pops up you'll regret having a single social network account.
> shouldn't we be seeing a ton of 1 person startups?
Who should be seeing that? The thing about 1 person startups is that it requires little to no communication to start up, and also very little capital. Seems easy to fly below the radar.
Also "a ton", idk. Doing a startup is still hard, for reasons outside of just being able to write a lot of code. In my experience churning out all this code at 10x is coming with a significant complexity tax: Turns out writing code and thinking about code problems was the relaxing part. When that goes away you have to think about real world problems only. What a fucking mess.
Still, I would assume that it's more of a thing now, and something you could observe when you have YC data for example. Do we know that's not the case? I am in no position to say, one way or the other.
My favorite movie quote as it pertains to software engineering has for a long time been Jurassic Park's: “Your scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should.”
That’s how I feel about a lot of AI-powered development. Just because you can have 10 parallel agents cranking out features 24/7 and have AI write 100% of the code, that doesn’t mean you’re actually building a product that users want and/or that is a viable business.
I’m currently in this situation, working on a greenfield project as founder/solo dev. Yes, AI has been tremendously useful in speeding things up, especially in patching over smaller knowledge gaps of mine.
But in the end, as in all the projects before in my career, building the MVP has rarely been the hard part of starting a company.
I'm not in the startup scene or the US but I've come to understand this as 6 days a week of working 9am-9pm - typical hustle virtue-signalling nonsense and/or the latest move to exploit/shame/scare driven/desperate people to sacrifice their lives unsustainably for the wealth creation of others (and I take the comment you were replying to was criticising this as well).
996 is a work schedule that derives its name from its requirement that workers clock in from 9:00 am to 9:00 pm, 6 days per week, resulting in employees working 12 hours per day and 72 hours per week.
My brother is selling a CRM he developed for his business to others for a couple thousand a month.
There is no way he would have built the CRM as quickly pre-AI.
He built, in a few months, what would have taken maybe one to two years before.
It's probably going to be a while before someone builds the next Instagram with AI. But I think that's more a function of product fit and idea. Less so how fast one person can code.
The first billion-dollar solopreneur likely is going to happen at some point, but it's still a one-in-a-million shot, no matter how fast a person can code.
Look at how many startups fail despite plenty of money for programmers.
But I am seeing friends get to revenue faster with AI on small ideas.
I think the other issue is that the leading toolchain for getting real work done (Claude Code) is also lacking multi-modality generation, specifically imagegen. This makes design work more nuanced/technical. And in general, there are a lot of end-product UI/UX issues that generally require the operator to know their way around products. So while we are truly in a boom of really useful personalized software toolchains (and a new TUI product comes out every day), it will take a while for truly polished B2C products to ramp up. I guarantee 2026 sees a surge.
> I would actually expect that current coding AIs would create something very close to Instagram when instructed
Agree 100 percent! I think a lot of us are conflating writing software with building a business. Writing software is not equal to building a business.
Instagram wasn't necessarily hard to code, it was just the right idea at the right time, well executed, combined with some good fortune.
AI is enabling solo founders to launch faster, but those solo founders still need to know how to launch a successful business. Coding is only 10% of launching a business.
My brother has had some success selling software before AI, so he already knows how to launch a business. But, AI helped him take on a more ambitious idea.
> My brother is selling a CRM he developed for his business to others for a couple thousand a month.
> There is no way he would have built the CRM as quickly pre-AI.
The thing is, if AI is what enabled this, there's no long term market for selling something vibe coded for thousands a month. Maybe right at this moment and good for him, but I have my doubts these random saas things have a future.
I think that's comparing something different. I've seen the one-day vibe code UI tool things which are neat, but it feels like people miss the part that: if it's that easy now, it's not as valuable as it was in the past.
If you can sell it in the meantime, go for it and good for you, but it doesn't feel like that business model will stay around if anyone can prompt it themselves.
A lot of people either a) don’t know about the good tools or b) aren’t using them enough/properly.
There is a ton of anti-AI sentiment, and not all LLMs are equal. There is a lot of individual adoption that is yet to occur.
I know at least two startups that are one person or two people that are punching way above their weight due to this force multiplier. I don’t think it’s industry-wide yet, but it will be relatively soon.
Exactly my opinion. Im pretty pragmatic and open minded, though seasoned enough that I dont stay on the bleeding edge. I became a convert in October, and I think the most recent Sonnet/Opus models truly changed the calculus of "viable/useable" so that we have now crossed into the age of AI.
We are going to see the rest of the industry come along kicking and screaming over the next calendar year, and thats when the ball is going to start truly rolling.
Now they are. Not everyone is using them yet, but they will. There’s zero doubt about it anymore. Lots of people are still not up to date on what is currently possible.
> Not everyone is using them yet, but they will. There’s zero doubt about it anymore.
That’s not true. On HN and elsewhere you’ll find no shortage of folks who don’t use those tools because they don’t want to. People who find enjoyment in doing the thinking and programming themselves, for whom doing it is the goal. For others there are legal and moral considerations. It’s unrealistic to think everyone will be using LLMs for coding, they won’t. Not everyone thinks alike, but for some reason proponents seem incapable of understanding that. All it takes is a bit of empathy and listening to your fellow humans.
I think the Deepseek moment that everyone started trying Deepseek and chain of thought was the weekend of 1/25/25 and 1/26/25.
The progress lived up to the hype the past year. To say otherwise is to be either intellectually dishonest or you just didn't bother using the tools in order to feel how much progress was made.
I just went back to a project that I remember the models struggled with. It felt like years ago but it was from July. Even July to now is night and day different.
> To say otherwise is to be either intellectually dishonest or you just didn't bother using the tools
We can’t have a proper discussion if you start by making wrong and uninformed statements about a stranger and promptly assert that you believe anyone who disagrees with you is either malicious or wilfully ignorant. People can experience the same things and still reach different conclusions or have different opinions.
When the same revolutionary messaging is touted over and over with revised dates whenever the previous prediction hasn’t panned out, anyone is justified in not buying that “this time is different” when that has been said multiple times before.
It’s the boy who cried wolf. Sure, maybe someday it will be true, but save it for when it is instead of repeatedly saying “next year”, “in the next five years”.
If all of this really worked, Claude Code would not be a buggy, slow, frustratingly limited, and overall poorly written application. It can't even reload a "plugin" at runtime. Something that native code plugin hosts have been doing since plugins existed, where it's actually hard to do.
Claude Plugins are a couple `.md` file references, some `/command` handler registrations, and a few other pieces of trivial state. There's not a lot there, but you have to restart the whole damn app to install or update one.
Plus, there's the **ing terminal refresh bug they haven't managed to fix over the past year. Maybe put a team of 30 code agents on that. If I sound bitter, it's because the model itself is genuinely very good. I've just been stuck for a very long time working with it through Claude Code.
Yes, Anthropic's product design is truly bad, as is their product strategy (hey, I know you just subscribed to Claude, but that isn't Claude Code, which you need to separately subscribe to, but you get access to Claude Code if you subscribe to a certain tier of Claude, but not the other way around. Also, you need to subscribe to Claude Code with an API key and not usage-based pricing or else you can't use Claude Code in certain ways. And I know you have Claude and Claude Code access, but actually you can't use Claude Code in Claude, sorry)
> I see Bay area startups pushing 996 and requiring living in the Bay area because of the importance of working in an office to reduce communication hurdles.
This is toxic behavior by these companies, and is not backed by any empirical data that I’ve ever seen. It should be shunned and called out.
As far as the remainder of your post, I think you’ve uncovered solid evidence that the ability of LLMs to code on their own, without human planning, architecting, and constant correction, is significantly oversold by most of the companies pushing the tech.
I got laid off in the first half of 2025 and decided to use my severance to see if I could go full-time with my side project. Over the last six months I've gone from zero to about $200k in ARR, and 75% of that was in the last three months. My average customer is paying about $250 / month.
I have zero help, I do everything myself: coding, design, marketing, sales, etc. The product uses AI to replace humans in a niche industry, so the core of the product is AI, but I also increasingly build it with AI. I rarely code manually these days, I'm just riding herd on agents, often in between sales calls, dealing with customer support, etc. I may eventually hire a VA-type person to help with admin and customer support stuff where it changes often enough that it's not worth it to build an AI workflow for, but even there...I don't know. If we get reliable computer use models in 2026 or 2027, I probably won't ever hire anyone.
I've never talked openly in tech circles about this product, nor will I. The technical challenges are non-trivial, so I don't think it'd be easy to replicate for another engineer, but my competitors are all dinosaurs and getting customers to switch to me is incredibly easy. The last thing I need is another engineer spinning up a competitor.
> shouldn't we be seeing a ton of 1 person startups?
Too early. Wait a year. People are just coming to grips with how to really make these agents make good changes and large enough changes to really start accelerating.
Also, expect a number of those startups to be entirely stealth and wait longer to raise, as well as maybe in many cases be more fleeting and/or far more fast moving (having to totally re-invent what they're doing at a pace you wouldn't expect to before).
I've been full in on this for 2 years now, and I'm only just at the stage where I feel my setups and model capabilities are intersecting to produce results good enough that I've started testing if one project I'm working on will actually manage to generate revenue.
I'm not going to tell you what it is, because if I did there's too little moat and HN is crawling with great people who could probably replicate it and execute on it faster than me, and Claude is capable of doing all the heavy lifting entirely by itself - that in itself is what makes it potentially viable - so sorry for being vague.
If it shows signs of generating revenue, it'll be so cheap to scale because of Claude that I'll be able to scale it far before I need to raise any capital.
But other people will figure it out, most likely other people are already doing the same thing.
As a result I have a short window, and it likely will close as model improvements will make it more and more trivial to do what I'm trying to do, so my approach is to try to extract as much return as I can in as little time as I can, hoping there isn't yet too much competition, and then move on.
This last part will also be limiting - a lot of people just won't be able to move fast enough (I might not have), and so a lot of these "one person startups" won't ever become visible because they won't even get to a stage where people are ready to talk about it.
In this case, it is easily measurable how much time Claude has saved me, because I've done the same thing before, manually, and made money from it, and the fastest turnaround I've achieved before was 21 days. So far, my first test run with Claude + me in the loop produced the same quality in 3 days, my second in 2 days, my third 12 hours, and I think I can drive it down towards 1-2 hours of my time, with me being the blocker to speeding it up beyond that.
At 21 days it wasn't really profitable. At 1-2 days it "should be" wildly profitable unless I'm already too late. If I can get it down to an hour or two of my time, then I'd also be able to hire to scale it further with good margin, and the question is just finding the sweet spot.
This opportunity will never be a unicorn, but there's a lot of money there if you don't need to raise, and the cost of scaling it to the sweet spot where I maximise my returns is something I should be able to finance without outside money the moment I validate that the unit economics are right.
You might not hear about this "one person startup" again until it either has failed and I decide to tell the story, or it's succeeded but the opportunity has closed and I've made what I can make from it. I suspect there will be many cases like mine that you'll never hear about at all.
(and yes, I realise a lot of people will just dismiss this as bullshit because I won't give details; that's fine)
I'm not dismissing it. I've been working on something secret-squirrel for over 5 years. It wasn't until November that I made a major breakthrough, resulting in four computer science revelations. At first, I wrote about it in a blog post; people didn't even believe me. Some researchers I wrote to validated it.
I hadn't really used Claude before, but if nobody cares ... then commercialize it, delete the blog post and code from the open source world. In the last month, Claude has helped turn it from a <700 line algorithm into nearly a full-blown product in its own right.
But yeah, the moat is small. The core of everything is less than 5k LoC; and it'd be easy af for my soon-to-be competitors to reproduce. The only thing I've got going for me is a non-technical cofounder believing in me and pounding on doors to find our first customer, while I finish up the technical side.
With the computer science revelations, we can basically keep us 6-8 months ahead for the next couple of years. This is the result of years of hard work, but AI has let me take it to market at an astounding speed.
I hope self-promotion isn't frowned upon, but I've been spending the past months figuring out a workflow [1] that helps tackle the "more complicated problems" and ensure long-term maintainability of projects when done purely through Claude Code.
Effectively, I try to:
- Not allow the LLM to make any implicit decisions, but instead confirm them with the user;
- Ensure code is written in such a way that it's easy to understand for LLMs;
- Capture all "invisible knowledge" around decisions and architecture that's difficult to infer from code alone.
It's based entirely on Claude Code sub-agents + skills. The skills almost all invoke a Python script that guides the agents through workflows.
It's not a fast workflow: it frequently takes more than 1 hour just for the planning phase. Execution is significantly faster, as (typically) most issues have been discovered during the planning phase already (otherwise it would be considered a bug and I'd improve the workflow based on that).
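To make that concrete, here is a minimal, hypothetical sketch of the kind of guide script a skill can invoke - the file name, the step list, and the state file are all made up for illustration and are not taken from the linked repo:

# guide.py - hypothetical example of a script that walks an agent through a workflow
import json
from pathlib import Path

STEPS = [
    "Restate the requirement and list every implicit decision; ask the user to confirm each one.",
    "Write the plan to docs/plan.md, including the 'invisible knowledge' behind each decision.",
    "Implement the plan; run the linter and test suite after every file you touch.",
    "Summarize what changed and which decisions the user explicitly confirmed.",
]

STATE = Path(".workflow-state.json")

def main():
    state = json.loads(STATE.read_text()) if STATE.exists() else {"step": 0}
    step = state["step"]
    if step >= len(STEPS):
        print("Workflow complete.")
        return
    # Hand the agent exactly one step at a time so it cannot skip ahead.
    print(f"Step {step + 1}/{len(STEPS)}: {STEPS[step]}")
    STATE.write_text(json.dumps({"step": step + 1}))

if __name__ == "__main__":
    main()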
I'm under the impression that the Claude Code creator's post is also intended to raise awareness of certain features of Claude Code, such as hand-offs to the cloud and back. Their workflow only works for small features. It reads a bit like someone took a “best practices” guide and turned it into a twitter post. Nice, but not nearly detailed enough for an actual workflow.
> Ensure code is written in such a way that it's easy to understand for LLMs;
> Capture all "invisible knowledge" around decisions and architecture that's difficult to infer from code alone.
I work on projects where people love to create all sorts of complex abstractions but also hate writing ADRs (so they don’t) or often any sorts of comments and when they do they’re not very well written. Like the expectation is that you should call and ask the person who wrote something or have a multi-hour meeting where you make decisions and write nothing down.
That sort of environment is only conducive to manual work, dear reader; avoid those. Heed the advice above about documenting stuff.
> Ensure code is written in such a way that it's easy to understand for LLMs
Over the summer last year, I had the AI (Gemini Pro 2.5) write base libraries from scratch that are easy for itself to write code against. Now GPro3 can one-shot (with, at most, a single debug loop at the REPL) 100% of the normal code I need developed (back office/business-type code).
Huge productivity booster: there are a few things that are very easy for humans to do but that AI struggles with. By removing them, the AI has been just fantastic to work with.
All relevant code fits in context. Functional APIs. Standard data structures. Design documents for everything.
I'm doing this in a Clojure context, so that helps—the core language/libraries are unusually stable and widely used and so feature-complete there's basically no hallucinations.
Thanks for sharing and taking the time to document your repo. I’m also sometimes unsure of “self-promotion” — especially when you don’t have anything to sell, including yourself.
I sometimes don't share links due to this, and then sometimes overshare or miss the mark on relevance.
But sometimes when I do share, people are excited about it, so I've leaned more toward sharing. Worst case is you get some downvotes or negative comments, so why not, if there is some lurker who might benefit.
When you don’t blog or influence, how else but in related HN comment threads are like-minded people gonna know about some random GitHub repo?
My second-level hope is that it gets picked up by AI crawlers and gets aligned somewhere in the latent space to help prompters find it.
ETA: “The [Prompt Engineer] skill was optimized using itself.” That is a whole other self-promotional writeup possibility right there.
Yeah, last time I shared it, I got a whole lot of hate for vibe-coder self-promotional BS, so I decided to tread a bit more carefully this time.
I encourage you to try the prompt engineer skill! It’s one of the easiest to use, and you can literally use it on anything, and you’ll also immediately see how the “dynamic prompt workflow” works.
Yes, thank you! I find I get more than enough done (and more than enough code to review) by prompting the agent step by step. I want to see what kind of projects are getting done with multiple async autonomous agents. I was hoping to find YouTube videos of someone setting up a project for multiple agents so I could see the cadence of the human stepping in and giving direction.
I have not used Claude. But my experience with Gemini and aider is that multiple instances of agents will absolutely stomp over each other. Even in a single session, it will often clobber my changes even after I've told the agent that I made modifications.
You should try Claude opus 4.5 then. I haven’t had that issue. The key is you need to have well defined specs and detailed instructions for each agent.
Op mentions in the follow up comments that he does a separate git checkout, one for each of the 5 Claude Code agents he runs. So each is independent and when PRs get submitted that's where the merging happens.
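For anyone picturing the mechanics, here is a minimal sketch of that one-checkout-per-agent setup; the repo URL, branch names, and task list below are placeholders, and git worktrees (mentioned further down the thread) are a lighter-weight alternative to full clones:

# spawn_workspaces.py - hypothetical sketch: one isolated shallow clone per agent
import subprocess
from pathlib import Path

REPO = "git@github.com:example/app.git"   # placeholder
TASKS = ["fix-login-redirect", "add-csv-export", "cleanup-readme"]

for task in TASKS:
    workdir = Path.home() / "agents" / task
    subprocess.run(["git", "clone", "--depth", "1", REPO, str(workdir)], check=True)
    subprocess.run(["git", "-C", str(workdir), "checkout", "-b", task], check=True)
    # Start a Claude Code session manually in each directory (or drive it
    # non-interactively with the -p print mode mentioned later in the thread).
    print(f"Workspace ready: {workdir}")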
I run 3-5 on distinct projects often (20x plan). I quite enjoy the context switching and always have. I have a vanilla setup too: I don't use plugins/skills/commands, sometimes I enable an MCP server for different things, and I definitely list out CLI tools in my claude.md files. I keep a Google doc open where I list out all the projects I'm working on and write notes as I'm jumping through the Claude tabs; I also start drafting more complex prompts in the Google doc. I've been using turbo repo a lot so I don't have to context switch the architecture in my head (but the projects still use multiple types of DevOps setups).
Often these days I vibe code a feedback loop for each project, a way for it to validate itself as OP said. This adds time to how long Claude takes to complete, giving me time to switch context to another active project.
I also use light mode which might help others... jks
I suppose he may have a list of feature requests and bug reports to work on, but it does seem a bit odd from a human perspective to want to work on 5 or more things literally in parallel, unless they are all so simple that there is no cognitive load and context switching required to mentally juggle them.
Washing dishes in parallel with laundry and cleaning is of course easily possible, but precisely because there is no cognitive load involved. When the washing machine stops you can interrupt what you are doing to load clothes into the drier, then go back to cleaning/whatever. Software development for anything non-trivial obviously has a much higher task-switching overhead. Optimal flow for a purely human developer is to "load context" at the beginning of the day, then remain in flow-state without interruptions.
The cynical part of me also can't help but wonder if Cherny/Anthropic aren't just advocating token-maxxing!
Same thought here. I use Claude Opus via API billing for tasks that aren't that hard to implement but for which CC takes much less time than I would. However:
* a small PR costs $5-16 (I’ve been monitoring this for the past two days). Management is already pushing for us to use Cursor or a new tool called Augment Code.
* I can submit 4 to 5 PRs in a day
* the bottleneck becomes:
- writing clear instructions and making the right choices
- running tests
- my mental capacity for context switching
- code reviewing, correcting
- deployment
- even further live testing
I don’t understand how I could have 10 parallel workers without the output being degraded due to my inability to manage them. But I can see myself wasting a lot of $$ trying. And something tells me the thread is just normalizing throwing money at them
I noticed yesterday that there were 5K+ issues filed against Claude Code on github (but down to 4.8K today!), so it may well be that this is what Cherny is churning through.
If you read though a few pages of these issues, it doesn't seem to reflect too well on the quality of the code (self-written by Claude Code), so it seems the furious pace of development/bug fixing maybe shouldn't necessarily be taken as being the pace of generating production quality code. Claude Code is of course very useful, so people are very forgiving about issues, but I can't imagine most corporate software being very well regarded if the quality was such that it had 5K issues reported against it!
I agree. I'm imagining a large software team with hundreds of tickets "ready to be worked on" might support this workflow - but even then, surely you're going to start running into unnecessary conflicts.
The max Claude instances I've run is 2 because beyond that, I'm - as you say - unable to actually determine the next best course during the processing time. I could spend the entire day planning / designing prompts - and perhaps that will be the most efficient software development practice in the future. And/or perhaps it is a sign I'm doing insufficient design up front.
I would do the same thing if I could justify paying $200 per month for my hobby. But even with that, you will run into throttling / API / resource limits.
But AI agents need time. They need a little bit of time for reading the source code, proposing the change, making the change, running the verification loop, creating the git commit, etc. It can be a minute, it can be 10, and potentially a lot longer too.
So if your code base is big enough that you can work on different topics, you just do that:
- Fix this small bug in the UI when xy happens
- Add a new field to this form
- Cleanup the README with content x
- . . .
I'm an architect at work and have done product management on the side, as it's a very technical project. I have very little problem coming up with things to fix, enhance, clean up, etc. I have hard limits on my headcount.
I could easily do a handful of things in parallel and keep that in my head. Working memory might be limited, but working memory means something different than following 10 topics. Especially if there are a few topics in between which just take time with the whole feedback loop.
But regarding your example of house cleaning: I have ADHD, and I sometimes work like this. Working on something, waiting for a build, and cleaning something in parallel.
What you are missing is practical experience with agents: taking the time and energy to set something up for yourself, and perhaps accessibility too?
We only got access at work to claude code since end of last year.
For me it's their speed, yes. I only run 0-3 at a time, and often the problem at hand is very much not complex. For example "Take this component out of the file into its own file, including its styles." The agent may take 5 minutes for that and what do I do in the meantime? I can start another agent for the next task at hand.
Could also be a bug hunt "Sometimes we get an error message about XYZ, please investigate how that might happen." or "Please move setting XY from localstorage to cookies".
I rarely run 10 top-level sessions, but I often run multiple.
Here is one case, though:
I have a prototype Ruby compiler that long languished because I didn't have time. I recently picked up work on it again with Claude Code.
There are literally thousands of unimplemented methods in the standard library. While that has not been my focus so far, my next step for it is to make Claude work on implementing missing methods in 10+ sessions in parallel, because why not? While there are some inter-dependencies (e.g. code that would at least be better with more of the methods of the lowest level core classes already in place), a huge proportion are mostly independent.
In this case the rubyspec test suite is there to verify compliance. On top of that I have my own tests (does the compiler still compile itself, and do the self-tests still run when compiled with the self-compiled compiler?), so having 10+ sessions "pick off" missing pieces, make an attempt, see if it can make it pass, and move on, works well.
My main limitation is that I have already once run into the weekly limits of my (most expensive) Claude Max subscription, and I need it for other things too, like client work, and I'm not willing to pay per token for API use on that project since it's not immediately giving me a return.
(And yes, they're "slow" - but faster than me; if they were fast enough, then sure, it'd be nicer to have them run serially, the same way if you had time it's easier to get cohesive work if a single developer does all the work on a project instead of having a team try to coordinate)
It just happens automatically. Once you set it running and it's chugging away there's nothing for you to do for a while. So of course you start working on something else. Then that is running ... before you know it, 5 of them are going and you have forgotten which is what and this is your new problem.
For one of the things I am doing, I am the solo developer on a web application. At any given point, there are 4-5 large features I want and I instruct Claude to heavily test those features, so it is not unusual for each to run for 30-45 minutes and for overall conversations to span several hours. People are correct that it often makes mistakes, so that testing phase usually uncovers a bunch of issues it has to fix.
I usually have 1-2 mop-up terminal windows open for small things I notice as I go along that I want to fix. Claude can be bad about things like putting white text on a white button, and I want a free terminal to just drop every little nitpick into. They exist for me to just throw small tasks into. Yes, you really should start a new convo for every need, but these are small things and I do not want to disrupt my flow.
There are another 2-3 for smaller features that I am regularly reviewing and resetting. And then another one dedicated to just running the tests already built over and over again and solving any failures or investigating things. Another one is for research to tell me things about the codebase.
People are doing this lots of different ways. Some run it in its own containers or in instances on the web. Some are using git worktrees. I use a worktree for anything large, but smaller stuff is just done in the local files.
Sloppy? Perhaps, but Claude has never made such a big mess that it has needed its work wiped.
> Sloppy? Perhaps, but Claude has never made such a big mess that it has needed its work wiped.
I think a key thing to point out to people here is that Claude's built in editing tools won't generally allow it to write to a file that has changed since last time it read it, so if it tries to write and gets an error it will tend to re-read the file, adjust its changes accordingly before trying again. I don't know how foolproof those tests are, because Claude can get creative with sed and cat to edit files, and of course if a change crosses file boundaries this might not avoid broken changes entirely. But generally - as you said - it seems good at avoiding big messes.
Depends on the project you are working on. Solo on a web app? You probably have 100s of small things to fix. Some more padding there, add a small new feature here, etc.
> don't need 10 parallel agents making 50-100 PRs a week
I don't like to be mean, but a few weeks ago the guy bragged about Claude helping him do +50k loc and -48k loc (netting 2k loc). I thought he was joking, because I know plenty of programmers who do exactly that without AI; they just commit 10 huge JSON test files or re-format code.
I almost never open a PR without a thorough cleanup whereas some people seem to love opening huge PRs.
I use Beads, which makes it easier to grasp since it's "tickets" for the agent. I tell it what I want, it creates a bead (or "ticket"), and then I ask it to do research, brain dump on it, and even ask it to ask me clarifying questions, and it updates the tasks. By the end, once I have a few tasks with essentially a well-defined prompt, I tell Claude to run x tasks in parallel. Sometimes I dump a bunch of different tasks and ask it to research them all in parallel, and it fills them in, and I review. When it's all over, I test the code, look at the code, and mention any follow-ups.
I guess it comes down to, how much do you trust the agent? If you don't trust it fully you want to inspect everything, which you still can, but you can choose to do it after it runs wild instead of every second it works.
My impression is that people who are exploring coordinated multi-agent-coding systems are working towards replacing full teams, not augmenting individuals. "Meaningful supervising role" becomes "automated quality and process control"; "generate requirements quickly" -> we already do this for large human software teams.
If that's the goal, then we shouldn't interpret the current experiment as the destination.
Potentially, a lot of that isn't just code generation, it *is* requirements gathering, design iteration, analysis, debugging, etc.
I've been using CC for non-programming tasks and it's been pretty successful so far, at least for personal projects (bordering on the edge of non-trivial). For instance, I'll get a 'designer' agent coming up with a spec, and a 'design-critic' to challenge the design and make the original agent defend their choices. They can ask open questions after each round and I'll provide human feedback. After a few rounds of this, we whittle it down to a decent spec and try it out after handing it off to a coding agent.
Another example from work: I fired off some code analysis to an agent with the goal of creating integration tests, and then ran a set of spec reviewers in parallel to check its work before creating the actual tickets.
My point is there are a lot of steps involved in the whole product development process, and it isn't just "ship production code". And we can reduce the ambiguity/hallucinations/sycophancy by creating validation/checkpoints (either tests, 'critic' agents to challenge designs/specs, or human QA/validation when appropriate).
The end game of this approach is you have dozens or hundreds of agents running via some kind of orchestrator churning through a backlog that is combination human + AI generated, and the system posts questions to the human user(s) to gather feedback. The human spends most of the time doing high-level design/validation and answering open questions.
You definitely incur some cognitive debt and risk it doing something you don't want, but thats part of the fun for me (assuming it doesn't kill my AI bill).
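As a rough sketch of the shape of that end game, here is a toy orchestrator loop; run_agent and ask_human are stand-ins for whatever agent runner and feedback channel you actually use, so treat this as an illustration, not a real framework:

# orchestrator.py - toy illustration of a backlog-driven agent loop
from concurrent.futures import ThreadPoolExecutor

def run_agent(task):
    # Stand-in: dispatch the task to a coding agent and return its result,
    # including any open questions it could not resolve on its own.
    return {"task": task, "status": "needs-review", "questions": []}

def ask_human(result):
    # Stand-in: surface open questions and review requests to the human.
    print(f"[review] {result['task']}: {result['status']}")

backlog = ["add rate limiting", "fix flaky auth test", "refactor billing module"]

with ThreadPoolExecutor(max_workers=4) as pool:
    for result in pool.map(run_agent, backlog):
        if result["status"] != "done" or result["questions"]:
            ask_human(result)   # human time goes to validation, not typing code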
Do you generally only have one problem? For me the use case is that I have numerous needs and Claude frees up time to work on some of the more complicated ones.
> It's like someone is claiming they unlocked ultimate productivity by washing dishes, in parallel with doing laundry, and cleaning their house.
In this case you have to take a leap of faith and assume that Claude or Codex will get each task done correctly enough that your house won't burn down.
Agree. People are stuck applying the "agent" = "employee" analogy and think they are more productive by having a team/company of agents. Unless you've perfectly spec'ed and detailed multiple projects up front, the speed of a single agent shouldn't be the bottleneck.
>> I need 1 agent that successfully solves the most important problem
In most of these kinds of posts, that's still you. I don't believe I've come across a pro-faster-keyboard post yet that claims AGI. Despite the name, LLMs have no agency; it's still all on you.
Once you've defined the next most important problem, you have a smaller problem: translate those requirements into code which accurately meets them. That's the bit where these models can successfully take over. I think of them as a faster keyboard, and I've not seen a reason to change my mind yet despite using them heavily.
If cars do not have agency, how useful are they going to be? If the Internet does not have agency, how useful is it going to be? If fire has no agency (debatable), how useful is it going to be?
Call it what you want, but people are going to call the LLM with tools in a loop, and it will do something. There was the AI slop email to Rob Pike thing the other day, which was from someone giving an agent the instruction to "do good", or some vague high level thing like that.
The problem isn't generating requirements, it's validating work. Spec-driven development and voice chat with ticket/chat context is pretty fast, but the validation loop is still mostly manual. When I'm building, I can orchestrate multiple swarms no problem; however, any time I have to drop in to validate stuff, my throughput drops and I can only drive 1-2 agents at a time.
It depends on the specifics of the tasks; I routinely work on 3-5 projects at once (sometimes completely different stuff), and having a tool like Claude Code fits great in my workflow.
Also, the feedback doesn't have to be immediate: sometimes I have sessions that run over a week, because of casual iterations. In my case it's quite common to do this to test concepts, micro-benchmarking, and library design.
If you're trying to solve one very hard problem, parallelism is not the answer. Recursion is.
Recursion can give you an exponential reduction in error as you descend into the call stack. It's not guaranteed in the context of an LLM but there are ways to strongly encourage some contraction in error at each step. As long as you are, on average, working with a slightly smaller version of the problem each time you recurse, you still get exponential scaling.
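A toy sketch of that shape; the helper functions stand in for LLM-driven steps (decompose, solve, recombine), and the string splitting is only there so the example runs:

# recursive_solver.py - toy illustration of recursive decomposition, not a real agent
def is_small(problem: str) -> bool:
    return len(problem.split(" and ")) == 1

def split(problem: str) -> list[str]:
    return problem.split(" and ")            # a real system would ask the LLM to decompose

def solve_directly(problem: str) -> str:
    return f"<solution for: {problem}>"      # a real system would make one focused LLM call

def combine(solutions: list[str]) -> str:
    return "; ".join(solutions)              # a real system would recombine and verify

def solve(problem: str, depth: int = 0, max_depth: int = 5) -> str:
    # Each level works on a slightly smaller piece; if each level also shrinks
    # the error a little, the error falls roughly geometrically with depth.
    if depth == max_depth or is_small(problem):
        return solve_directly(problem)
    return combine([solve(p, depth + 1, max_depth) for p in split(problem)])

print(solve("parse the config and validate it and write the report"))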
> I need 1 agent that successfully solves the most important problem.
If you only have that one problem, that is a reasonable criticism, but you may have 10 different problems and want to focus on the important one while the smaller stuff is AIed away.
> I don't understand how you can generate requirements quicky enough to have 10 parallel agents chewing away at meaningful work.
I am generally happy with the assumptions it makes when given few requirements? In a lot of cases I just need a feature and the specifics are fairly open or very obvious given the context.
For example, I am adding MFA options to one project. As I already have MFA for another portal on it, I just told Claude to add MFA options for all users. Single sentence with no details. The result seems perfectly serviceable, if in need of some CSS changes.
Exactly. And if that problem is complex, your first step should be to plan how to sub-divide it anyway. So just ask Claude to map out interdependencies for tasks to look for opportunities to parallelise.
The captive audience is not you, it's people salivating at the train of thought where they can 100x the productivity of whatever, push the features that will get paying customers, get bought by private equity, and ride off into the sunset. This whole thing is existential dread on a global scale, driven by sociopaths, and everyone is just unable to not bend over.
Painfully true. A lot of YouTube content on LLM coding tools has become just that: make quick bucks, look, it generated a dashboard of some sort (why is it always dashboards?), and a highly polished story of someone vibing a copy of a successful SaaS and selling it off for a million.
A shame really, because there are good resources for making better use of LLMs in coding.
The only way to achieve that level of parallelism is by not knowing what you are doing or the problem space you are working in to begin with, and just throwing multiple ill-defined queries at agents until something "works". It's sort of a modern infinite monkey theorem, if you will.
It's all smoke, really. Claude Code is an unreliable piece of software and yet one of the better ones in LLM coding. (https://github.com/anthropics/claude-code/issues). That, and I highly suspect it's mostly engineers who are working on it instead of LLMs. Google itself with all its resources and engineers can't come up with a half-decent CLI for coding.
Reminder: the guy works at Anthropic on Claude, and Anthropic is over-hyping LLMs. That's like a jewelry dealer's assistant telling you how gold chains helped his romantic life.
Which would have effectively overridden my whole bashrc config if I had blindly copy-pasted it.
A few minutes later, asking it to create a .gitignore file for the current project - right after generating a private key - it failed to include the private key file in the .gitignore.
I don't see yet how these tools can be labeled as 'major productivity boosters' if you lose basic security and privacy with them...
Let’s not forget the massive bias in the author: for all we know this post is a thinly veiled marketing pitch for “how to use the most tokens from your AI provider and ramp up your bill.”
This isn’t about being the most productive or having the best workflow, it’s about maximizing how much Claude is a part of your workflow.
> This is interesting to hear, but I don't understand how this workflow actually works
The cynic in me says it's a marketing pitch to sell "see, this is way cheaper than 10 devs!". The "agent" thing leans heavily into bean counter CTO/CIO marketing.
Claude is absolutely plastering Facebook with this bullshit.
Every PR Claude makes needs to be reviewed. Every single one. So great! You have 10 instances of Claude doing things. Great! You're still going to need to do 10 reviews.
It's interesting to see this sentiment, given that there are literally dozens of people I know in person, living in Tokyo, with no affiliation with Anthropic, who rave about Claude Code. It is good. Not perfect, but it does a lot of good stuff that we couldn't do before because of time restrictions.
I am surprised by how many people don't know that Claude Code is an excellent product. Nevertheless, PR / influencer astroturfing makes me not want to use a product, which is why I use Claude in the first place and not any OpenAI products.
It is an excellent product but the narrative being pushed is that there's something unique about Claude Code, as if ChatGPT or Gemini don't have exactly the same thing.
Even having Opus review code written by Opus works very well as a first pass. I typically have it run a sub-agent to review its own code using a separate prompt. The sub-agent gets fresh context, so it won't get "poisoned" by the top-level context's justifications for the questionable choices it might have made. The prompts then direct the top-level instance to repeat the verification step until the sub-agent gives the code a "pass", and fix any issues flagged.
The result is change sets that still need review - and fixes - but are vastly cleaner than if you review the first output.
Doing runs with other models entirely is also good - they will often identify different issues - but you can get far with sub-agents and different personas (and you can, if you like, have Claude Code use a sub-agent to run codex to prompt it for a review, or vice versa - a number of the CLI tools seem to have "standardized" on "-p <prompt>" to ask a question on the command line).
Basically, reviewing output from Claude (or Codex, or any model) that hasn't been through multiple automated review passes by a model first is a waste of time - it's like reviewing the first draft from a slightly sloppy and overly self-confident developer who hasn't bothered checking if their own work even compiles first.
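As a deliberately simple sketch of that "review it before the human sees it" pass, assuming the -p print mode mentioned above (swap in whichever CLI, model, and prompt you actually use):

# prereview.py - hypothetical sketch: run an automated review pass over the current diff
import subprocess

diff = subprocess.run(["git", "diff", "HEAD"], capture_output=True, text=True, check=True).stdout

prompt = (
    "You are a strict code reviewer with no prior context. "
    "Review the following diff. Reply PASS if it is ready for human review, "
    "otherwise list the specific problems that must be fixed first.\n\n" + diff
)

# -p runs the CLI non-interactively and prints the reply; point it at codex or
# another tool instead if you want a second model's opinion.
review = subprocess.run(["claude", "-p", prompt], capture_output=True, text=True).stdout
print(review)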
> Basically, reviewing output from Claude (or Codex, or any model) that hasn't been through multiple automated review passes by a model first is a waste of time - it's like reviewing the first draft from a slightly sloppy and overly self-confident developer who hasn't bothered checking if their own work even compiles first.
Well, that's what the CI is for. :)
In any case, it seems like a good idea to also feed the output of compiler errors and warnings and the linter back to your coding agent.
Sure, but I'd prefer to catch it before that, not least because it's a simpler feedback loop to ensure Claude fixes its own messes.
> In any case, it seems like a good idea to also feed the output of compiler errors and warnings and the linter back to your coding agent.
Claude seems to "love" using linters and error messages if it's given the chance and/or the project structure hints at an ecosystem where certain tools are usually available. But just listing by name, in CLAUDE.md, a set of commands it can use to check things will often be enough to have it run them aggressively.
If not enough, you can use hooks to either force it, or sternly remind it after every file edit, or e.g. before it attempts to git commit.
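A post-edit hook can be as small as a script that lints whatever file was just touched. Here is a hedged sketch that assumes the hook receives the tool call as JSON on stdin with the edited file's path in it - check the hooks documentation for the exact payload and exit-code semantics before relying on it:

# lint_hook.py - hypothetical post-edit hook body
import json
import subprocess
import sys

payload = json.load(sys.stdin)                       # hook payload (assumed to be JSON on stdin)
path = payload.get("tool_input", {}).get("file_path", "")

if path.endswith(".py"):
    result = subprocess.run(["ruff", "check", path], capture_output=True, text=True)
    if result.returncode != 0:
        # Feeding the lint output back (here via stderr and a non-zero exit) is
        # what pushes the agent to fix the problems immediately.
        print(result.stdout, file=sys.stderr)
        sys.exit(2)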
At the beginning of the project, the runs are fast, but as the project gets bigger, the runs get slower:
- there are bigger contexts
- the test suite is much longer and slower
- you need to split worktrees, resources (like DBs and ports), and sometimes containers to work in isolation
So having 10 workers means they will run for a long time. Which gives plenty of time to write good specs.
You need a good spec so the LLM produces good tests, so it can write good code to match those tests.
Having a very strong spec + test suite + quality gates (linter, type checkers, etc.) is the only way to get good results from an LLM as the project becomes more complex.
Unlike a human, it's not very good at isolating complexity by itself, nor at stopping and asking questions in the face of ambiguity. So the guardrails are the only thing that keeps it on track.
And running a lot of guardrails takes time.
E.g.: yesterday I had a big migration to do from HTMX to viewjs. I asked the LLM to produce screenshots of each state, and then do the migration in steps in a way that kept the screenshots 90% identical.
This way I knew it would not break the design.
But it takes a long time to run e2e tests + screenshot comparison every time you make a modification. Still faster than a human, but it gives plenty of time to talk to another LLM.
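The comparison step itself can be simple. Here is a hedged sketch of a "90% identical" gate using Pillow; the threshold, per-pixel tolerance, and paths are placeholders rather than the exact setup described above:

# screenshot_gate.py - hypothetical sketch of a screenshot-similarity check
from PIL import Image, ImageChops

def similarity(path_a: str, path_b: str) -> float:
    a = Image.open(path_a).convert("RGB")
    b = Image.open(path_b).convert("RGB").resize(a.size)
    diff = ImageChops.difference(a, b).convert("L")
    pixels = list(diff.getdata())
    unchanged = sum(1 for p in pixels if p < 10)     # small tolerance for antialiasing noise
    return unchanged / len(pixels)

if similarity("before/home.png", "after/home.png") < 0.90:
    raise SystemExit("Screenshots diverged by more than 10% - stop and investigate.")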
Plus you can assign them very different tasks:
- One works on adding a new feature
- One improves the design
- One refactors part of the code (something you should do regularly; LLMs produce tech debt quickly)
- One adds more tests to your test suite
- One is deploying on a new server
- One is analyzing the logs of your dev/test/prod servers and tells you what's up
- One is cooking up a new logo for you and generating x versions at different resolutions.
> I don't understand how you can generate requirements quicky enough to have 10 parallel agents chewing away at meaningful work.
You use agents to expand the requirements as well, either in plan mode (as OP does) or with a custom scaffold (rules in CLAUDE.md about how to handle requirements; personally I prefer giving Claude the latitude to start when Claude is ready rather than wait for my go-ahead)
> I don't understand how you can have any meaningful supervising role over 10 things at once given the limits of human working memory.
[this got long: TL;DR: This is what works for me: Stop worrying about individual steps; use sub-agents and slash-commands to encapsulate units of work to make Claude run longer; use permissions to allow as much as you dare (and/or run in a VM) to allow Claude to run longer; give Claude tools to verify its work (linters, test suites, sub-agents double-checking the work against the spec) and make it use them; don't sit and wait and read individual parts of the conversation - it will only infuriate you to see Claude make stupid mistakes, but if well scaffolded it will fix them before it returns the code to you, so stop reading, breathe, and let it work; only verify when Claude has worked for a long time and checked its own work -- that way you review far less code and far more complete and coherent changes]
You don't. You wait until each agent is done, and you review the PRs. To make this kind of thing work well you need agents and slash-commands, like OP does - sub-agents in particular help prevent "context anxiety" in the top-level agent: Claude Code appears to have knowledge of context use, and will be prone to stopping before context runs out; sub-agents use their own context and the top-level agent only uses context to manage the input to and output from them, so the more is farmed out to sub-agents, the longer Claude Code is willing to run. When I got up this morning, Claude Code had run all night and produced about 110k words of output.
This also requires extensive permissions to use safe tools without asking (what OP does), or --dangerously-skip-permissions (I usually do this; you might want to put this in a container/VM as it will happily do things like "killall -9 python" or similar without "thinking through" consequences - I've had it kill the terminal it itself ran in before), or it'll stop far too quickly.
You'll also want to explicitly tell it to do things in parallel when possible. E.g. if you want to use it as a "smarter linter" (DO NOT rely on it as the only linter, use a regular one too, but using Claude to apply more complex rules that require some reasoning works great), you can ask it to "run the linter agent in parallel on all typescript files" for example, and it will tend to spawn multiple sub-agents running in parallel, and metaphorically twiddle its thumbs waiting for them to finish (it's fun seeing it get "bored" and decide to do other things in the meantime, or get impatient and check on progress obsessively).
You'll also want to make Claude use sub-agents to review, verify, test its work, with instructions to repeat until all the verification sub-agents give its changes a PASS (see 12/ and 13/ in the thread) - there is no reason for you to waste your time reviewing code that Claude itself can tell isn't ready.
[E.g. concrete example: "Vanilla" Claude "loves" using instance_variable_get() in Ruby if facing a class that is missing an accessor for an instance variable. Whether you know Ruby or not, that should stand out like a sore thumb - it's a horrifically gross code smell, as it's basically bypassing encapsulation entirely. But you shouldn't worry about that - if you write Ruby with Claude, you'd want a rule in CLAUDE.md telling it how to address missing accessors, and sub-agent, and possibly a hook, making sure that Claude is told to fix it immediately if it ever uses it.]
Farming it off to sub-agents both makes it willing to work longer, especially on "boring" tasks, and avoids the problem that it'll look at past work and decide it already "knows" this code is ready and start skipping steps.
The key thing is to stop obsessing over every step Claude takes, and to treat it like a developer experimenting with something they're not yet clear on how to do. If you let it work, and its instructions are good, and it has ways of checking its work, it will figure out that its first attempts are broken, fix them, and leave you with output that takes far less of your time to review.
When Claude tells you it's done with a change, if you spot egregious problems, fix your CLAUDE.md, fix your planning steps, fix your agents.
None of the above will absolve you of reviewing code, and you will need to kick things back and have it fix them, and sometimes that will be tedious. But Claude is good enough that the problems you have it fix should be complex, not simple code smells or logic errors, and 9 out of 10 times they should signal that your scaffold is lacking important detail about your project or that your spec is incomplete at a functional/acceptance-criteria level (not low-level detail).
50-100 PRs a week to me is insane. I'm a little skeptical and wonder how large/impactful they are. I use AI a lot and have seen significant productivity gains but not at that level lol.
Most likely:
* The rest of the team also reviews
* If you're the founder, chances are that people will just approve your PRs without reading much and give you priority in reviews
I work for a FAANG and I'm the top reviewer in my team (in terms of number of PRs reviewed). I work on an internal greenfield project, so something really fast moving.
For ALL of 2025 I reviewed around 400 PRs. And that already took me an extreme amount of time.
Nobody is reviewing this many PRs.
I've also raised around 350 PRs in the same year, which is also #1 for my team.
AI or not, nobody is raising upwards of 3,500 CRs a year (which is roughly what 50-100 a week adds up to). In fact, my WHOLE TEAM of 15 people has barely raised this number of CRs for the year.
I don't know why people keep believing those wild unproven claims from actors who have everything to gain from you believing them. Has common sense gone down the drain that much, even for educated professionals?
> I don't know why people keep believing those wild unproven claims from actors who have everything to gain from you believing them.
It's grifters all the way down. The majority of people pushing this narrative have vested interests, either because they own some AI shovelware company or are employed by one of the AI shovelware companies. Anthropic specifically is running guerilla marketing campaigns fucking everywhere at the moment; it's why every single one of these types of spammed posts reads the same way. They've also switched it up a bit of late: they stopped going with the "It makes me a 10x engineer!" BS (though you still see plenty of that) and are instead going with this weird "I can finally have fun developing again!" narrative, I guess trying to cater to the ex-devs that are now managers or whatever.
What happens is you get juniors and non-technical people seeing big numbers and being like "Wow, that's so impressive!" without stopping to think for 5 seconds what the kind of number they're trying to push even actually means. 100 PRs is absurd unless they're tiny oneliners, and even if they were tiny changes, there's 0 chance anyone is looking at the code being shat out here.
Reviewing PRs should be for junior engineers, architectural changes, brand new code, or broken tests. You should not review every PR; if you do, you're only doing it out of habit, not because it's necessary.
PRs come originally from the idea that there's an outsider trying to merge code into somebody's open source project, and the Benevolent Dictator wants to make sure it's done right. If you work on a corporate SWEng team, this is a completely different paradigm. You should trust your team members to write good-enough code, as long as conventions are followed, linters used, acceptance tests pass, etc.
> You should trust your team members to write good-enough code...
That's the thing, I trust my teammate, I absolutely do not trust any LLM blindly. So if I were to receive 100 PRs a week and they were all AI-generated, I would have to check all 100 PRs unless I just didn't give a shit about the quality of the code being shit out I guess.
And regardless, whether I trust my teammates or not, it's still good to have 2 eyes on code changes, even if they're simple ones. The majority of the PRs I review are indeed boring (boring is good, in this context) ones where I don't need to say anything, but everyone inevitably makes mistakes, and in my experience the biggest mistakes can be found in the simplest of PRs because people get complacent in those situations.
For many years, all the projects I’ve been in had mandatory code review, some in the form of PRs (a github fabrication), most as review requests in other tooling.
This applies to everything from platform code, configuration, tooling to production software.
Inside a component, we use review to share knowledge about how something was implemented and reach consensus on the implementation details. Depending on developer skill level, this catches style, design issues or even bugs. For skilled developers, it’s usually comments on code-to-architecture mismatches, understandability, etc. Sometimes not entirely objective things, that nevertheless contribute to developing and maintaining a team consensus and style.
Discussions also happen outside and before review, but we’ve found reviews invaluable.
If a team has yearly turnover or different skill levels (typical for most teams), not reviewing every commit is sloppy. Which has an additional meaning now with AI slop :)
I am also skeptical about the need for such a large number of PRs. Do those get opened because previous PRs didn't accomplish their goals?
It's frustrating because, being part of a small team, I absolutely fucking hate it when any LLM product writes or refactors thousands of lines of code. It's genuinely infuriating because now I am fully reliant on it to make any changes, even if they're really simple. Just seems like a new version of vendor lock-in to me.
Because he is working on a product that is hot and has demand from the users for new features/bug fixes/whatnot and also gets visibility on getting such things delivered. Most of us don't work on products that have that on a daily basis.
In other words, nobody cares that the generated code is shit, because there is no human who can review that much code. Not even on high level.
According to the discussion here, they don't even care whether the tests are real. They just care that it's green. If the tests are useless in reality? Who cares, nobody has time to check them!
And who will suffer because of this? Who cares, they pray it won't be them!
That is the case whether the code is AI generated or not. Go take a look at some of the source code for tools you use every day, and you'll find a lot of shit code. I'd go so far as to say, after ~30 years of contributing to open source, that it's the rare jewel that has clean code.
Yeah, but there is a difference between whether at least one person at some point understood the code (or the specific part of it) and whether nobody ever did. Also, there are different levels. Wildfly's code, for example, is utterly incomprehensible, because the flow jumps up and down huge inheritance chains to random points all the time; some Java Enterprise people are terrible with this. Anyway, the average for tools used by many is way better than that. So it's definitely possible to make it worse. Blindly trusting AI is one possible way to reach those new lows. So it would be good to prevent it before it's too late, rather than praising it regardless and even throwing out one of the (broken, but better than nothing) safeguards. Especially since code review is obviously dead with this amount of generated code per week. (The situation wasn't great there before, either.) So it's a two-in-one bad situation.
For comparison, I remember doing 250 PRs in 2.5 months of my internship at FB (working on a fullstack web app). So that’s 2-4x faster. What’s interesting is that it’s Boris, not an intern (although the LLM can play an intern well).
This was extremely useful to read for many reasons, but my favorite thing I learned is that you can “teleport” a task FROM the local Claude Code to Claude Code on the web by prepending your request with “&”. That makes it a “background” task, which I initially erroneously thought was a local background task. Turns out it sends the task and conversation history up to the web version. This allows you to do work in other branches on Claude Code web, (and then teleport those sessions back down to local later if you wish)
OpenCode is actually client server architecture. Typically one either runs the TUI or the web interface. I wonder if it would cope ok with running multiple interfaces at once?
Neovim has a decade old feature request for multiple clients to be able to connect to it. No traction alas. Always a great superpower to have, if you can hack it. https://github.com/neovim/neovim/issues/2161
Chrome DevTools Protocol added multiple client support maybe 5 years ago? It's super handy there because automation tools also use the same port, so before that you couldn't automate and debug at the same time!
That is a really cool ability, to move work between different executors. OpenCode is also super good at letting you open an old session & carry on, so you can switch between them. I appreciate the mention; I love the mobile ambient aspect of how Claude Code can teleport this all!!
> Neovim has a decade old feature request for multiple clients to be able to connect to it. No traction alas.
Why cram all features into one giant software instead of using multiple smaller pieces of software in conjunction? For the feature you mentioned I just use tmux which is built for this stuff.
Also, OpenCode has been extremely unreliable. I opened a PR about one of the simplest tools ever: `ls`, and they haven't fixed it yet. In a folder, their ls doesn't actually do what you'd expect: it iterates over all files of all folders (200 limit) and shows them to the model...
1) everyone on the team uses Claude code differently.
2) Claude Code has been around for almost a year and is being built by an entire team, yet doesn't seem to have benefited from this approach. The program is becoming buggier and less reliable over time, and development speed seems indistinguishable from anything else.
3) Everything this person says should be taken with a massive grain of salt considering their various conflicts of interest.
The UI flickers rapidly in some cases when I use it in the VSCode terminal. When I first saw this when using Claude Code I imagined it was some vibe code bug that would be worked out quickly. But it's been like 9 months and still every day it has this behavior - to the point that it crashes VSCode! I can only imagine that no one at Anthropic uses VSCode because it really seems insane it's gone this long unfixed.
It's the worst experience in tmux! They lectured us about how the roots of the problem go deep, but I don't have this issue with any other CLI agent tool like Codex.
The VSCode terminal seems buggy with complex TUI applications in my experience; I had to use the Gemini CLI in a separate terminal because it was brutally slow in the VSC terminal.
That being said, this isn't a huge issue for CC - you can just use the extension, which offers a similar experience.
Same thing happens to me in long enough sessions in xterm. Anecdotally it's pretty much guaranteed if I continue a session close to the point of context compacting, or if the context suddenly expands with some tool call.
Edit: for a while I thought this was by design since it was a very visceral / graphical way to feel that you're hitting the edge of context and should probably end the session.
If I get to the flicker point I generally start a new session. The flicker point always happens though from what I have observed.
That one's definitely annoying, but I suspect that's due to some bad initial design choices (React for a terminal app!) and I think it's definitely better than it used to be.
OK, so you have the unbearable pain of using a separate terminal app to use the magic thingie that does your programming for you on prompt, and which didn't exist merely 2 years ago.
I am a fan of Claude code, I love it, I use it every day. Are you suggesting we’re not allowed to make any critique of anything which has good qualities?
Claude Code is fairly simple. But Claude Desktop is a freaking mess, it loses chats when I switch tabs, it has no easy way to auto-extend the context, and it's just slow.
Yeah same. Also it completely freezes on my iPhone with sufficient code highlighting. It becomes completely unusable until I restart the App, and then breaks once a new message is sent.
I also find it odd that despite a whole team of people working on Claude Code with Claude Code, which should make them immensely productive, there are still glaring gaps. Like, why doesn’t Claude Code on Web have the plan mode? The model already knows how to use it, it’s just a UI change.
Normally I would cut them some slack but it doesn’t really make sense, couldn’t someone kick off a PR today and get it done?
> The program is becoming buggier and less reliable over time
You know, this bugs me out. Claude code, macOS, Windows... they're all becoming buggy, filled with papercuts everywhere... and this coincided with layoffs + mass adoption of LLM coding tools and services.
>2) Claude Code has been around for almost a year and is being built by an entire team, yet doesn't seem to have benefited from this approach. The program is becoming buggier and less reliable over time, and development speed seems indistinguishable from anything else.
Not my experience at all (macOS Tahoe/iTerm2, no tmux).
Speaking of either Claude Code as a tool, or Claude 4.5 as an LLM used with coding.
> Claude Code has been around for almost a year and is being built by an entire team, yet doesn't seem to have benefited from this approach. The program is becoming buggier and less reliable over time, and development speed seems indistinguishable from anything else.
Shhh, this is not what you’re supposed to look at.
Look! Bazillion more agents! Gorrilion more agents! Productivity! Fire those lazy code monkeys, buy our product! Make me riiiich.
I implemented some of his setup and have been loving it so far.
My current workflow is typically 3-5 Claude Codes in parallel
- Shallow clone, plan mode back and forth until I get the spec down, hand off to subagent to write a plan.md
- Ralph Wiggum Claude using plan.md and skills until PR passes tests, CI/CD, auto-responds to greptile reviews, prepares the PR for me to review
- Back and forth with Claude for any incremental changes or fixes
- Playwright MCP for Claude to view the browser for frontend
I still always comb through the PRs and double check everything including local testing, which is definitely the bottleneck in my dev cycles, but I'll typically have 2-4 PRs lined up ready for me at any moment.
We have a giant monorepo, hence the shallow clones. Each Claude works on its own feature / bug / ticket though, sometimes in the same part of the codebase but usually in different parts (my ralph loop has them resolve any merge conflicts automatically). I also have one Claude running just for spelunking through K8s, doing research, or asking questions about the codebase I'm unfamiliar with.
I feel like it's time for me to hang up this career. Prompting is boring, and doing it 5 times at once is just annoying multitasking. I know I'm mostly in it for the money, but at least there used to be a feeling of accomplishment sometimes. Now it's like, whose accomplishment is it?
This is the creator of a product saying how good it is.
If you've worked anywhere professionally you know how every place has its problems, where people just lie constantly about things?
Yeah.
Keep at it and see where things go.
I'm also a dev a bit overwhelmed by all of this talk. At my job I've tried quite a few things, and I'm still mostly just using Copilot for autocomplete and very small tasks that I review thoroughly; everything else I do manually.
If this is indeed the future I also don't wanna be a part of it and will switch to another career, but all this talk seems to come only from the people who actually built these things.
Or try to find a job where you can work how you like to work. With these things it's always "get more done ! MORE ! MORE !". But not all jobs are like this.
Agreed, the author basically says that coding is not required anymore, the job is reviewing code. Do engineers not actually want to build things themselves anymore? Where is the joy and pride in the craft? Are we just supposed to maximize productivity at the expense of our life's experience? Are we any different than machines at that point?
I feel like it’s not talked about enough that the ultimate irony of software engineering is that, as an industry, it’s aiming to make itself obsolete as much as possible. I struggle to think of any other industry that, completely on their own accord, has actively pushed to put themselves out of work to such a degree.
Prompting, in my experience, is boring and/or frustrating. Why anyone would want to do more of that without MASSIVE financial incentives is unthinkable. No composer or writer would ever want to prompt a "work".
I tried Claude Code a while back when I decided to give "vibe-coding" a go. That was actually quite successful, producing a little utility that I use to this day, completely without looking at the code. (Well, I did briefly glance at it after completion and it made my eyeballs melt.) I concluded the value of this to me personally was nowhere near the price I was charged, so I didn't continue using it, but I was impressed nonetheless.
This brief use of Claude Code was done mostly on a train using my mobile phone's wi-fi hotspot. Since the connection would be lost whenever the train went through a tunnel, I encountered a bug in Claude Code [1]. The result of it was that whenever the connection dropped and came up again I had to edit an internal json file it used to track the state of its tool use, which had become corrupt.
The issue had been open for months then, and still is. The discussion under it is truly remarkable, and includes this comment from the devs:
> While we are always monitoring instances of this error and looking to fix them, it's unlikely we will ever completely eliminate it due to how tricky concurrency problems are in general.
Claude Code is, in principle, a simple command-line utility. I am confident that (given the backend and model, ofc) I could implement the functionality of it that I used in (generously!) at most a few thousand lines of python or javascript, I am very confident that I could do so without introducing concurrency bugs and I am extremely confident that I could do it without messing up the design so badly that concurrency issues crop up continually and I have to admit to being powerless to fix them all.
Programming is hard, concurrency problems are tricky and I don't like to cast aspersions on other developers, but we're being told this is the future of programming and we'd better get on board or be left behind and it looks like we're being told this by people who, with presumably unlimited access to all this wonderful tooling, don't appear to be able to write decent software.
Must be nice to have unquota’ed tokens to use with frontier AI (is this the case for Anthropic employees?). One thing I think is fascinating as we enter the Intellicene is the disproportionate access to AI. The ability to petition them to do what you want is currently based on monthly subscriptions, but will it change in the future? Who knows?
It would be funny if the company paying software engineers $500K or more along with gold-plated stock options was limiting how much they could use the software their company was developing.
Why is that funny? What company gives you unlimited resources? That doesn’t scale. Google employees can’t just demand a $10,000 workstation. It’s reasonable to assume they have some guardrails, for both financial and stability reasons. Who knows… if it’s unlimited now, will it stay that way forever? Probably unlimited in the same sense as unlimited pto.
> Why is that funny? What company gives you unlimited resources?
Anthropic has raised tens of billions of dollars of funding.
Their number of employees is in the thousands. This isn't like Google.
Claude Code is what they're developing. The company is obviously going to encourage them to use it as much as possible.
Limiting how much the Claude Code lead can use Claude Code would be funny because their lead dev would have to stop mid-day and wait for his token quota window to reset before he can continue developing their flagship coding product. Not going to happen.
I'm strangely fascinated by the reaction in the comments, though. A lot of people here must have worked in oddly restrictive corporate environments to even think that a company like this would limit how much their own employees can use the company's own product to develop their own product.
I can't get a $10k workstation but if I used $10k/month on cloud compute it'd take a few months for anyone to talk to me about it and as long as I was actually using it for work purposes I wouldn't run into any consequences more severe than being told to knock it off if I couldn't convince people it was worth the cost.
Google gives most of their engineers access to machines that would cost that much. If you’re working on specific projects (e.g. Chrome) you can request even more expensive machines.
If an employee has a business need for a $10k workstation, I'm fairly certain they'll get a $10k workstation.
Yes, accounting still happens. Guardrails exist. But quibbling over 2% of a SWEs salary if it's clear that the productivity increase will be significantly more than 2% would be... not a wise use of anybody's time.
If it takes a lot of back and forth between lots of people, it is more like a $12,000 workstation or more after the labor of requesting and approving.
When you work for the company supplying those tokens and you're working on the product that sells those tokens at scale, the company will let you use as many tokens as you want.
Pretty sure I have seen them imply in one of the panel discussions on their YouTube channel (can't remember which) that they get unlimited use of the best models. I remember them talking about records for the most spent in a day or something.
Pretty sure that was scientists competing for 6 month training runs of new 100B+ parameter models, not coders burning through a couple of million tokens.
It is the case that Anthropic employees have no usage limits.
Some people do experiments where they spawn up hundreds of Claude instances just to see if any of them succeed.
It would be very interesting to see the outputs of his operations. How productive is one of his agents? How long does it take to complete a task, and how often does it require steering?
I'm a bit of a skeptic. Claude Code is good, but I've had varied results during my usage. Even just 5 minutes ago, I asked CC to view the most recent commit diff using git show. Even when I provided the command, it was doing dumb shit like git show --stat and then running wc for some reason...
I've been working on something called postkit[1], which has required me to build incrementally on a codebase that started from nothing and has now grown quite a lot. As it's grown, Claude Code's performance has definitely dipped.
The funniest part of that whole thing was when someone said "I trusted you, but you use light mode on your terminal" and then he replied that people stop by his desk daily just to make fun of him for it.
like "Also, in every color combination surveyed, the darker text on a lighter background was rated more readable than its inverse (e.g. blue text on white background ranked higher then white text on blue background)"?
yes it's all preference, vision is subjective, but being surprised that dark mode isn't best is in this context... weird.
I started with dark mode because that's what amber text on a terminal was... but then the big thing was UIs simulating paper, then we had a turn back to dark mode, and recently I've gone back to the light side.
What I find surprising is how much human intervention the creator of Claude Code uses. Every time Claude does something bad, we write it in claude.md so it learns from it... Why not create an agent to handle this and learn automatically from previous implementations?
B: Outcome Weighting
# memory/store.py
OUTCOME_WEIGHTS = {
    RunOutcome.SUCCESS: 1.0,     # Full weight
    RunOutcome.PARTIAL: 0.7,     # Some issues but shipped
    RunOutcome.FAILED: 0.3,      # Downweighted but still findable
    RunOutcome.CANCELLED: 0.2,   # Minimal weight
}

# Applied during scoring:
final_score = score * decay_factor * outcome_weight
C: Anti-Pattern Retrieval
# Similar features → SUCCESS/PARTIAL only
similar_features = store.search(..., outcome_filter=[SUCCESS, PARTIAL])
# Anti-patterns → FAILED only (separate section)
anti_patterns = store.search(..., outcome_filter=[FAILED])
Injected into agent prompt:
## Similar Past Features (Successful)
1. "Add rate limiting with Redis..." (Outcome: success, Score: 0.87)
## Anti-Patterns (What NOT to Do)
_These similar attempts failed - avoid these approaches:_
1. "Add rate limiting with in-memory..." (FAILED, Score: 0.72)
## Watch Out For
- **Redis connection timeout**: Set connection pool size
The flow now:
Query: "Add rate limiting"
│
├──► Similar successful features (ranked by outcome × decay × similarity)
│
├──► Failed attempts (shown as warnings)
│
└──► Agent sees both "what worked" AND "what didn't"
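To make the scoring above concrete, here's a minimal runnable sketch under stated assumptions: the exponential recency decay and its 30-day half-life are mine (the comment only says "decay"), and the RunOutcome enum is a guess at a definition the snippet implies.

# scoring_sketch.py -- illustrative only, not the commenter's actual code
from datetime import datetime, timezone
from enum import Enum, auto

class RunOutcome(Enum):   # assumed definition
    SUCCESS = auto()
    PARTIAL = auto()
    FAILED = auto()
    CANCELLED = auto()

OUTCOME_WEIGHTS = {RunOutcome.SUCCESS: 1.0, RunOutcome.PARTIAL: 0.7,
                   RunOutcome.FAILED: 0.3, RunOutcome.CANCELLED: 0.2}

HALF_LIFE_DAYS = 30       # assumed: a recorded run loses half its weight per month

def decay_factor(recorded_at: datetime) -> float:
    # recorded_at must be timezone-aware
    age_days = (datetime.now(timezone.utc) - recorded_at).days
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

def final_score(similarity: float, recorded_at: datetime, outcome: RunOutcome) -> float:
    # outcome weight x decay x similarity, matching the flow diagram above
    return similarity * decay_factor(recorded_at) * OUTCOME_WEIGHTS[outcome]

The half-life, like everything else here, is just a knob; the point is only that outcome, recency, and similarity multiply.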
I'm afraid to ask, but since I've been very happy with Codex 5.2 CLI and can't imagine Claude Code doing better: why is Claude so loved around here?
Sure, I can spend $20 and figure it out, but I already pay $40/mo for two ChatGPT subs and that's enough to get me through a month.
I'm a latecomer to AI, but I started using Gemini in June 2025.
Then in December I heard from my co-workers that they were liking Claude better than any other model, and from others online, so I bought myself some Claude for Xmas. And I could clearly see that it was better, right away.
That's all I know, only one model to compare with, but the difference was definitely tangible.
The Claude models are among the most expensive. It's easy to spend 30 EUR+ a day when providing a lot of context and documentation. Of course it can be argued that this money is worth it relative to salaries, but recently I've switched to kilocode myself after looking at different model pricing on openrouter https://openrouter.ai/models?order=pricing-high-to-low There's just no reason to throw money away.
There are plenty of free (and also cheap) models you can use with just openrouter or kilocode (an inexpensive, less-shitty Cursor basically, https://kilocode.ai).
For most things these free models achieve great results, and just like the expensive ones they need oversight and thorough code review. These days I'm barely paying anything for tokens monthly.
Why are you asking this? Just try it. It takes maybe fifteen minutes of your time. It's $20. There is no possible argument against $20 or fifteen minutes if the tool has a chance of being even just 10% better. You've spent more time typing that comment, and me responding, than it would take to… just try it…
The main difference is that slash commands are invoked by humans, whereas skills can only be invoked by the agent itself. They work kind of like conditional instructions.
As an example, I have skills that aid in adding more detail to plans/specs, debugging, and spinning up/partitioning subagents to execute tasks. I don't need to invoke a slash command each time, and the agent can tell from the context and the instructions I give it which skills to use.
In the reddit thread Boris says they’re adding the ability to call skills via slash commands in an upcoming release and that he uses the term skill and slash commands interchangeably.
I believe slash commands are all loaded into the initial context and executed when invoked by the user. Skills on the other hand only load the name and description into initial context, and the agent (not user) determines when to invoke them, and only then is the whole skill loaded into context. So skills shift decision making to the agent and use progressive disclosure for context efficiency.
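If it helps to picture the progressive-disclosure point, here's an illustrative Python sketch; the names and structures are made up, not Claude Code internals:

# illustrative only -- how the two loading strategies differ in context cost
def build_initial_context(slash_commands: list[dict], skills: list[dict]) -> list[str]:
    context = []
    for cmd in slash_commands:
        context.append(cmd["body"])   # full prompt body loaded up front, user-invoked
    for skill in skills:
        # only a one-line stub is loaded; the agent decides later whether to expand it
        context.append(f"{skill['name']}: {skill['description']}")
    return context

def expand_skill(skill: dict) -> str:
    # progressive disclosure: the full instructions enter the context only when the
    # agent decides the skill is relevant to the current task
    with open(skill["path"]) as f:
        return f.read()

If that's right, a repo with dozens of skills costs only a few hundred tokens until one is actually used, whereas every slash command body is paid for in every session.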
How different are Codex and Claude Code from each other?
I have been using Codex for a few weeks, doing experiments related to data analysis and training models with some architecture modifications. I wouldn't say I have used it extensively, but so far my experience has been good. The only annoying part has been not being able to use the GPU in Codex without the `--sandbox danger-full-access` flag. Today I started using Claude Code and ran similar experiments. The interface is quite similar to Codex's. However, I hit the limit quite quickly in Claude Code. I will be exploring its features further. I would appreciate it if anyone could share their experience of using both tools.
How has Claude Code (as a CLI tool, not the backing models) evolved over the last year?
For me it's practically the same, except for features that I don't need, don't work that well and are context-hungry.
Meanwhile, Claude Code still doesn't know how to jump to a dependency's (library's) source to obtain factual information about it, which is actually quite easy by hand (normally it's cd'ing into a directory or unzipping some file).
So, this wasteful workflow only resulted in vibecoded, non-core features while at the domain level, Claude Code remains overly agnostic if not stupid.
Frankly, Claude Code is painfully slow, to the point that I get frustrated.
On large codebases I often find it taking 20+ minutes to do basic things like writing tests.
Way too often people say it takes 2 minutes for it to do a full PR. Yeah, but how big is the codebase, actually?
I also have a coworker who uses it about 10x more than everyone else. Burning through credits, yet he is one of the lowest performers (closing in on around $1k worth of credits a day now).
$1,000.00 of credits per day?? $200,000 per year? Those are bonkers numbers for someone not performing at a high level (on top of their salary). Do you know what they are doing?
Yup. The way he works: every task he's issued in a sprint, he just fires through Opus in parallel, hoping to get a hit on Claude magically solving the ticket, with everything being iterated on constantly. He doesn't even try to have proper plans created first.
Often tickets get fleshed out or requirements change; he just throws everything out and shoves it back into Claude.
Warm take for sure, but I feel that LLMs and agents have made me a worse programmer as a whole. I am enjoying the reduced mental strain, since I can do hobbies like sketching/art while the LLM is running. But it definitely isn't making me any faster.
I'm at the point of considering another job, but I fear that my skills have deteriorated to the point that I can't pass any (manual) coding assessments anymore.
I don't understand how these setups scale long-term, and even more so for the average user. The latter is relevant because, as he points out, his setup isn't far out of reach of the average person: it's still fairly close to out-of-the-box Claude Code, plus Opus.
But between the model qualities varying, the pricing, the timing, the tools constantly changing, I think it's really difficult to build the institutional knowledge and setup that can be used beyond a few weeks.
In the era of AI, I don't think it's good enough to "have" a working product. It's also important to have all the other things that make a project way more productive: stellar documentation, better abstractions, clearer architecture. In terms of AI, there's got to be something better than just a markdown file with random notes. What happens when an agent does something because it's picking something up from some random Slack convo, or from some minor note in a 10k claude.md file? It just seems like the wild west, where basic ideas like "additional surface area is a liability" are ignored because we're too early in the cycle.
tl;dr If it's just pushing around typical mid-level code, then... I just think that's falling behind.
I'm a bit jealous. I would like to experiment with having a similar setup, but 10x Opus 4.5 running practically non stop must amount to a very high inference bill. Is it really worth the output?
From experimentation, I need to coach the models quite closely to get enough value. Letting them loose only works when I've given very specific instructions. But I'm using Codex and Clai; perhaps Claude Code is better.
I have a coworker who is basically doing this right now. He leads our team and is in second place overall. He regularly runs Opus in parallel and is alone burning through $1k worth of credits a day.
I've tried running a number of Claudes in parallel on a CRUD full-stack JS app. Yes, it got features made faster; yes, it definitely did not leave me enough time to actually look at what they did; yes, it definitely produced sub-par code.
At the moment, with one Claude plus manually fixing the crap it produces, I am faster at solving "easier" features (think: add an API endpoint, rebuild the API client, implement frontend logic for the endpoint + UI) than if I wrote them myself.
For things that are more logic-dense, it tends to produce so many errors that it's faster to solve them myself.
I get some of the skepticism in this thread, but I don't get takes like this. How are you using CC such that the output you look at is "full of errors"? By the time I look at the output of a session, the agent has already run linting, formatting, testing and so on. The things I look at are adherence to conventions, files touched, libraries used, and so on. And the "error rate" on those has been steadily coming down, especially if you also use a review loop (with Codex, since it has been the best at review lately).
You have to set these things up for success. You need loops with clear feedback. You need a project that has lots of clear things to adhere to. You need tight integrations. But once you have these things, if you're looking at "errors", you're doing something wrong IMO.
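Concretely, the "loops with clear feedback" can be as simple as letting the agent run the project's own checks and feeding the failures back in. A hypothetical sketch; the commands and structure here are assumptions, not anything from the thread:

# feedback_loop.py -- illustrative sketch of a check-and-fix loop
import subprocess

CHECKS = ["ruff check .", "pytest -q"]   # assumed project commands

def run_checks(repo_dir: str) -> list[str]:
    """Run the project's checks and return failures as text the agent can act on."""
    failures = []
    for cmd in CHECKS:
        result = subprocess.run(cmd, shell=True, cwd=repo_dir,
                                capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(f"$ {cmd}\n{result.stdout}{result.stderr}")
    return failures

The point isn't the script; it's that the agent only gets to declare a task done when this list comes back empty.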
I don't think he meant syntax errors, but thinking errors. I get these a lot with CC, especially with CSS, for example. It produces so much useless code, it blows my mind. Once I deleted 50 lines of code and manually added 4, which was enough to fix the error.
I’ve found that experienced devs use agentic coding in a more “hands-on” way than beginners and pure vibe-coders.
Vibecoders are the best because they push the models in humorous and unexpected ways.
Junior devs are like “I automated the deploy process via an agent and this markdown file”
Seasoned devs will spend more time writing the prompt for a bug fix, or lazily paste the error and then make the 1-line change themselves.
The current crop of LLMs is more powerful than any of these use cases, and it's exciting to see experienced devs start to figure that out (I'm not stanning Gas Town[0], but it's a glimpse of the potential).
Partially related: I really dislike the vibe of Gas Town, both the post and the tool, I really hope this isn't what the future looks like. It just feels disappointing.
I would highly recommend every project maintain a simple, well-written AGENTS.md file. At first it may seem like a more nitpicky README, but you will quickly see how much coding agents benefit from this added context. Imo, the two most important things to include in AGENTS.md are frequent commands and verification methods.
A third thing I've started adding to my projects is a list of related documentation and libraries that may not be immediately obvious. Things like confluence pages and other repos associated with the project.
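For what it's worth, a minimal sketch of what such a file might contain; the commands and doc names below are invented for illustration:

# AGENTS.md (illustrative)
## Frequent commands
- `make dev`: start the local server
- `make test`: run the test suite
- `make lint`: formatting and static checks
## Verification
- A change isn't done until `make test` and `make lint` pass.
## Related documentation
- docs/architecture.md, plus the companion repo and the relevant Confluence pages.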
I had this Show HN yesterday, which didn't grab much attention, so I'm using this opportunity as it seems relevant (it's a solution for running CC in parallel).
If you folks like to run parallel claude-code sessions and like a native terminal like Ghostty, I have a solution for using Git worktrees natively with Ghostty. It's called agentastic.dev, and it has a built-in worktree/IDE/diff/code-review workflow around Ghostty (macOS only for now).
That's just unrealistic. If I were to use it like this as an actual end user, I would get stopped by rate limits and those weekly/session limits instantly.
One thing that’s helped me is creating a bake-off. I’ll do it between Claude and codex. Same prompt but separate environments. They’ll both do their thing and then I’ll score them at the end. I find it helps me because frequently only one of them makes a mistake, or one of them finds an interesting solution. Then once I declare a winner I have scripts to reset the bake-off environments.
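The reset scripts can be as simple as recreating one clean git worktree per contestant from the same base commit. A sketch of the idea, not the commenter's actual scripts; the paths and branch names are made up:

# reset_bakeoff.py -- illustrative; paths and branch names are assumptions
import subprocess

ARENAS = {"claude": "../bakeoff-claude", "codex": "../bakeoff-codex"}

def reset_arenas(base_branch: str = "main") -> None:
    """Tear down and recreate one worktree per tool, both starting from base_branch."""
    for name, path in ARENAS.items():
        subprocess.run(["git", "worktree", "remove", "--force", path], check=False)
        subprocess.run(["git", "worktree", "add", "-B", f"bakeoff/{name}", path, base_branch],
                       check=True)

if __name__ == "__main__":
    reset_arenas()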
One of my side projects has been to recover a K&R C computer algebra system from the 1980s and port it to modern 64-bit C. I'd have eight tabs at a time assigned files from a task server, making passes at 60 or so files. This nearly worked; I'm paused till I can have an agent with a context window that can look at all the code at once. Or I'll attempt a fresh translation based on what I learned.
With a $200 monthly Max subscription, I would regularly stall after completing significant work, but this workflow was feasible. I tried my API key for an hour once; it taught me to laugh at the $200 as quite a deal.
I agree that Opus 4.5 is the only reasonable use of my time. We wouldn't hire some guy off the fryer line to be our CTO; coding needs best effort.
Nevertheless, I thought my setup was involved, but if Boris considers his to be vanilla ice cream then I'm drinking skim milk.
Having the 5 instances going at once sounds like Google Antigravity.
I haven't used Claude Code too much. One snag I found is its tendency, when it runs into problems, to fix them incorrectly by rolling back to older versions of things. I think it would benefit from an MCP server for, say, Maven Central. Likewise, it should prefer to generate code using things like project scaffolding tooling whenever possible.
I can't imagine that it has some kind of special access to Anthropic's servers and that it does things an API-user can't do, maybe except for the option to use the Claude.ai credits/quota.
Even their Agent SDK just wraps the `claude`-executable, IIRC.
I spent a whole day running 3x local CC sessions and about 7 Claude code web sessions over the day. This was the most heavy usage day ever for me, about 30 pull requests created and merged over 3 projects.
I got a lot done, but my brain was fried after that. Like wired but totally exhausted.
Has anyone else experienced this and did you find strategies to help (or find that it gets easier)?
I have formal requirements for all implemented code. This is all on relatively greenfield, solo-developed codebases with tools I know inside out (Django, click-based CLIs, etc.), so yes. Thanks so much for your concern, internet person!
This feels like the desperate, look at me! post, which is the exact opposite of Andrej Karpathy's recent tweet[0] about feeling left behind as a programmer, as covered on Hacker News[1].
I guess I would want to see how sustainable this 5-parallel-AI effort is, and whether there are demonstrably positive outcomes. There are plenty of "I one-shotted this" examples of something that already (mostly) existed, which are very impressive in their own right, but I haven't seen a lot of truly novel creations.
I wonder what sort of problems you must have to get this upset about the creator of a particular software telling people how they personally use that software
Personally I keep open several tabs of CC but it's not often that more than one or two of them would be running at the same time. It's just to keep particular context around for different parts of the same application since it's quite big (I don't use CC for creating new projects). For example if I had it work on a feature and then I realized there was a bug or an adjustment in the same files that needed to be made then I can just go back to that tab hours or maybe even days later without digging through history
> I assume "what sort of problems you must have" was directed at me.
I don't really have any sort of personal problem with Boris' post, if that's what your inflammatory statement was implying.
I also think it was a fairly good description of his workflow, technically speaking, but it glosses over the actual monetary costs of what he is doing and, as noted above, doesn't really describe the actual outcomes other than a lot of PRs.
It's really a convenience: it forces the model to use the planning tool and prevents edit/write tools until the user approves the plan, like an inverse of "auto-accept edits" mode.
So this guy is personally responsible for the RAM shortage, it seems. Jokes aside, I have a similar setup, but with a mix of Claude and a local model. Claude can access the local model for simple and repetitive tasks, and it actually does a good job on testing UI. A great way to save tokens.
I have to say it sounds insane. 5 tabs of Claude, back and forth from terminal to browser, and no actual workflow detailed. Are we to believe that Claude is making changes in parallel to one codebase, and if so, why?
I actually use dozens of claude codes "in parallel" myself (most are sitting idle for a lot of the time though). I set up a web interface and then made it usable by others at clodhost.com if anybody wants to try it (free)!
Yeah... I had a fairly in-depth conversation with Claude a couple of days ago about Claude Code and the way it works, its usage limits, and how other AI coding tools compare, and the extremely blunt advice from Claude was that Claude Code was not suitable for serious software development due to usage limits! (Props to Anthropic for not sugar-coating it!)
Maybe on the Max 20x plan it becomes viable, and no doubt on the Boris Cherny unlimited usage plan it does, but it seems that without very aggressive non-stop context pruning you will rapidly hit limits and the 5-hour timeout even working with a single session, let alone 5 Claude Code sessions and another 5-10 web ones!
The key to this is the way that Claude Code (the local part) works and interacts with Claude AI (the actual model, running in the cloud). Basically, Claude Code maintains the context, consisting mostly of the session history, the contents of source files it has accessed, and the read/write/edit tools (based on Node.js) it provides for Claude AI. This entire context, including all files that have been read and the tool definitions, is sent to Claude AI (eating into your token usage limit) with EVERY request, so once Claude Code has accessed a few source files, their contents are "silently" sent as part of every subsequent request, regardless of what it is. Claude gave me an example where, with 3 smallish files open (a few thousand lines of code), within 5 requests the token usage might be 80,000 or so, vs. the 40,000 limit of the Pro plan or the 200,000 limit of the Max 5x plan. Once you hit the limit you have to wait 5 hours for a usage reset, so without Cherny's infinite usage limit this becomes a game of hurry up and wait (make 5 requests, then wait 5 hours and make 5 more).
You can restrict which source files Claude Code has access to, to try to manage context size (e.g. in a C++ project, let it access all the .h module definition files but block all the .cpp ones), as well as manually inspect the context to see what is being sent that could be removed. I believe there is some automatic context compaction happening periodically too, but apparently not enough to prevent many people from hitting usage timeouts when working on larger projects.
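If the mechanism is roughly as described above, the arithmetic is easy to sketch. A toy illustration only; every number below is an assumption, not a measured value:

# toy illustration of why re-sent context adds up; all numbers are assumptions
TOKENS_PER_LINE = 10        # rough average for source code
SYSTEM_AND_TOOLS = 15_000   # system prompt + tool definitions, guessed

def tokens_per_request(open_file_lines: int, history_tokens: int) -> int:
    # the whole bundle is re-sent on every turn, per the explanation above
    return SYSTEM_AND_TOOLS + open_file_lines * TOKENS_PER_LINE + history_tokens

# three smallish files (~3,000 lines) plus a few turns of history:
print(tokens_per_request(open_file_lines=3_000, history_tokens=8_000))  # ~53,000 per request

Multiply that by a handful of requests and the per-window budgets quoted above disappear quickly, which is consistent with the hurry-up-and-wait experience described.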
Not relevant here, but Claude also explained how Cursor manages to provide fast/cheap autocomplete using its own models, by building a vector index of the codebase so that only the relevant chunks of code are pulled into the context.
Ah, thank you! Now I feel like an idiot. I guess I was thinking “ultrathink” was a specially interpreted command within claude code (sort of like a slash command).
Have others not noticed the extremely obvious astroturfing campaign specifically promoting Claude code that is mostly happening on X in recent days/weeks?
A classic hacker news post that will surely interest coders from all walks of life! ~
After regular use of an AI coding assistant for some time, I've noticed something unusual: my biggest wins came from neither better prompts nor a smarter model. They came from the way I operated.
At first, I treated it as autocomplete. Later, like a junior developer. In the end, as a collaborator that requires constraints.
Here is the framework I have landed on.
Stage 1: Ask for everything. You get acceleration, but lots of noise.
Stage 2: Add rules. Less shock, more trust.
Stage 3: Give it room to act, but don't hesitate to review aggressively.
A few habits that made a big difference:
Specifying exactly what it can touch.
Asking it to explain diffs before applying them.
Treating "wrong but confident" answers as a signal to tighten scope.
I'm wondering what others only see over time:
What changed after the second or fourth week?
When did your trust increase or decrease?
What rules do you wish you had added earlier?
The number of people holding strong opinions on LLMs who openly admit they have not tried the state-of-the-art tools is so high on Hacker News right now that it's refreshing to get actual updates from the tools' creators.
I read a comment yesterday that said something like "many people tried LLMs early on, it was kind of janky and so they gave up, thinking LLMs are bad". They were probably right at the time, but the tech _has_ improved since then, while those opinions have not changed much.
So yes, Claude Code with Sonnet/Opus 4.5 is another step change that you should try out. For $20/month you can run Claude Code in the terminal and regular Claude in the web app.
For lots of software, unless you really know what you're doing, it's best to just leave the default settings alone and not dig too deep into things you're not immediately intended to touch. For my application, lots of bug reports come from people using our advanced settings without reading any of the instructions and screwing things up.
So in the case of him being the creator, obviously he built it for his needs.
I'm a heavy Claude Code user, but this is starting to smell like BS. There is nothing special in Claude Code; Opus is a good model, and with lots of requests it can give good results. There is nothing unique to it.
Absolutely shocking... Boris uses a light themed terminal?! Kidding aside, these were great tips. I am quite intrigued by the handing off of local Claude sessions to the web version. I wonder if this feature exists for the other Coding CLI agents.
I doubt he'd use Claude Code as it is. I'm sure he'd crank it up to think harder, do more iterations, and go deeper. Codex, for example, already does that, but could go a bit deeper and longer to figure out more.