There’s only one solution to this problem at this point: make AI significantly less affordable and accessible. Raise the prices of the Pro / Plus / Max / Ultra tiers and introduce time limits (like screen time), especially for minors, once LLMs can detect age better. This would be a win-win: (a) people would be forced to go back to the “old ways” of doing whatever it is that AI was doing for them, and (b) we won’t need as many data centers as the AI companies are projecting today.
Coding/programming as a skill differentiator is most likely "dead" - software DEVELOPMENT will indeed live on, but it will need a higher degree of well-roundedness and ownership (which also means leaner SRE/DevOps/PM/QA functions).
My message to the CTO of Honeycomb.io (who apparently wrote this post): please avoid getting philosophical and controversial to gin up curiosity about your AI platform. If you want to highlight the benefits of your platform then do so earnestly and objectively. Please don't mask marketing with an excoriation of a profession that has never been well-defined (or has always been defined to fit into an organization's political landscape for the most part). And you guys (like every other SRE/Ops platform) capitalized on that structural divide and deservedly got rich by selling licenses to these teams. I don't think you can come in now with this holier-than-thou best practice messaging just because platforms like yours have zero moat in this post-CC/Codex world.
At scale, I don't see a net negative in AI merging "shit by itself", provided the developer (or the agent) ensures sufficient e2e, integration, and unit test coverage prior to every merge, and in return I get my team cranking out features at 10x speed.
The reality is that probably 99.9999% of code bases on this earth pre-date LLMs (though that share might drop soon, who knows), and organizing them so that coding agents can produce consistent results from sprint to sprint will require a lot of plumbing work from all dev teams. That will include refactoring, documentation improvements, building consensus on architectures, and of course reshaping the testing landscape. So SWEs will have a lot of dirty work to do before we reach the aforementioned "scale".
However, a lot of platforms are being built from the ground up today, in a post-CC (Claude Code) era. Those should be ready to hit that scale today.
Yup! Software engineers aren't going to be out of work anytime soon, but I'm acting more like a CTO or VPE with a team of agents now, rather than just a single dev with a smart intern.
I am not in the tech field anymore, and I use exclusively free models and CLIs. They are mostly of Chinese origin. I call them my little software sweatshop.
I hate this paradigm because it pits me against my tools as if we're adversaries. The tools are prone to rewriting or even deleting the tests, so we have to write other tools to sandbox agents from each other and check each other's work, and I just don't see a way to get deterministically good results over just building shit myself. It comes down to needing high trust in my tools to feel confident in what we're shipping.
The key is that at the end of the day productivity is king, which is a polite term for cutting head count and/or delivering at a ridiculously higher velocity.
You can always get deterministically good results at your own pace. But most likely you won't achieve that at the speed and scale of a coding agent running in 4-5 worktrees, 24/7, without food or toilet breaks, especially if the latter mostly achieves the product/business goals at an "OK" quality (in which case you will perhaps be measured by how well you can steer these agents to elevate that quality beyond "OK" without sacrificing too much scale).
I think the opposite will happen - leadership will forgo this attitude of "reverse course on the first outage".
Teams will figure out how to mitigate such situations in the future without sacrificing the potential upside of "fully autonomous code changes made on production systems" (e.g. invest more in a production-like env for test coverage).
Software engineering purists have to get out of some of these religious beliefs.
> Software engineering purists have to get out of some of these religious beliefs
To me, the Claude superfans like yourself are the religious ones, the way you run around proffering unsubstantiated claims like this and believe in / anthropomorphize these tools way too much. Is it because "Anthropic" is an abbreviation of "Anthropomorphic"?
I would have been in the skeptics' camp 3-4 months ago. Opus 4.5 and GPT-5.2 have changed my mind. I'm not talking about mere code completion. I am talking about these models AND the corresponding agents playing a really, really capable software engineer + tester + SRE/Ops role.
The caveat is that, as things stand today, we have to be fairly good at steering them in the right direction. It is exhausting to do it the right way.
I agree the latest gen of models, Opus 4.5 and Gemini 3, are more capable. 5.2 is OpenAI squeezing as much as they can out of 4, because they haven't had a successful pre-training run since Ilya left.
I disagree that they are really, really capable engineers et al. They have moments where they shine like one. They also have moments where they perform worse than a new grad/hire. That is not what a really, really capable engineer looks like. I don't see this fundamentally changing, even with all the improvements we are seeing. The problem is lower level and more core than anything adding more layers on top can resolve; the layers only address it as best they can.
In my own anecdotal experience, Claude Code found a bug in production faster than I could. I was the author of said code, which was written by hand 4 years ago. The GP's claim perhaps is not all that unsubstantiated. My role is moving more towards QA/PM nowadays.
For sure. Not hard fails, but bad fixes. It confidently thought it fixed a bug, but it really didn't. I could only tell because I tried reproducing the bug before/after (it was fairly complex). Ultimately I believe it wasn't given sufficient context. It has certainly failed to do what I asked in round 1 and round 2, but eventually got it right (a rendering issue for a barcode designer).
These incidents have become less and less frequent over the last year; switching to Opus reduced the failure rate. Same thing for code reviews. Most of the output is fluff, but it does give useful feedback if the instructions are good. For example, I asked for a blind code review of a PR ("Review this PR"), and it gave some generic commentary. I made the prompt more specific ("Follow the API changes across modules and see the impact"), and it found a serious bug.
The number of times I've had to give up in frustration has been going down over the last year. So I tend to believe a swarm of agents could do a decent job of autonomous development/maintenance over the next few years.
Even lesser agents are incredibly good and incredibly fast at using tools to inspect the system & come up with ideas for things to check, and then checking them. I absolutely agree: we will 100% give the agents far more power. A browser, a debugger for the server that works with that browser instance, a database tool, an OpenTelemetry tool.
The teams are going to figure out how to mitigate bad deploys by using even more AI & giving it even better information gathering.
It's simple -- the more high-minded and snobbish the developer class is (while extracting the highest salaries in the world), and the longer it maintains this unreal amount of gatekeeping, the more the non-developer community (especially those at the leadership level) will revel in the prospect of eliminating developers from the value chain.
I think you're onto something. Replace "developers" with "doctors" in that statement and you've described healthcare in the mid-1900s. Replace it with "masons" and you've described medieval times. There is always a specialized class.
I can't wait for indie developers to build super-agents that commoditize providers like Honeycomb.io and more importantly clone all their features and offer them up for free as OSS.
Sounds like you don't know what a nightmare of version compat and bespokeness ops/observability is. This is going to be one of the harder things for LLMs to do, because everyone is running on some snowflake held together with duct tape.
Fair point - my statement is more about stealing market share for simpler integrations by undercutting them on price.
And I don't want to trivialize the reality of enterprise platforms where bespoke connectors rule. I have dealt with migrations of business-critical platforms, where managing version compatibility and ensuring none of the integrations regressed was par for the course. I am not even saying that makes me qualified to replicate Honeycomb.io. But I do think someone with a deep technical background in building observability platforms, armed with Claude Code or Codex plus the right set of MCPs and all the necessary tooling, should be able to build a clone of Honeycomb.io.
Maybe it won't be a fast turnaround like a typical vibe-coded project, but even if it is a month-long project to get to just 60% feature parity, these vendors will have to sit up and pay attention.
As you immediately trivialize something it seems you know very little about.
MCPs are outdated, btw; attaching a bunch of MCPs to your messages pollutes the context. If you don't do this, you can build agents on gemini-3-flash that are better than Copilot/Codex. Claude Code is probably the leader here, but it's still definitely not capable of what you think it is.
I assume then that you are retired or not a programmer, since you are wishing for the last bastions of companies that pay programmers to melt with the ice sheets, leaving a desert of no paid coding work.
Just to be clear - the hook is deterministic, but the subagent running with an MCP server loaded is not. For medium/large PRs, it can run out of context window, or just forget what it is trying to do, get lazy, and say "Everything is good, ready to merge!" when in fact tests are failing or there are still unaddressed PR comments.
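To make "deterministic" concrete, here's roughly the shape of that half of the setup - a Python sketch, not my exact hook. The point is that the verdict comes from an exit code, never from agent prose:

```python
# Sketch of a deterministic merge gate (illustrative only).
# Assumes pytest is on PATH; swap in your own test runner.
import subprocess
import sys

def tests_actually_pass() -> bool:
    """Run the suite directly instead of trusting the subagent's summary."""
    result = subprocess.run(["pytest", "-q"])
    return result.returncode == 0

if __name__ == "__main__":
    if not tests_actually_pass():
        print("Blocking merge: tests are failing, whatever the agent claimed.")
        sys.exit(1)
```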
Sure, but that MCP still missed actionable comments that were marked as "Out of Scope" or "Outside the PR" - and this approach doesn't incur the context-window loss of having another MCP instantiated, either. Anyway, give gtg a competitive look against the MCP - you should be able to see the difference.
I have dystonia, which often stiffens my arms in a way that makes it impossible for me to type on a keyboard. Speech-to-text apps like SuperWhisper have proven very helpful for me in such situations. I am hoping to get a similar experience out of "Handy" (very apt naming, from my perspective).
I do, however, wonder if there is a way all these speech-to-text tools can get to the next level. The generated text should not just be a verbatim copy of what I said; depending on the context, it should elaborate. For example, if my cursor is actively inside an editor/IDE with some code, my coding-related verbal prompts should actually generate the right/desired code in that IDE.
Perhaps this is a bit of combining speech-to-text with computer use.
I made something called `ultraplan`. It's a CLI tool that records multi-modal context (audio transcription via local Whisper, screenshots, clipboard content, etc.) into a timeline that AI agents like Claude Code can consume.
I have a Claude skill, `/record`, that runs the CLI and starts a new recording. I debug, research, etc., then say "finito" (or choose your own stopword). It outputs a markdown file with your transcribed speech interleaved with screenshots and any text you copied. You can say other keywords like "marco" and it will take a screenshot hands-free.
When the session ends, Claude reads the timeline (e.g. looks at the screenshots) and gets to work.
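If you're curious what the loop looks like, here's a minimal Python sketch - illustrative only; it assumes the `openai-whisper` package and macOS's `screencapture`, every name in it is made up, and the real tool is structured differently:

```python
# Rough sketch of a stopword-driven recording loop in the spirit of
# `ultraplan` (not the actual implementation).
import datetime
import subprocess

import whisper  # pip install openai-whisper

MODEL = whisper.load_model("base")
STOPWORD = "finito"   # ends the session
SNAP_WORD = "marco"   # takes a hands-free screenshot

def transcribe_chunk(wav_path: str) -> str:
    """Run local Whisper on one short recorded audio chunk."""
    return MODEL.transcribe(wav_path)["text"].strip()

def record_timeline(chunks: list[str], out_md: str = "timeline.md") -> None:
    """Interleave transcribed speech and screenshots into one markdown file."""
    with open(out_md, "w") as md:
        for wav in chunks:
            text = transcribe_chunk(wav)
            stamp = datetime.datetime.now().isoformat(timespec="seconds")
            if SNAP_WORD in text.lower():
                shot = f"shot-{stamp.replace(':', '-')}.png"
                subprocess.run(["screencapture", "-x", shot], check=True)
                md.write(f"![screenshot]({shot})\n\n")
            md.write(f"**{stamp}** {text}\n\n")
            if STOPWORD in text.lower():
                break  # session over; the agent reads timeline.md next
```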
I can clean it up and push it to GitHub if anyone would get use out of it.
I totally agree with you, and largely what you’re describing is one of the reasons I made Handy open source. I really want to see something like this, and to see someone experiment with making it happen. I did hear of some people playing with small local models (moondream, qwen) to get more context from the computer itself.
I initially had a ton of keyboard shortcuts in Handy for myself when I had a broken finger and was in a cast. It let me play with the simplest form of this contextual thing, as shortcuts could effectively be mapped to certain apps with very clear use cases.
What you said is possible by feeding the output of speech-to-text tools into an LLM. You can prompt the LLM to make sense of what you're trying to achieve and create sets of actions. With a CLI it’s trivial: you can have your verbal command translated into working shell commands. With a GUI it’s slightly more complicated, because the LLM agent needs to know what you see on the screen, etc.
That CLI bit I mentioned earlier is already possible. For instance, on macOS there’s an app called MacWhisper that can send dictation output to an OpenAI‑compatible endpoint.
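As a rough illustration of the glue (not MacWhisper's actual plumbing - the endpoint URL, model name, and prompt below are all placeholders), it's only a few lines:

```python
# Sketch: turn dictated text into a shell command via an
# OpenAI-compatible endpoint. All names/URLs here are placeholders.
import subprocess

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

def dictation_to_command(spoken: str) -> str:
    """Ask the LLM to turn a spoken request into a single shell command."""
    resp = client.chat.completions.create(
        model="local-model",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Translate the user's request into one POSIX shell "
                        "command. Reply with the command only, no prose."},
            {"role": "user", "content": spoken},
        ],
    )
    return resp.choices[0].message.content.strip()

cmd = dictation_to_command("show the ten largest files in this directory")
print(f"About to run: {cmd}")
if input("ok? [y/N] ").lower() == "y":  # always confirm before executing
    subprocess.run(cmd, shell=True)
```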
I was just thinking about building something like this - looks like you beat me to the punch. I will have to try it out.
I'm curious whether you can give commands just as easily as wording you want cleaned up. I could see a model getting confused between editing the dictated input into text to be inserted and responding to it as a command. Sorry if that's unclear; it might be better if I just try it.