It’s a bit strange how anecdotes have become acceptable fuel for 1000-comment technical debates.
I’ve always liked the quote that sufficiently advanced tech looks like magic, but it’s a mistake to assume that things that look like magic also share other properties of magic. They don’t.
Software engineering spans several distinct skills: forming logical plans, encoding them in machine-executable form (coding), making them readable and expandable by other humans (to scale engineering), and constantly navigating tradeoffs like performance, maintainability, and org constraints as requirements evolve.
LLMs are very good at some of these, especially instruction following within well-known methodologies. That’s real progress, and it will be productized sooner rather than later, with concrete use cases, ROI, and a clearly defined end user.
Yet, I’d love to see less discussion driven by anecdotes and more discussion about productizing these tools: where they work, usage methodologies, missing tooling, KPIs for specific use cases. And don’t get me started on current evaluation frameworks; they become increasingly irrelevant once models are good enough at instruction following.
> It’s a bit strange how anecdotes have become acceptable fuel for 1000-comment technical debates.
Progress is so fast right now that anecdotes are sometimes more interesting than proper benchmarks. "Wow, it can do impressive thing X" is more interesting to me than a 4% gain on SWE-bench Verified.
In the early days of a startup, "this one user is spending 50 hours/week in our tool" is sometimes more interesting than global metrics like average time in app. In the early/fast days, the potential is more interesting than the current state. There's work to be done to make that one user's experience apply to everyone, but knowing that it can work is still a huge milestone.
At this point I believe the anecdotes more than benchmarks, cause I know the LLM devs train the damn things on the benchmarks.
A benchmark? Probably was gamed. A guy made an app to right-click and convert an image? Prolly true; have to assume it may have a lot of issues, but prima facie I just make a mental note that this is possible now.
> It’s a bit strange how anecdotes have become acceptable fuel for 1000-comment technical debates.
It's a very subjective topic. Some people claim it increases their productivity 100x. Some think it is not fit for purpose. Some think it is dangerous. Some think it's unethical.
Weirdly, those could all be true at the same time, and where you land on this is purely a matter of what's important to you.
> Yet, I’d love to see less discussion driven by anecdotes and more discussion about productizing these tools: where they work, usage methodologies, missing tooling, KPIs for specific use cases. And don’t get me started on current evaluation frameworks; they become increasingly irrelevant once models are good enough at instruction following.
I agree. I've said earlier that I just want these AI companies to release an 8-hour video of one person using these tools to build something extremely challenging, start to finish. How do they use it? How does the tool really work? What are the best approaches? I am not interested in 5-minute demo videos producing React fluff or any other boilerplate machine.
I think the open secret is that these 'models' are not much faster than a truly competent engineer. And what's dangerous is that they're empowering people to 'write' software they don't understand. We're starting to see the AI companies reflect this in their marketing, saying tech debt is a good thing if you move fast enough...
This must be why my 8-core corporate PC can barely run Teams and a web browser in 2026.
How many 1+ hour videos of someone building with AI tools have you sought out and watched? Those definitely exist; it sounds like you never sought them out or watched them, because even at 7 fewer hours they'd show you where these tools add value, enough to believe they can help with challenging projects.
So why should anybody produce an 8 hour video for you when you wouldn't watch it? Let's be real. You would not watch that video.
In my opinion most of the people who refuse to believe AI can help them with software work are just incurious, archetypical late adopters.
If you've ever interacted with these kinds of users, even though they might ask for specs/more resources/more demos and case studies or maturity or whatever, you know that really they are just change-resistant and will probably continue to be as long as they can get away with it being framed as skepticism rather than simply being out of touch.
I don't mean that in a moralizing sense btw - I think it is a natural part of aging and gaining experience, shifting priorities, being burned too many times. A lot of business owners 30 years ago probably truly didn't need to "learn that email thing", because learning it would have required more of a time investment than it would have yielded: they were later in their careers with less time for it to pay off, and had already built skills/habits/processes around physical mail that would become obsolete with virtual mail. But a lot of them did end up learning that email thing 5, 10, whatever years later, when the benefits were more obvious and the rest of the world had already reoriented itself around email. Even if they still didn't want to, they'd risk looking like a fossil/"too old" to adapt to changes in the workplace if they didn't just do it.
That's why you're seeing so many directors/middle managers doing all these thought leader posts about AI recently. Lots of these guys 1-2 years ago were either saying AI is spicy autocomplete or "our OKR this quarter is to Do AI Things". Now they can't get away with phoning it in anymore and need to prove to their boss that they are capable of understanding and using AI, the same way they had to prove that they understood cloud by writing about kubernetes or microservices or whatever 5-10 years ago.
> In my opinion most of the people who refuse to believe AI can help them with software work are just incurious, archetypical late adopters.
The biggest blocker I see to having AI help us be more productive is that it transforms how the day to day operations work.
Right now there is some balance in the pipeline: receiving change requests/enhancements, documenting them, estimating implementation time, analyzing costs and benefits, breaking each feature into discrete stories, having the teams review the stories and 'vote' on a point sizing, planning when each feature should be completed given the team's current capacity, committing to the releases (PI Planning), and then actually implementing the changes being requested.
However, if I can take a code base, enter a high-level feature request from the stakeholders, and then hold hands with Kiro to produce a functioning implementation in a day, then the majority of the steps above are just wasted time. Spending a few hundred man-hours to prepare for work that takes a few hundred man-hours might be reasonable, but doing that same prep work for a task that takes 8 man-hours isn't.
And we can't shift to that faster workflow without significant changes to the entire software pipeline. The entire PMO team dedicated to reporting on when things will be done shifts if that 'thing' is done before the report to the PMO lead is finished. Or we'd need significantly more resources dedicated to planning enhancements so that we could have an actual backlog of work for the developers. But my company appears to be interested neither in shrinking the PMO team nor in expanding the intake staff.
It could be really beneficial for Anthropic to showcase how they use their own product; since they're developers themselves, they're probably dogfooding it already, and the effort required should be minimal.
- A lot of skeptics have complained that AI companies aren't specific about how they use their products, and this would be a great example of specificity.
- It could serve as a tutorial for people who are unfamiliar with coding agents.
- The video might not convince people who have already made up their minds, but at least you could point to it as a primary source of information.
These exist. Just now I tried finding such a video for a medium-sized contemporary AI devtools product (Mastra) and it took me only a few seconds to arrive at https://www.youtube.com/watch?v=fWmSWSg848Q
There could be a million of these videos and it wouldn't matter, the problem is incuriosity/resistance/change-aversion. It's why so many people write comments complaining about these videos not existing without spending even a single minute looking for them: they wouldn't watch these videos even if they existed. In fact, they assume/assert they don't exist without even looking for them because they don't want them to exist: it's their excuse for not doing something they don't want to do.
That video was completely useless for me. I didn't see a single thing I would consider programming. I don't want to waste time building workflows or agents; I want to see them being used to solve real-world, difficult problems from start to finish.
> How many 1+ hour videos of someone building with AI tools have you sought out and watched?
A lot; they've mostly all been advertising tripe and completely useless.
I don't want a demonstration of what a jet-powered hammer is by the sales person or how to oil it, or mindless fluff about how much time it will save me hammering things. I want to see a journeyman use a jet-powered hammer to build a log cabin.
I am personally not seeing this magic utopia. No one wants to show it to me; they just want to talk about how much better it is.
I can only speak for myself, but it feels like playing with fire to productize this stuff too quick.
Like, I woke up one day and a magical owl told me that I was a wizard. Now I control the elements with a flick of my wrist - which I love. I can weave the ether into databases, apps, scripts, tools, all by chanting a simple magical invocation. I create and destroy with a subtle murmur.
Do I want to share that power? Naturally, it would be lonely to hoard it and despite the troubles at the Unseen University, I think that schools of wizards sharing incantations can be a powerful good. But do I want to share it with everybody? That feels dangerous.
It's like the early internet - having a technical shelf to climb up before you can use the thing creates a kind of natural filter for at least the kinds of people that care enough to think about what they're doing and why. Selecting for curiosity at the very least.
That said, I'm also interested in more data from an engineering perspective. It's not a simple thing and my mind is very much straddling the crevasse here.
LLMs are lossy compression of a corpus with a really good parser as a front end. As human-made content dries up (due to LLM use), the AI products will plateau.
I see inference as the much bigger technology, although much better RAG loops for local customization could be a very lucrative product for a few years.
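The core of such a loop is genuinely small, which is why the value is in everything around it (chunking, re-ranking, deciding when to retrieve at all). A minimal sketch, with a toy embedder standing in for a real model and `llm` left as whatever completion function you have; every name here is illustrative, not any particular library:

    import numpy as np

    def embed(text: str) -> np.ndarray:
        # Toy stand-in for a real embedding model: hashed bag-of-words.
        v = np.zeros(256)
        for tok in text.lower().split():
            v[hash(tok) % 256] += 1.0
        n = np.linalg.norm(v)
        return v / n if n else v

    def rag_answer(query: str, docs: list[str], llm, k: int = 3) -> str:
        # Retrieve the k docs closest to the query, then ground the
        # generation in them instead of in the model's weights.
        q = embed(query)
        ranked = sorted(docs, key=lambda d: -float(embed(d) @ q))
        context = "\n---\n".join(ranked[:k])
        return llm(f"Answer using only this context:\n{context}\n\nQ: {query}")

The lucrative part is iterating on retrieval quality for a specific org's documents, not the loop itself.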
We should get back to the basic definition of the engineering job. An engineer understands requirements, translates them into logical flows that can be automated, communicates tradeoffs across the organization, and makes tradeoff calls on maintainability, extensibility, readability, and security. Most importantly, they're accountable for the outcome, because many tradeoffs only reveal their cost once they hit reality.
None of this is covered by code generation, nor by juniors submitting random PRs. Those are symptoms of juniors (and not only juniors) missing fundamentals.
When we forget what the job actually is, we create misalignment with junior engineers and end up with weird ideas like "spec-driven development"
If anything, coding agents are a wake-up call that clarifies what the engineering profession is really about.
When 10K-LOC AI PRs are being created, sometimes by people who either don't understand the code or haven't reviewed the code they're trying to submit, the 60-million-dollar failure line is potentially lying in wait.
To many, reliability and the like just aren't much of a priority. Things have become an absolute shitshow, and still everyone buys it.
In other words, the only outcome will be that people don't have, or don't want to have, engineers anymore.
Companies are very much not interested in someone who does the above, but at most in someone who sells or cosplays these things, if even that.
Because that's what creates income. They don't care if they sell crap; they care that they sell it, and the cheaper they can produce it, the better. So money gets poured into marketing, not quality.
High-quality products are not sought after. What sells is fake quality, like packaging a computer or a phone like jewelry, even though you throw that very box away the next time you walk past a trash bin. That's what people consider quality these days, even if it's just a waste of resources.
And businesses choose products and services the same way regular consumers do, even if they want the marketing to make them feel good about it in a slightly different way, because marketing to your target audience makes sense. Duh!
People are ready to pay more to have a premium label stamped on something, to pay more to feel good about it, but most of the time they are very unwilling to pay for the measurable quality an engineer provides.
It's scary; even with infrastructure, the process seems to be changing, probably also due to corruption, but that's a whole other can of worms.
> communicates tradeoffs across the organization
They may do that. They may be recognized for it. But if the guy next door with the right cosplay says something like "we are professionals, look at how long we have been on the market" or "look at our market share", then no matter how far from reality the bullshitting is, they'll be getting the money.
At the beginning of the year/end of last year, I learned how little expertise, professionalism, and engineering are required to be a multi-billion-dollar NASDAQ company. For months I thought it couldn't possibly be that the core product of such a company displays such a complete lack of expertise in its core area(s). Yet they somehow managed to convince management to invest several times more money than the original budget, which was already seen as quite a stretch. Of course, their promises didn't end up anywhere close to true, and they conveniently forgot to inform us (our management) about severe limitations.
So if you are good at selling to management, which you can be by pocketing the consultants who recommend you, then things will work, seemingly no matter what.
> If anything, coding agents are a wake-up call that clarifies what the engineering profession is really about
I believe what we need to wake up to or come to terms with is that our industry (everything that would go into NASDAQ) is a farce. Coding agents show that. It doesn't matter that your products are half-assed if you know how to sell them. You are selling your products to people. It doesn't matter if it's some guy at a hot dog stand, the CEO of a big successful company, or someone going house to house selling the best vacuum cleaner ever. What matters is making people believe it would be stupid not to take your product.
TBH I think Information Systems Engineering and Computer Engineering can just eat software engineers' lunch at this point. The entire need for a separate engineering discipline for software was low-level coding. Custom hardware chips are easier to make for simple things, and with little need for low-level coding in more complex things, the focus is shifting back to either hardware choices or higher-level system management.
I'd argue the only places left where you really need low-level coding fall under computer science. If you are a computer or systems engineer who needs to work with a lot of code, then you'll benefit from having exposure to computer science, but an actual engineering discipline for just software seems silly now. Not to mention pretty much all engineers at this point are configuring software tools on their own to some degree.
I think it's similar to how there used to be horse doctors as a separate profession from vets when horses were much more prominent in everyday life, but now they are all vets again and some of them specialize in horses
> I believe what we need to wake up to or come to terms with is that our industry (everything that would go into NASDAQ) is a farce.
The thing is, with software development, it's always been this way. Developers have just had tunnel vision for decades because they stare into an editor all day long instead of trying to actually sell a product. If selling weren't the top priority, then what do you think would happen to your direct deposit? Software developers, especially software developers, live in this fantasy land where they believe their paycheck just happens automatically and always will. I think it's becoming critical that new software devs entering the workforce spend a couple of years at a small, eat-what-you-kill consultancy or small business, somewhere where they can see the relationship between building, selling, and their paycheck first hand.
Technology has absolute qualities. Not a fantasy.
Are you being paid to browse Hacker News? Probably not, but here you are.
Maybe you never considered this, but programming for reasons other than a salary is a possibility.
If those pesky programmers gave it all away, for free, what would be left for you to sell? In this case, would you leave technology? Would you go somewhere else and practice your selling there?
Can't we defend building for the sake of building?
Doing for the sake of having fun?
Maybe you would be left with nothing to sell, I understand, but that's fine for me. Sorry.
How do you square that with "use AI and get this feature done in three days or have your 'performance reviewed' with HR in the room"? Because I'm having trouble bridging that gap.
Edit: help, the new org said the same thing. :(
Edit 2: you guys, seriously, the HR lady keeps looking up at me and shaking her head. I don't think this is good. I tried to be a real, bigboy engineer, but they just mumbled something about KPIs and put me on a PIP.
I think people are getting used to stuff not working. People (like me) use crap like Teams, Slack, the web version of Office, Outlook, etc. on a daily basis and pour huge amounts of money in. They use shit like Fortinet (the digital version of dream catchers) and so on.
Things break. A lot. Doctors, successful or not, also deal with the same shitty IT on a daily basis.
Nobody cares about engineering. It's about selling stuff, not about reliability, etc.
And to some degree one is forced to use that stuff anyways.
So sure, you can go to a company that understands engineering, but if you do a job for a salary, you might lose out quite a bit if you care about things like quality. We see this in so many different sectors.
Sure, there is a unicorn here and there that makes it for a while. And they might even go big, and then they sell the company or switch to maximizing profits, because that's the only way up once you've essentially already made it (on top of one of the big players).
For small projects/companies, it depends on whether you have a way to still launch big, which you usually can with enough capital. You can still make a big profit with a crappy product then, but essentially only once or twice. But then your goal doesn't have to be creating quality either.
Microsoft and Fortinet, for example, wouldn't profit from adding (much) quality. They profit from hype. So they both now do "AI".
Yup, we are all definitely lowering the bar for what's acceptable when it comes to uptime and bugs. More features, more hype, 10x seems to be the standard approach to market, but there are still a lot of companies and teams where greybeards and rational folks remember and understand previous hype cycles/bubbles, and who appreciate and protect the engineering approach. It's just that they mostly hire/partner by reference, so it's kinda hard to exit the toxic bubble of startups and "growth hacking" enterprises.
Skills are a pretty awkward abstraction. They emerged to patch a real problem: generic models require fine-tuning via context, which quickly leads to bloated context files and context dilution (i.e., more hallucinations).
But skills don't really solve the problem, and turning that workaround into a standard feels strange. Standardizing a patch isn't something I'd expect from Anthropic; it's unclear what their endgame is here.
Skills don’t solve the problem if you think an LLM should know everything. But if you see LLMs mostly as text plan-do-check-act machines that can process input text, generate output text, and create plans to gather more knowledge and validate the output, without knowing everything upfront, skills are a perfectly fine solution.
The value of standardizing skills is that the skills you define work with any agentic tool. It doesn't matter how simple they are; if they don't work easily, they have no use.
You need a practical and efficient way to give the LLM your context. Just as every organization has its own standards, best practices, and architectures that should be documented, because new developers don't know them upfront, LLMs also need your context.
An LLM is not an all-knowing brain; it’s a plan-do-check-act text-processing machine.
How would you solve the same problem? Skills seem to be just a pattern (which predates this spec) that lets the LLM choose what information it needs to "load". It's not that different from a person looking up the literature before doing a certain job, rather than reading every book every time in case it comes in handy one day. Whatever you do, you will end up with the same kind of solution; there's no way to just add all useful context to the LLM beforehand.
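For reference, a skill in Anthropic's format is basically a folder with a SKILL.md: YAML frontmatter whose name/description always sit in context, plus a body the agent only reads when the description matches the task. A simplified, hypothetical example (the structure follows the docs; the skill itself and its script path are invented):

    ---
    name: pdf-extraction
    description: Extract text and tables from PDF files. Use when the user asks to read or convert a PDF.
    ---
    # PDF extraction (hypothetical example skill)

    1. Prefer pdftotext for digital PDFs; fall back to OCR for scanned ones.
    2. For tables, run the bundled scripts/extract_tables.py and return CSV.

The frontmatter is the cheap index; the body is the literature you look up on demand.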
Marketing. That defines pretty much everything Anthropic does beyond frontier model training. They're the same people producing sensationalized research headlines about LLMs trying to blackmail folks in order to prevent being deleted.
> Standardizing a patch isn’t something I’d expect from Anthropic
This is not the first time; perhaps an expectation adjustment is in order. This is also the same company that has an exec telling people in his Discord (15 minutes of fame recently) that Claude has emotions.
LLMs get over-analyzed. They’re predictive text models trained to match patterns in their data, statistical algorithms, not brains, not systems with “psychology” in any human sense.
Agents, however, are products. They should have clear UX boundaries: show what context they’re using, communicate uncertainty, validate outputs where possible, and expose performance so users can understand when and why they fail.
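Even a minimal response envelope would enforce most of that boundary. A sketch of what I mean; the names are mine, not any real SDK:

    from dataclasses import dataclass, field

    @dataclass
    class AgentReply:
        # Hypothetical envelope, not a real SDK type.
        answer: str
        sources: list[str] = field(default_factory=list)  # context actually used
        confidence: float = 0.0  # calibrated 0..1, surfaced in the UI
        checks_passed: list[str] = field(default_factory=list)  # validations run

    def render(reply: AgentReply) -> str:
        # The UX boundary: never show an answer without its provenance.
        header = f"[confidence {reply.confidence:.0%}, {len(reply.sources)} sources]"
        return header + "\n" + reply.answer

Whether or not the fields are exactly these, the point is the same: the product, not the user, owns the job of exposing context and uncertainty.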
IMO the real issue is that raw, general-purpose models were released directly to consumers. That normalized under-specified consumer products and created the expectation that users would interpret model behavior, define their own success criteria, and manually handle edge cases, sometimes with severe real-world consequences.
I’m sure the market will fix itself with time, but I hope more people learn when not to use these half-baked AGI “products”.
Because they wanted to sell the illusion of consciousness. ChatGPT, Gemini, and Claude are human simulators, which is lame. I want autocomplete prediction, not this personality and retention stuff, which only makes the agents dumber.
You hit the nail on the head. Anyone who's been working intimately with LLMs comes to the same conclusion: the LLM itself is only one small but important part, to be used in a more complicated and capable system. And that system will not have the same limitations as the raw LLM itself.
To say that LLMs are 'predictive text models trained to match patterns in their data, statistical algorithms, not brains, not systems with “psychology” in any human sense' is not entirely accurate. Classic LLMs like GPT-3, sure. But LLM-powered chatbots (ChatGPT, Claude, which is what this article is really about) go through much more than just predict-next-token training (RLHF, presumably now reasoning training, who knows what else).
> go through much more than just predict-next-token training (RLHF, presumably now reasoning training, who knows what else).
Yep, but...
> To say that LLMs are 'predictive text models trained to match patterns in their data, statistical algorithms, not brains, not systems with “psychology” in any human sense' is not entirely accurate.
That's a logical leap; you'd need to bridge the gap between "more than next-token prediction" and similarity to wetware brains or "systems with psychology".
Sure, but they reflect all known human psychology because they’ve been trained on our writing. Look up the Anthropic tests.
If you make an agent based on an LLM, it will display very human behaviors, including aggressive attempts to prevent being shut down.
Is the solution to sycophancy just a very good, clever prompt that forces logical reasoning? Do we want our LLMs to be scientifically accurate and truthful, or creative and exploratory in nature? Fuzzy systems like LLMs will always have these kinds of tradeoffs, and there should be a better UI with accessible "traits" (devil's advocate, therapist, expert doctor, finance advisor) that one can invoke.
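Something as simple as a trait registry layered over the system prompt might cover a lot of that; a sketch, with trait names and prompt text invented for illustration:

    # Hypothetical trait registry: each trait is just a system-prompt overlay.
    TRAITS = {
        "devils_advocate": "Challenge the user's claims and steelman the opposite view.",
        "skeptic": "Flag every unverified assumption before answering.",
        "explorer": "Prefer unusual framings and label speculation explicitly.",
    }

    def system_prompt(base: str, trait: str | None = None) -> str:
        # Compose the base prompt with an invoked trait, if any.
        return base + ("\n\n" + TRAITS[trait] if trait else "")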
> LLMs get over-analyzed. They’re predictive text models trained to match patterns in their data, statistical algorithms, not brains, not systems with “psychology” in any human sense.
Per the predictive processing theory of mind, human brains are similarly predictive machines. "Psychology" is an emergent property.
I think it's overly dismissive to point to the fundamentals being simple, i.e. that it's a token-prediction algorithm, when it's clearly the unexpected emergent properties of LLMs that everyone is interested in.
Predictive processing is absolutely not garbage. The dish of neurons that learned to play Pong was trained using a method directly based on the principles of predictive processing. Also, I don't think there's really any competitor for the niche predictive processing fills in closing the gap between neuroscience and psychology.
The difference is that we know how LLMs work. We know exactly what they process, how they process it, and for what purpose. Our inability to explain and predict their behavior is due to the mind-boggling amount of data and processing complexity that no human can comprehend.
In contrast, we know very little about human brains. We know how they work at a fundamental level, and we have a vague understanding of brain regions and their functions, but we have little knowledge of how the complex behavior we observe actually arises. The complexity is also orders of magnitude greater than what we can model with current technology, and it's very much an open question whether our current deep learning architectures are even the right approach to modeling it.
So, sure, emergent behavior is neat and interesting, but just because we can't intuitively understand a system, doesn't mean that we're on the right track to model human intelligence. After all, we find the patterns of the Game of Life interesting, yet the rules for such a system are very simple. LLMs are similar, only far more complex. We find the patterns they generate interesting, and potentially very useful, but anthropomorphizing this technology, or thinking that we have invented "intelligence", is wishful thinking and hubris. Especially since we struggle with defining that word to begin with.
I think what the comment OP above means to point at is: given what we know (or rather don't know) about awareness, consciousness, intelligence, and the like, let alone the human experience of it all, we currently have no way to scientifically rule out the possibility that LLMs are self-aware/conscious entities of their own; and that's even before we start arguing about their "intelligence", whatever that may be understood as.
What we do know and have so far, across disciplines, and from the fact that neural nets are modeled after what we've learned about the human brain, is that it isn't an impossibility to propose that LLMs _could_ be more than just "token prediction machines". There may be 10,000 ways of arguing that they are indeed simply that, but there are also a few ways of arguing that they could be more than what they seem. We can talk about probabilities, but not make a definitive case one way or the other yet, scientifically speaking. That's worth not ignoring; the few shouldn't be dismissed.
Is this what we are reduced to now, to snap back with a wannabe-witty remark just because you don't like how an idea sounds? Have we completely forgotten and given up on good-faith scientific discourse? Even on HN?
I'm happy to participate in good faith discourse but honestly the idea that LLMs are conscious is ridiculous.
We are talking about a computer program. It does nothing until it is invoked with an input and then it produces a deterministic output unless provided a random component to prevent determinism.
That's all it does. It does not live a life of its own between invocations. It does not have a will of its own. Of course it isn't conscious lol how could anyone possibly believe it's conscious? It's an illusion. Don't be fooled.
Reading what you said literally, you're making a strong statement: that an AI could never be conscious; further, that consciousness depends on free will, that free will is incompatible with determinism, and that all of these claims are obviously self-evident.
But the problem is the narrative around this tech. It is marketed as if we have accomplished a major breakthrough in modeling intelligence. Companies are built on illusions and promises that AGI is right around the corner. The public is being deluded into thinking that the current tech will cure diseases, solve world hunger, and bring worldwide prosperity. When all we have achieved is to throw large amounts of data at a statistical trick, which sometimes produces interesting patterns. Which isn't to say that this isn't and can't be useful, but this is a far cry from what is being suggested.
> We can talk about probabilities, but not make a definitive case one way or the other yet, scientifically speaking.
Precisely. But the burden of proof is on the author. They're telling us this is "intelligence", and because the term is so loosely defined, this can't be challenged in either direction. It would be more scientifically honest and accurate to describe what the tech actually is and does, instead of ascribing human-like qualities to it. But that won't make anyone much money, so here we are.
At no point did I say LLMs have human intelligence nor that they model human intelligence. I also didn't say that they are the correct path towards it, though the truth is we don't know.
The point is that one could be similarly dismissive of human brains, saying they're prediction machines built on basic blocks of neurochemistry, and such a view would be asinine.
A large part of that training is done by asking people if responses 'look right'.
It turns out that people are more likely to think a model is good when it kisses their ass than if it has a terrible personality. This is arguably a design flaw of the human brain.