
All this praise for AI.. I honestly don't get it. I have used Opus 4.5 for work and private projects. My experience is that all of the AIs struggle when the project grows. They always find some kind of local minimum they cannot get out of, but tell you that this time their solution will work.. but it doesn't. They waste an enormous amount of my time with this behaviour. In the end I always have to do it myself.

Maybe when AIs are able to say: "I don't know how this works" or "This doesn't work like that at all." they will be more helpful.

What I use AIs for is searching for stuff in large codebases. Sometimes I don't know the name or the file name and describe to them what I am looking for. Or I let them generate some random task python/bash script. Or use them to find specific things in a file that a regex cannot find. Simple small tasks.

It might well be I am doing it totally wrong.. but I have yet to see a medium to large sized project with maintainable code that was generated by AI.


At what point does the project outgrow the AI in your experience? I have a 70k LOC backend/frontend/database/docker app that Claude still one-shots most features/tasks I throw at it. Perhaps it's not as good at remembering all the intertwined side effects between functionalities/UIs and I have to tell it "in the calendar view, we must hide it as well", but that takes little time/effort.

Does it break down at some point to the extent that it simply does not finish tasks? Honest question as I saw this sentiment stated previously and assumed that sooner or later I'll face it myself but so far I didn't.


I find that with more complex projects (full-stack application with some 50 controllers, services, and about 90 distinct full-feature pages) it often starts writing code that simply breaks functionality.

For example, had to update some more complex code to correctly calculate a financial penalty amount. The amount is defined by law and recently received an overhaul so we had to change our implementation.

Every model we tried (and we have corporate access and legal allowance to use pretty much all of them) failed to update it correctly. Models would start changing parts of the calculation that didn't need to be updated. After saying that the specific parts shouldn't be touched and to retry, most of them would go right back to changing it again. The legal definition of the calculation logic is, surprisingly, pretty clear and we do have rigorous tests in place to ensure the calculations are correct.

Beyond that, it was frustrating trying to get the models to stick to our coding standards. Our application has developers from other teams doing work as well. We enforce a minimum standard to ensure code quality doesn't suffer and other people can take over without much issue. This standard is documented in the code itself but also explicitly written out in the repository in simple language. Even when explicitly prompting the models to stick to the standard and copy pasting it into the actual chat, it would ignore 50% of it.

The most apt comparison I can make is that of a consultant who always agrees with you to your face but, when doing the actual work, ignores half of your instructions, and you end up running after them trying to minimize the mess and clean-up you have to do. It outputs more code but it doesn't meet the standards we have. I'd genuinely be happy to offload tasks to AI so I can focus on the more interesting parts of my work, but from my experience and that of my colleagues, it's just not working out for us (yet).


I noticed that you said "models" & not "agents". Agents can receive feedback from automated QA systems, such as linters, unit, & integration tests, which can dramatically improve their work.

There's still the risk that the agent will try to modify the QA systems themselves, but that's why there will always be a human in the loop.
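The feedback loop described above can be sketched in a few lines. This is a minimal illustration, not any particular product's implementation; `ask_model` and `run_checks` are hypothetical stand-ins for "the model edits the code" and "linters plus unit/integration tests", respectively.

```python
def agent_loop(task, ask_model, run_checks, max_rounds=5):
    """Let a model retry a task until automated checks pass or we give up.

    ask_model(task, feedback) is assumed to edit the code somehow;
    run_checks() is assumed to run linters/tests and return (passed, output).
    """
    feedback = ""
    for _ in range(max_rounds):
        ask_model(task, feedback)      # model attempts the change
        passed, output = run_checks()  # e.g. linter + unit/integration tests
        if passed:
            return True                # still hand off to a human reviewer
        feedback = output              # next attempt sees the concrete failures
    return False                       # escalate to a human
```

The key property is that each retry is conditioned on concrete failure output rather than a vague "try again", which is where agents differ from bare model calls.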


Should've clarified in that case. I used models as a general stand-in for AI.

To provide a bit more context:

- We use VS Code (plus derivatives like Cursor) hooked up to general models, with context access to the entire repository.
- We have an MCP server with access to the company-internal framework and tools (especially the documentation), so the model should know how they are used.

So far, we've found two use cases that make AI work for us:

1. Code review. This took quite a bit of refinement of the instructions, but we've got it to a point where it provides decent comments on the things we want it to comment on. It still fails on the more complex application logic, but will consistently point out minor things. It's now used as a pre-PR review, so engineers can fix things before publishing a PR. Less noise for the rest of the developers.

2. CRUD cruft like tests for a controller. We still create the controller endpoint, but given the controller, the DTOs, and an example of how another controller's tests are done, it will produce decent code. Even then, we still often have to fix a couple of things and debug to see where it went wrong, like fixing a broken test by removing the actual strictlyEquals() call.

Just keeping up with the newest AI changes is hard. We all have personal curiosity, but at the end of the day we need to deliver our product and only have so much time to experiment with AI stuff. Never mind all the other developments in our regulation-heavy environment and the tech stack we need to keep on top of.


> At what point does the project outgrow the AI in your experience? I have a 70k LOC backend/frontend/database/docker app that Claude still mostly one shots most features/tasks I throw at it.

How do you do this?

Admittedly, I'm using Copilot, not CC.

I can't get Copilot to finish a refactor properly, let alone a feature. It'll miss an import rename, leave in duplicated code, update half the use cases but not all.. etc. And that's with all the relevant files in context, and letting it search the codebase so it can get more context.

It can talk about DRY, or good factoring, or SOLID, but it only applies them when it feels like it, despite what's in AGENTS.md. I have much better results when I break the task down into small chunks myself and NOT tell it the whole story.


I'm having trouble at 150k LOC, but I'm not sure size per se is the issue, as opposed to whether the set of relevant context is easy to find. The "relevant" part threatens to pull in disparate parts of the codebase; the "easy to find" part determines whether a human has to manually curate the context.


I think most of us - if not _all_ of us - don't know how to use these things well yet. And that's OK. It's an entirely new paradigm. We've honed our skills and intuition based on humans building software. Humans make mistakes, sure, but humans have a degree and style of learning and failure patterns we are very familiar with. Humans understand the systems they build to a high degree; this knowledge helps them predict outcomes, and even helps them achieve the goals of their organisation _outside_ writing software.

I kinda keep saying this, but in my experience:

1. You trade the time you'd take to understand the system for time spent testing it.

2. You trade the time you'd take to think about simplifying the system (so you have less code to type) into execution (so you build more in less time).

I really don't know if these are _good_ tradeoffs yet, but it's what I observe. I think it'll take a few years until we truly understand the net effects. The feedback cycles for decisions in software development and business can be really long, several years.

I think the net effects will be positive, not negative. I also think they won't be 10x. But that's just me believing stuff, and it is relatively pointless to argue about beliefs.


> Maybe when AIs are able to say: "I don't know how this works" or "This doesn't work like that at all." they will be more helpful.

Funny you say that, I encountered this in a seemingly simple task. Opus inserted something along the lines of "// TODO: someone with flatbuffers reflection expertise should write this". I actually thought this was better than I anticipated even though the task was specifically related to fbs reflection. And it was because I didn't waste more time and could immediately start rewriting it from scratch.


So why not just merge into one and be 16 times as effective? Sorry for the sarcasm but your calculation is just a wild assumption.

How does the US do it? They have a fair amount of states too with their own laws, don't they?

Sure, federalism produces some overhead and inefficiencies. But it also has many benefits, above all avoiding too much power in one hand. E.g. you can have different school systems in different states, see what works better, and adapt the other systems (whether you actually do that is another question).

People are also different in different states. This also applies to Europe and its member states. Just merging all into one is just a recipe to fail epically.


Afaik, the bulk of the US' federal centralization of commerce is based on the Commerce Clause of the US Constitution [0], which based on reading (and more so on precedent) grants the US federal legislature the ability to regulate commerce between states. As most commerce crosses state boundaries, this de facto allows the federal legislature to define and enforce regulatory standards.

In practice, it's more nuanced and subject to continual back-and-forth arguing. E.g. California and Texas trying to decide their own standards, by virtue of their economic size, then hashing it out with the federal government in court.

I'm not sure what the EU regulatory cornerstone equivalent of the Commerce Clause would be.

[0] https://www.law.cornell.edu/wex/commerce_clause


> So why not just merge into one and be 16 times as effective? Sorry for the sarcasm but your calculation is just a wild assumption.

The division is on purpose, to divide power and make it harder for a second Hitler to rise again. And the calculations are no assumption, it's a common topic in Germany how much additional time and money this all costs.

> How does the US do it? They have a fair amount of states too with their own laws, don't they?

Why do you assume they are different? Or better?

> E.g. you can have different school systems in different states and see what works better

You can also have this without federalism, without maintaining a dozen different administrations which are all doing the same in different flavour.

> People are also different in different states. This also applies to Europe and its member states.

Compared to Europe, people in the USA are not that different per state. At least not on the level where individual administration is necessary. The different groups are mainly independent of the state they are living in.


That could end in an ugly stalemate pretty fast, considering ASML is Dutch.


So true! I was well into my thirties before I learned that people actually can "see" images. I was totally perplexed by this revelation. After some research I realized that this also applies to taste, smell, sounds.. and none of them can I "imagine".

In hindsight this explained a lot of things. One example would be that I always was bad at blindfold chess even though I was a decent chess player. Before, I never understood how people can do this.

Still, I am absolutely fine. I can recognize all these things. I can describe them. I just can't "imagine" them.

After the first shock you understand that everything has pros and cons. E.g. I never have trouble sleeping. I close my eyes and turn the world around me off. My wife can see images very vividly and always has trouble going to sleep.

In the end we just need to accept that the brain is very complex and each of us has developed / adapted the best way, allowed by our biology.


That's so funny: I also first started to realize I had aphantasia during a period at university when I was taking chess very seriously. Unlike even lesser-skilled peers, it was so difficult for me to understand games written out in chess books without playing them out on the board, and I couldn't understand why...

Experiences like that are how I understand the question of 'shame' relating to aphantasia and the importance of 'diagnosis'/understanding how your mind actually works. 'Diagnosis' just helps you understand how to adapt and prevents you from slamming your head against approaches that won't work no matter how hard you try.

Similarly on sleep, I can sleep anywhere anytime with little effort and always tell my wife, who often has insomnia, "just close your eyes until you sleep" to her frustration.

What's really remarkable is how similar the life experiences are of most who have aphantasia...


I don't see any images, but I have trouble sleeping because of my inner monologue or not feeling calm etc.


Interesting. So that trope of smells being strongly connected to memories never really happened to me. I didn’t know this was also part of aphantasia.


I definitely have memories linked to smell, but I can't imagine or remember and pull them up on demand, I am reminded of them when I detect that scent. I can make myself imagine/remember sourness though, but not other flavors. Just thinking of lemon, citrus, pickles, etc. makes my mouth water and start tasting sourness.


Day 2 starts with a Neovim config in vimscript. Just a heads up for people like me who switched away from Vim primarily because of vimscript in the first place.

edit: grammar


you mean lua?


The era of software mass production has begun. With many "devs" just being workers in a production line, pushing buttons, repeating the same task over and over.

The produced products, however, do not compare in quality to the output of other industries' mass-production lines. I wonder how long it will take until this all comes crashing down. Software mostly already is not a high-quality product.. with Claude & co it just gets worse.

edit: sentence fixed.


I think you'll be waiting a while for the "crashing down". I was a kid when manufacturing went off shore and mass production went into overdrive. I remember my parents complaining about how low quality a lot of mass produced things were. Yet for decades most of what we buy is mass produced, comparatively low quality goods. We got used to it, the benefits outweighed the negatives. What we thought mattered didn't in the face of a lot of previously unaffordable goods now broadly available and affordable.

You can still buy high-quality goods made with care when it matters to you, but that's the exception. It will be the same with software. A lot of what we use will be mass produced with AI, and even produced in realtime on the fly (in 5 years maybe?). There will be some things where we'll pay a premium for software crafted with care, but for most it won't matter because of the benefits of rapidly produced software.

We've got a glimpse of this with things like Claude Artifacts. I now have a piece of software quite unique to my needs that simply wouldn't have existed otherwise. I don't care that it's one big js file. It works and it's what I need and I got it pretty much for free. The capability of things like Artifacts will continue to grow and we'll care less and less that it wasn't human produced with care.


While a general "crashing down" probably will not happen I could imagine some differences to other mass produced goods.

Most of our private data lives in clouds now and there are already regular security nightmares of stolen passwords, photos etc. I fear that these incidents will accumulate with more and more AI generated code that is most likely not reviewed or reviewed by another AI.

Also regardless of AI I am more and more skipping cheap products in general and instead buying higher quality things. This way I buy less but what I buy doesn't (hopefully) break after a few years (or months) of use.

I see the same for software. Already before AI we were flooded with trash. I bet we could all delete at least half of the apps on our phones and nothing would be worse than before.

I am not convinced by the rosy future of instant AI-generated software but future will reveal what is to come.


I think one major lesson of the history of the internet is that very few people actually care about privacy in a holistic, structural way. People do not want their nudes, browsing history and STD results to be seen by their boss, but that desire for privacy does not translate to guarding their information from Google, their boss, or the government. And frankly this is actually quite rational overall, because Google is in fact very unlikely to leak this information to your boss, and if they did it would more likely to result in a legal payday rather than any direct social cost.

Hacker News obviously suffers from severe selection bias in this regard, but for the general public I doubt even repeated security breaches of vibe-coded apps will move the needle much on the perception of LLM-coded apps, which means that they will still sell, which means that it doesn't matter. I doubt most people will even pick up on the connection. And frankly, most security breaches have no major consequences anyway, in the grand scheme of things. Perhaps the public consciousness will harden a bit when it comes to uploading nudes to "CheckYourBodyFat", but the truly disastrous stuff like bank access is mostly behind 2FA layers already.


Poor quality is not synonymous with mass production. It's just cheap crap made with little care.


> The era of software mass production has begun.

We've been in that era for at least two decades now. We just only now invented the steam engine.

> I wonder how long it takes until this comes all crashing down.

At least one such artifact of craft and beauty already literally crashed two airplanes. Bad engineering is possible with and without LLMs.


There's a huge difference between possible and likely.

Maybe I'm pessimistic, but I at least feel like there's a world of difference between a practice that encourages bugs and one that only lets them through when there is negligence. The accountability problem needs to be addressed before we say it's like self-driving cars outperforming humans. On an errors-per-line basis, I don't think LLMs are on par with humans yet.


Knowing your system components’ various error rates and compensating for them has always been the job. This includes both the software itself and the engineers working on it.

The only difference is that there is now a new high-throughput, high-error (at least for now) component editing the software.


What is 0.95^10 (and I'm being generous with the base here)? That's a 10-step process with a 95% success rate at each step.
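As a quick check of that arithmetic (assuming the steps fail independently):

```python
# End-to-end success of a 10-step process where each step
# independently succeeds 95% of the time.
p_step = 0.95
steps = 10
p_total = p_step ** steps
print(round(p_total, 3))  # 0.599: barely better than a coin flip
```

So even a generous per-step success rate compounds into roughly 40% of runs failing somewhere along the chain.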


Yeah, it'll be interesting to see whether blaming LLMs becomes as acceptable as "caused by a technical fault" for deflecting responsibility from what is ultimately a programmer's output.

Perhaps that's what leads to a decline in accountability and quality.


The decline in accountability has been in progress for decades, so LLMs can obviously not have caused it.

They might of course accelerate it if used unwisely, but the solution to that is arguably to use them wisely, not to completely shun them because "think of the craft and the jobs".

And yes, in some contexts, using them wisely might well mean not using them at all. I'd just be surprised if that were a reasonable default position in many domains in 5-10 years.


> Bad engineering is possible with and without LLMs

That's obvious. It's a matter of which makes it more likely


> Bad engineering is possible with and without LLMs.

Is Good Engineering possible with LLMs? I remain skeptical.


Why didn't programmers think of stepping down from their ivory towers and start making small apps which solve small problems? That people and businesses are very happy to pay for?

But no! Programmers seem to only like working on giant scale projects, which only are of interest to huge enterprises, governments, or the open source quagmire of virtualization within virtualization within virtualization.

There's exactly one good invoicing app I've found for freelancers and small businesses, while the number of potential customers is in the tens of millions. Why aren't there at least 10 good competitors?

My impression is that programmers consider it to be below their dignity to work on simple software which solves real problems and are great for their niche. Instead it has to be big and complicated, enterprise-scale. And if they can't get a job doing that, they will pretend to have a job doing that by spending their time making open source software for enterprise-scale problems.

Instead of earning a very good living by making boutique software for paying users.


I don't think programmers are the issue here. What you describe sounds to me more like the typical product management in a company. Stuff features into the thing until it bursts of bugs and is barely maintainable.

I would love to do something like what you describe. Build a simple but solid and very specialized solution. However I am not sure there is demand or if I have the right ideas for what to do.

You mention invoicing and I think: there must be hundreds of apps for what you describe but maybe I am wrong. What is the one good app you mention? I am curious now :)


There's a whole bunch of apps for invoicing, but if you try them, you'll see that they are excessively complicated. Probably because they want to cover all bases of all use cases. Meaning they aren't great for any use case. Like you say.

The invoicing app in particular I was referring to is Cakedesk. Made by a solo developer who sells it for a fair price. Easy to use and has all the necessary functions. Probably the name and the icon is holding him back, though. As far as I understand, the app is mostly a database and an Electron/Chromium front-end, all local on your computer. Probably very simple and uninteresting for a programmer, but extremely interesting for customers who have a problem to solve.


One person's "excessively complicated" is another person's "lackluster and useless" because it doesn't have enough features.


Yes, enterprise needs more complicated setups. But why are programmers only interested in enterprise scale stuff?


I'm curious: why don't YOU create this app? 95% of a software business isn't the programming, it's the requirements gathering and marketing and all that other stuff.

Is it beneath YOUR dignity to create this? What an untapped market! You could be king!

Also it's absurd to an incredible degree to believe that any significant portion of programmers, left to their own devices, are eager to make "big, complicated, enterprise-scale" software.


What makes you think that I know how to program? It's not beyond my dignity, it's beyond my skills. The only thing I can do is support boutique programmers with my money as a consumer, and I'm very happy to do that.

But yes, sometimes I have to AI code small things, because there's no other solution.


Solving these problems requires going outside and talking to people to find out what their problems are. Most programmers aren't willing to do that.


I don't know.. maybe if you cannot control your impulses..

For me Steam sales are great. I have things in the wish list and when the sale is good I might buy it. I always check if it's a good sale on SteamDB.

I usually play these games but most of the time not for long. That's why I don't want to put in the full price.


The stupid thing with sales is that the "regular" price ends up being overpriced.


Where is the win? Life is a mixture of pain, joy, success, failure and so on. Ups and down. How would you know the value of something if you never experienced (some of) the opposite?

And yes we are over connected and over sharing but everyone can change his own fate here.. just get rid of all the social networks and meet people in real life. It is possible.

I truly think that social media is one of the worst things that happened to mankind and we still have not fully grasped all the damage it does. (And here I do not refer to occasional posting in web forums or HN or such.. but the mindless, addictive participation in Facebook, Tiktok etc. where the only gain is a dopamine hit.)


I have bought the book months back and I think I recently received an email that the final version was released.

Anyways I can recommend it even though I am not finished with it yet.


Could you check what version of DWARF it covers?


It covers version 4, but it explains differences with v5 as they come up.


Okay, thanks!


Even if you have experience with DWARF, I think you will learn something new from the book.

I work on CNCF Pixie, which uses DWARF for eBPF uprobes and dynamic instrumentation. While I understood how our project uses DWARF, the book made many details much clearer for me.


I'd say the coalition is just "center", with the CDU being center-right and the SPD center-left. That seems like a good conclusion..

How did the SPD move to the right? By forming a coalition with the CDU? That claim sounds very dubious to me..


The SPD started moving to the right shortly after Schröder was elected chancellor. Their policies curbed the welfare system in an "only Nixon could go to China" manner.


Schröder is long gone and especially the current SPD seems a good distance from Schröder's politics.


I see very much a continuity of the Agenda 2010 attitude within the party.

