
I think this is an easy thing to wrap my mind around (since I have been in both camps):

AI can generate lots of code very quickly.

AI does not generate code that follows taste and/or best practices.

So where the task is small, easily plannable, well within the training corpus, or part of a project without high stakes, it can produce something workable quickly.

In larger projects, or anything that needs to stay maintainable, code generation can fall apart or produce subpar results.



I've just seen that change happen in a pet project within less than 10 hours of work.

I tried vibe-coding something for my own use, your classic "scratch your own itch" project.

The first MVP was a one-shot success, really impressive.

But as the code grew with every added feature, progress soon slowed down. The AI claimed to have fixed a bug when it didn't. It switched back and forth several times between using a library function and rolling its own implementation, each time claiming to have "simplified" the code and made it "more reliable". With every change, it claimed to have "improved" the code, even when it just added a bunch of shamelessly duplicated shit.

One effect I am sure AI will have is to massively exacerbate the phenomenon of people who quickly produce a large amount of shitty, unmaintainable code that fulfills half the requirements, and then leave the mess behind for another greenfield project.


We’re so unaccustomed to working with non-deterministic computer tech that, rather than acknowledge these tools are hit-or-miss, everyone just picks one side and goes all-in on it.

Which sounds an awful lot like politics.


> We’re so unaccustomed to working with non-deterministic computer tech

The whole history of computing is about making it deterministic, after we found out that anything generative (recursion, context-free grammars, ...) is a double-edged sword. So for any known set of inputs, you want the output to be finite and non-empty, with every item having the correct properties.


All of physics was like that until QM


"non-deterministic" is a weird way to say "mangles data 1/3 to 1/2 the time"


LLMs also don’t always generate code that works.

And you don’t always know or understand what the product owner wants you to build.

Writing code faster is very rarely what you need.


That about sums it up from my experience as well. But as parent said, such takes unfortunately don't get a lot of eyeballs :(


That was four short paragraphs with basically no details - there's nothing for eyeballs to see. If I wrote a history of the world that consisted of "Some people lived and died, some of them were bad, I guess", how many copies do you think I'd sell? Any?

What's interesting is the details, and a post giving actual detail of building some app, and the benefits and shortcomings of a specific tool, would be of great interest to many. If I ask an LLM to save the date to the database, and then ask it to save the time, do I get two variables in two columns? Does that make sense for my app? Do I have to ask it to refactor? Is it able to do that successfully? Does it drop tables during program initialization? How does the latest model do with designing the entire program? How often does it hallucinate for this set of libraries and prompts? There's quite a bit of variance! If it hallucinated libraries and APIs left and right it would be far less useful. Some people don't even get hallucinations because their prompts are so well trodden.

There are all sorts of interesting details to be learned and shared about these new tools that would get a ton of eyeballs.
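For instance, here is a rough sketch (TypeScript, with entirely made-up names) of the two shapes that the date/time question can end up producing:

    // Hypothetical option A: "save the date", then later "save the time"
    // -> two separate string columns bolted on one after the other.
    interface EventRowSplit {
      eventDate: string; // e.g. "2025-07-14"
      eventTime: string; // e.g. "09:30:00"
    }

    // Hypothetical option B: a single timestamp column, which is what many
    // apps actually want - but only if someone asks for the refactor.
    interface EventRowCombined {
      occurredAt: Date; // one value, timezone handled in one place
    }

Whether a given tool lands on A or B, and whether it can refactor cleanly from one to the other when asked, is exactly the kind of detail such a write-up would need to cover.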


Recent LLMs are fairly good at following instructions, so a lot of the difference comes down to the level of detail and quality of the instructions given. Written communication is a skill for which there's a huge amount of variance among developers, so it's not surprising that different developers get very different results. The relative quality of the LLM's output is determined primarily by the written communication skills of the individuals instructing it.


It seems to me that if you can spell all these instructions out clearly, then you already know everything, it's easy to write the code yourself, and you don't need an LLM.


The amount of typing it takes to describe a solution in English text is often less than the amount of typing needed to actually implement it in code, especially after accounting for boilerplate and unit tests. Not to mention the time spent waiting for the compiler and test harness to run. As a concrete example, the HTTP/2 spec is way fewer characters than any HTTP/2 server implementation, and the C spec is way fewer characters than any compliant C compiler. The C++ spec is way, way fewer characters than any compliant C++ compiler.
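As a toy illustration (my own made-up example, not a real spec): the one-sentence description "return the unique email addresses, lower-cased and sorted alphabetically" is shorter than even a minimal TypeScript implementation with a single check:

    // Implements the one-sentence spec above; the code plus a check is
    // already several times longer than the sentence that describes it.
    export function uniqueSortedEmails(emails: string[]): string[] {
      const normalized = emails.map((e) => e.trim().toLowerCase());
      return [...new Set(normalized)].sort();
    }

    // Minimal sanity check of the described behaviour.
    console.assert(
      JSON.stringify(uniqueSortedEmails(["B@x.com", "a@x.com", "b@X.com"])) ===
        JSON.stringify(["a@x.com", "b@x.com"]),
      "should dedupe, lower-case, and sort"
    );

And that's before boilerplate, error handling, or a real test harness.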


>The amount of typing it takes to describe a solution in English text is often less than the amount of typing needed to actually implement it in code

I don't find this to be true. I find describing a solution well in English to be slower than describing the problem in code (i.e., by writing tests first) and having that be the structured data the LLM uses to generate code.

It's far faster, from the results I'm seeing plus my own personal experience, to write clear tests, which have the benefit of being a form of structured data the LLM can analyze. It's the guidance we have given to our engineers at my day job and it has made working with these tools dramatically easier.
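Roughly the kind of thing I mean, assuming a Vitest-style runner; parseDuration is a hypothetical function that doesn't exist yet, and the test file is the structured spec we hand to the LLM:

    import { describe, expect, it } from "vitest";
    // parseDuration is to be written by the LLM afterwards; this test file
    // is the specification it has to satisfy.
    import { parseDuration } from "./parseDuration";

    describe("parseDuration", () => {
      it("parses plain seconds into milliseconds", () => {
        expect(parseDuration("90s")).toBe(90_000);
      });

      it("parses combined minutes and seconds", () => {
        expect(parseDuration("1m30s")).toBe(90_000);
      });

      it("throws on malformed input instead of guessing", () => {
        expect(() => parseDuration("ninety")).toThrow();
      });
    });

The failing tests are unambiguous in a way prose rarely is, and the LLM can run them to check its own output.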

In some cases, I have found LLM performance to be subpar enough that it is indeed faster to write it myself. If it has to hold many different pieces of information together, it starts to falter.


I don't think it's so clear-cut. The C spec I found is 4MB and the tcc compiler source code is 1.8MB. It might need some more code to be fully compliant, but it may still be smaller than 4MB. I think the main reason why code bases are much larger is because they contain stuff not covered by the spec (optimization, vendor-specific stuff, etc etc).

Personally I'd rather write a compiler than a specification, but to each their own.


Use shorter variable names


Corpus also matters. I know Rust developers who aren't getting very good results even with high quality prompts.

On the other hand, I helped integrate Cursor as a staff engineer at my current job for all our developers (many hundreds), who primarily work in JavaScript / TypeScript, and even middling prompts will get results that only require refactoring, assuming the LLM doesn't need a ton of context for the code generation (e.g. greenfield or independent features).

Our general approach and guidance has been that developers need to write the tests first and have Cursor use them as a basis for what code to generate. This helps prevent atrophy, and over time we've found that's where developers add the most value with these tools. I know plenty of developers want to do it the other way (have the AI generate the tests), but we've had more issues with that approach.

We discourage AI generating everything and having a human edit the output, as it tends to be slower than our chosen approach and more likely to have issues.

That said, LLMs still struggle if they need to hold a lot of context: for instance, when there are a bunch of files they need to understand in order to generate worthwhile code, particularly if you want them to re-use existing code.


>Corpus also matters. I know Rust developers who aren't getting very good results even with high quality prompts.

Which model were they using, out of interest? I've gotten decent results for Rust from Gemini 2.5 Pro. Its first attempt will often be disgusting (cloning and other inefficiencies everywhere), but it can be prompted to optimise that afterwards. It also helps a lot to think ahead about lifetimes and explicitly tell it how to structure them, if there might be anything tricky lifetime-wise.


No idea. I do know they all have access to Cursor and tried different models, even the more expensive options.

What you're describing, though, having to go through that elaborate detail, really drives home my point, and I think it shows a weakness in these tools that is a hidden cost to scaling their productivity benefits.

What I can tell you, though, both from observation and experience, is that because the corpus for TypeScript / JavaScript is vastly larger as it stands today, even Gemini 2.5 Pro will 'get to correct' faster with middling prompt(s) than it will for a language like Rust.


I do a lot of work in a rather obscure technology (Kamailio) with an embedded domain-specific scripting language (C-style) that was invented in the early 2000s specifically for that purpose, and can corroborate this.

Although the training data set is not wholly bereft of Kamailio configurations, it's not well-represented, and it would be at least a few orders of magnitude smaller than any mainstream programming language. I've essentially never had it spit out anything faintly useful or complete Kamailio-wise, and LLM guidance on Kamailio issues is at least 50% hallucinations / smoking crack.

This is irrespective of prompt quality; I've been working with Kamailio since 2006 and have always enjoyed writing, so you can count on me to formulate a prompt that is both comprehensive and intricately specific. Regardless, it's often a GPT-2 level experience, or akin to running some heavily quantised 3bn parameter local Llama that doesn't actually know much of anything specific.

From this, one can conclude that a tremendous amount of reinforcement of the weights is needed before the LLM can produce useful results in anything that isn't quasi-universal.

I do think, from a labour-political perspective, that this will lead to some guarding and fencing to try to prevent one's work-product from functioning as free training for LLMs that the financial classes intend to use to displace you. I've speculated before that this will probably harm the culture of open-source, as there will now be a tension between maximal openness and digital serfdom to the LLM companies. I can easily see myself saying:

I know our next commercial product releases (based on open-source inputs), which are on-premise for various regulatory and security reasons, will be binary-only; I have never minded customers looking through our plain-text scripts before, but I don't want them fed into LLMs for experiments with AI slop.


Yea, this! How many of the devs saying "it doesn't do what I expect" never tried to write up a plan of action before letting it YOLO some new features? We have to learn to use this new tool, but how to do that is still changing all the time.


> We have to learn to use this new tool, but how to do that is still changing all the time.

so we need to program in natural language now, but targeting some subset of it that isn't learnable?

and this is better how?


Yes, and also writing for an LLM to consume is its own skill, with nuances that are in flux as models and tooling improve.


You're not wrong, in July 2025. But it will get better, and it will not stop getting better when it reaches equal-to-the-best-human level.


No, but I think you're wrong. My child grew 3 inches in the last year alone, according to his latest physical.

If I were to adopt your extrapolation methods, he'll soon not only be the tallest human alive, but the tallest structure on the planet.


yeah and if my airplane keeps accelerating at the rate it does when taking off, it would reach the speed of light in a few months.

"this software doesn't work but if we 10x the resources in r&d a couple more times surely it will" is quite the argument to be seen making in public



