
Young man, software often consists of more than 50 lines of code that merely glue together basic examples from two libraries. That stuff is useful too, but it's the work of a 0.5x intern, not a 10x developer.

I've asked the same Claude to write unit tests for a very well-known, well-documented API. It was too dumb to deduce which edge cases it should test, so I also had to give it a detailed list of what to test and how. Despite all of that, it still wrote crappy tests that misused the API. It couldn't properly diagnose the failures, and kept adding code for non-existent problems. It was bad at applying fixes even when told exactly what to fix. I wasted a lot of time cleaning up crappy code and diagnosing AI-made mistakes. It would have been quicker to write it all myself.

I've tried Claude and GPT-4o for a task that required translating imperative code, which wrote structured data to disk field by field, into explicit schema definitions. It was an easy but tedious task (I had many structs to convert). The AI hallucinated a bunch of fields and got many types wrong, wasting a lot of my time on diagnosing serialization issues. I really wanted it to work, but I burned over $100 in API credits (not counting subscriptions) trying various editors and approaches. I wasted time and money managing context for it: giving it enough of the codebase to stop it from hallucinating the missing parts, but also carefully trimming it to avoid distracting it or causing context rot. It just couldn't do the work precisely. In the end I had to scrap it all and do it by hand myself.
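To be clear about what kind of conversion I mean: it's mechanical. A minimal sketch, with the field names and wire formats invented for illustration (the real codebase looked nothing like this): an imperative writer that packs each field by hand becomes a declarative schema that one generic serializer walks.

```python
import struct
from dataclasses import dataclass

# Before: imperative, field-by-field writing, repeated per struct:
#   f.write(struct.pack('<I', rec.id))
#   f.write(struct.pack('<d', rec.score))
# After: the same information as data. One schema per struct,
# one generic serializer for all of them.
SCHEMA = [('id', '<I'), ('score', '<d')]  # (field name, struct format)

@dataclass
class Record:
    id: int
    score: float

def serialize(rec: Record) -> bytes:
    # Walk the schema instead of hand-writing one pack() call per field.
    return b''.join(struct.pack(fmt, getattr(rec, name)) for name, fmt in SCHEMA)
```

The whole job was extracting schemas like `SCHEMA` from existing imperative writers, struct after struct. That's exactly where the hallucinated fields and wrong types crept in.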

I've tried GPT-4o and o4-mini-high to write a specific image processing operation. They could discuss the problem with seemingly great understanding (referencing academic research and advanced data structures). I even got Python code with correct syntax on the first try! But the implementation had a fundamental flaw that caused numeric overflows. The AI couldn't fix it by itself (it kept inventing stupid workarounds that didn't work, or that defeated the point of the whole algorithm). When told step by step how to fix it, it kept breaking other things in the process.
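That class of overflow bug is easy to demonstrate, by the way. A minimal sketch (not the actual operation): averaging two 8-bit images by adding them first silently wraps around, while widening the dtype before the addition gives the correct result.

```python
import numpy as np

# Two single-pixel "images" whose sum exceeds the uint8 range (255).
a = np.array([200], dtype=np.uint8)
b = np.array([180], dtype=np.uint8)

# Naive average: 200 + 180 = 380 wraps to 124 in uint8, so we get 62.
wrong = (a + b) // 2

# Widen to uint16 before adding, then narrow back: 380 // 2 = 190.
right = ((a.astype(np.uint16) + b.astype(np.uint16)) // 2).astype(np.uint8)
```

The fix is one dtype widening in the right place; the "workarounds" the model kept proposing were all variations that either didn't prevent the wraparound or threw away the precision the algorithm depended on.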

I've tried to make AI upgrade code from an older version of a dependency to a newer one. I provided it with relevant quotes from the docs (I knew the new version was past its knowledge cutoff), and even converted parts of the code myself so it could just follow the pattern. The AI couldn't properly copy-paste code from one function to another. It kept reverting things. When I pointed out the issues, it kept apologising, saying which new APIs it was going to use, and then used the old APIs again!

I've also briefly tried GitHub Copilot, but it acted like level-1 tech support, despite burning tokens from a more capable model.


