But one is trying to write good-enough code. The other is trying to write good-enough-looking code. The pain arising from the latter's bugs is likely to be greater.
The work demonstrating the Frankfurtian Bullshit nature of generated prose would suggest as much; given that the architecture is the same for code outputs, it seems like a fair assumption until demonstrated otherwise.
> they're there no less than they would be for a human's programming - and VERY likely no more.
This is VERY different from my own experience. The bugs in the code I've tried to generate via LLMs (mostly Claude, some GPT-4o and o1-preview, and lots of one-off fiddling with local models to see if they're any better/worse than the commercial products) are considerably more numerous (and often more subtle) than what my fellow engineers, juniors included, tend to introduce.
I /want/ these tools to be useful; they haven't been so far, though, and I'm kinda stuck on figuring out whether I'm just not using 'em right or whether they're even capable of what I want to do. Like I said in a previous comment, I don't know if I'm being gaslit or if I'm being naive, but it feels a lot more like gaslighting.