
We'll need some well-researched study on how much LLMs actually help vs not. I know they can be useful in some situations, but it sometimes takes a few days away from them to realise the negative impacts. Like the "copilot pause" coined by ThePrimeagen: you know the completion is coming, so you pause while writing the trivial thing you knew how to do anyway and wait for it (which may or may not be correct, wasting both time and an opportunity to practice on your own). Self-reported improvement will be biased by impressions and factors other than the actual outcome.

It's not that I don't believe your experience specifically. I don't believe either side in this case knows the real industry-wide average improvement until someone really measures it.



Unfortunately, we still don't have great metrics for developer productivity, other than the hilari-bad lines-of-code metric. Jira tickets, sprints, story points, t-shirt sizes: all of those are attempts to bring something measurable to the table, but everyone knows it's really fuzzy.

What I do know though, is that ChatGPT can finish a leetcode problem before I've even fully parsed the question.

There are definitely ratholes to get stuck and lose time in when trying to get the LLM to give the right answer, but LLM-unassisted programming has the same problem. When using an LLM to help, there's a bunch of different contexts I don't have to load in, because the LLM is handling them, giving me more headspace to think about the bigger problems at hand.

No matter what a study says, as soon as it comes out it's going to get picked apart, because people aren't going to believe the results, whatever they are.

This shit's not properly measurable like a hard science, so you're going to have to settle for subjective opinions. If you want to make it a competition: how would you rank John Carmack, Linus Torvalds, Grace Hopper, and Fabrice Bellard? How do you even try to make that comparison? How do you measure and compare something you don't have a ruler for?


> that ChatGPT can finish a leetcode problem before I've even fully parsed the question.

This is an interesting case for two reasons. One is that leetcode problems are distilled, elementary problems known in CS - given all CS papers, or even blogs, at its disposal, a model should be able to solve them all by pattern-matching the solution. Real work is anything but that: the elementary problems have solutions in libraries, but everything in between is complicated and messy and requires handling unexpected/underdefined cases. The second reason is that leetcode problems are fully specified in a concise description, with an example and no outside parameters. Just spending the time to define your problem to that level for the LLM likely gets you more than halfway to the solution. And that kind of detailed spec really takes time to create.
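
To make the pattern-matching point concrete, here's the sort of distilled problem in question (classic two-sum, picked purely as an illustration). The whole solution is a well-worn hash-map idiom, which is exactly what makes it cheap for a model to reproduce:

    # Two-sum: return indices of two numbers in nums that add up to target.
    # Textbook one-pass hash map - the canonical, endlessly republished answer.
    def two_sum(nums: list[int], target: int) -> tuple[int, int] | None:
        seen = {}  # value -> index where we saw it
        for i, x in enumerate(nums):
            if target - x in seen:
                return seen[target - x], i
            seen[x] = i
        return None

    assert two_sum([2, 7, 11, 15], 9) == (0, 1)

No real ticket ever arrives in that shape.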


"What I do know though, is that ChatGPT can finish a leetcode problem before I've even fully parsed the question."

You have to watch out for that: it's an AI evaluation trap. Leetcode problems are in the training set.

I'm reminded of people excitedly discussing how GPT-2 "solved" the 10 pounds of feathers versus 10 pounds of lead problem... of course it did, that's literally in the training set. GPT-2 could be easily fooled by changing any aspect of the problem to something it did not expect. Later models less so, though last I tried, a few months ago, they could still be pretty easily tripped up, even if they got it right more often than wrong.


What that is, though, is an LLM-usefulness trap. Yeah, the leetcode problem is only solved by the LLM because it's in the training data, and you can trick the LLM with some logic puzzle that would also stump dumb humans. But none of that stops it from being useful and outputting code that seems to save time.


Even if it works and saves time, it may make us pay that time back when it doesn’t work. Then we have to actually think for ourselves, but we’ve been dulled. Best case, we lose time on those cases. More realistically, we let bugs through. Worst case, our minds, dulled by the lack of daily training, are no longer capable of solving the problem at all, and we have to train all over again until we can… possibly until we’re fired or the project is cancelled.

Most likely, though, code quality will suffer. I have a friend who sees what people commit every day, and some of them (apparently plural) copy & paste answers from an LLM and commit them before checking that they even compile. And even when it works, it’s often so convoluted there’s no way it could pass any code review. Sure, if you’re not an idiot you wouldn’t do that, but some idiots use LLMs to get through interviews (it sometimes works for remote assignments or quizzes), and spotting them on the job can take a while.
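
The not-even-compiling part is at least cheap to guard against. A hypothetical sketch of a pre-commit hook in Python (assuming, purely for illustration, a repo that builds with make; git runs any executable placed at .git/hooks/pre-commit):

    #!/usr/bin/env python3
    # .git/hooks/pre-commit - refuse commits that don't build.
    # Assumption for this sketch: `make` is how the repo compiles.
    import subprocess
    import sys

    if subprocess.run(["make", "--quiet"]).returncode != 0:
        print("pre-commit: build failed, refusing the commit", file=sys.stderr)
        sys.exit(1)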

LLMs for coding are definitely useful. And harmful. How much of each I don’t know, though right now I doubt the pros outweigh the cons. The good news is that as we figure out the good uses and avoid the bad ones, it should gradually shift towards "more useful than not". Or at least "less harmful than it was".


That's one possibility. The other direction is that it takes the dull parts out of the job, so I'm no longer spending cycles on dumbass shit like formatting JSON properly, and my mind can stay focused on problems bigger than whether there should be a comma at the end of a line. Best case, our minds, freed from the drudgery of tabs vs spaces, are sharpened by being able to focus on the important parts of the problem rather than the dumb parts.
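
And the comma thing isn't just a style nit; strict JSON makes a trailing comma a hard parse error, which is exactly why a machine should own the formatting. A minimal sketch using Python's stdlib json module:

    import json

    # Strict JSON: a trailing comma is a parse error, not a style choice.
    try:
        json.loads('{"a": 1, "b": 2,}')
    except json.JSONDecodeError as err:
        print("rejected:", err)

    # Let the serializer own commas and indentation instead of a human.
    print(json.dumps({"a": 1, "b": 2}, indent=2, sort_keys=True))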


> some of them (apparently plural) copy & paste answers from an LLM and commit them before checking that they even compile.

If I were using one of these things, that's what I'd do. (Preferably rewriting the commit to read Author: AcmeBot, Committer: wizzwizz4) It's important that commit history accurately reflect the development process.

Now, pushing an untested commit? No no no. (Well, maybe, but only ever for backup purposes: never in a branch I shared with others.)



