
I’d be interested to know how many of these were actually correct and usable. My suspicion is not many. I find these tools good at generating boilerplate and superficially correct code, but they often miss edge cases.

Knowing that code is correct is as important as the code itself, and this is why we do code review, write tests, have QA processes, use logging and observability tools, etc. Of course the place that catches the most bugs is the human writing the code, as they write it.

This feels like a nice extension to Copilot/etc, but I’m not sure it’s as general as people think.

Perhaps an interesting challenge to pose to it is: here’s 10k lines and a stack trace, what’s the bug? Or: here’s a database schema, what issues might occur in production using this?



I've started asking it to write detailed tests for all of the functions it writes. If it doesn't have a test for {edge-case}, I ask it to rewrite the code so that {edge-case} works and is covered by a test.

Once I trust the tests, I generally trust the code.
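As a rough sketch of what that ends up looking like (the function `chunk` and its edge cases are made up for illustration, not taken from a real session), each edge case I ask about gets its own explicit assertion:

```python
def chunk(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    if size <= 0:
        # Edge case: a non-positive size would otherwise fail confusingly in range()
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def test_chunk_edge_cases():
    # Edge case: empty input produces no chunks
    assert chunk([], 3) == []
    # Edge case: size larger than the input gives a single short chunk
    assert chunk([1, 2], 5) == [[1, 2]]
    # Edge case: length not divisible by size leaves a shorter last chunk
    assert chunk([1, 2, 3, 4, 5], 2) == [[1, 2], [3, 4], [5]]
    # Edge case: invalid size is rejected instead of silently misbehaving
    try:
        chunk([1], 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for size 0")


test_chunk_edge_cases()
```

If an edge case I care about has no matching assertion, that is my cue to ask for a rewrite.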


How can you trust the tests?

I've seen Copilot generate code that I read and thought was correct, that went through code review where everyone thought it was correct, and that had tests written for it (which nearly covered everything). Even once it failed, the issue was hard to spot.

It turned out it got a condition the wrong way around, but given the nesting of conditionals it wasn't obvious.
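To give a flavour of it, here is a made-up sketch (not the actual code; the discount rules and names are hypothetical) where one comparison is inverted inside nested conditionals, and a test list that looks nearly complete never reaches that branch:

```python
def discount_buggy(total, is_member, has_coupon):
    """Hypothetical rule: members without a coupon get 10% off orders of 100 or more."""
    if is_member:
        if has_coupon:
            return 0.20
        else:
            if total < 100:  # inverted: the intended check is total >= 100
                return 0.10
            return 0.0
    else:
        if has_coupon:
            return 0.05
        return 0.0


def discount_intended(total, is_member, has_coupon):
    """Same structure with the comparison the right way around."""
    if is_member:
        if has_coupon:
            return 0.20
        else:
            if total >= 100:
                return 0.10
            return 0.0
    else:
        if has_coupon:
            return 0.05
        return 0.0


# Looks thorough: members, non-members, coupons, small and large totals.
# It never combines "member" with "no coupon", so the inverted branch is untested.
NEARLY_COMPLETE_CASES = [
    (150, True, True),
    (50, True, True),
    (150, False, True),
    (150, False, False),
    (50, False, False),
]

# Both versions agree on every tested case, so the suite passes either way.
agree = all(
    discount_buggy(*case) == discount_intended(*case)
    for case in NEARLY_COMPLETE_CASES
)

# The one missing combination is where they differ.
missed_case = (150, True, False)
```

Reading the buggy version top to bottom, the `<` looks plausible unless you already hold the rule in your head.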

I don't think a human who was thinking through the problem would have made the same mistake at the point of writing. In fact, I think the state of mind you have while actually writing the code is hard to reproduce at any later time, which is why code review isn't great at catching bugs like this.


> here’s 10k lines and a stack trace

Ah must be a Spring application ...


Why?

This seems like the lowest number that would be useful. Below that it's not really a problem to debug, but at that point there's typically enough complexity that some help would be useful as you forget edge cases and features in the codebase.

For demonstration purposes doing it with 100 lines might be ok, but for professional use it kinda needs to understand quite a lot! Like a minimum of that order of magnitude, but potentially millions of lines.

FWIW, I've never used Spring. My experience is mostly Django, iOS, non-Spring Java, and some Android.


Yup, if it's >10k lines, MUST be a Spring application. Unfortunate they didn't write it in Rust, which promises 100% correct programs (within the Rust-accepted definition of "correct" and "bug-free") solving any problem but always in under 10k lines. That's the Rust guarantee.



