
> this erudite fool is at our disposal and answers all the questions asked of them,

Yes, but I have to double-check every answer. And that, for me, greatly mitigates or entirely negates their utility. Of what value is a pocket calculator that only gets the right answer 75% of the time, when you don't know ex ante which 75%?



- I can read the code, and reading code is faster than writing it.

- I can also tell the LLM to write tests for the code it wrote, and I can validate that the tests are valid (a sketch of that workflow follows this list).

- LLMs are also valuable for introducing me to concepts and techniques I would never have had exposure to otherwise. For example, when I explain a problem, it will bring up technologies or terms I never considered because I simply didn't know about them. I can then research those technologies to decide whether they are actually the right approach.
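
To make the second point concrete, here is a minimal sketch of that validation loop. The slugify function and its tests are hypothetical stand-ins for LLM output; the point is that the assertions are small enough to verify just by reading them:

    import re

    def slugify(title):
        # (Imagine this is LLM-generated.) Lowercase, collapse runs of
        # non-alphanumerics into "-", and trim leading/trailing dashes.
        return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

    def test_slugify():
        # Each assertion is checked by inspection against what I
        # actually want, not against the generated implementation.
        assert slugify("Hello, World!") == "hello-world"
        assert slugify("  spaces  everywhere  ") == "spaces-everywhere"
        assert slugify("already-a-slug") == "already-a-slug"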


> I can also tell the LLM to write tests for the code it wrote, and I can validate that the tests are valid.

If I don't trust the generated code, why should I trust the generated code that tests the generated code?


Do you trust your ability to read?


As long as P != NP, verification should be much easier than producing a solution.
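
As a toy illustration of that asymmetry (subset-sum is my choice of example here, not anything specific to LLMs): checking a proposed certificate is a linear scan, while producing one by brute force means searching an exponential space.

    from itertools import combinations

    def verify(nums, target, certificate):
        # Polynomial time: confirm the certificate is drawn from nums
        # (respecting multiplicity) and sums to the target.
        pool = list(nums)
        for x in certificate:
            if x not in pool:
                return False
            pool.remove(x)
        return sum(certificate) == target

    def solve(nums, target):
        # Exponential time: try every subset until one sums to target.
        for r in range(len(nums) + 1):
            for combo in combinations(nums, r):
                if sum(combo) == target:
                    return list(combo)
        return None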

Or, from a different angle - all models are wrong, some are useful.

As it happens, LLMs are useful even if they're sometimes wrong.


> As long as P != NP, verification should be much easier than producing a solution.

Perhaps so. I guess it depends on how long it takes to code up property-based tests.

https://hypothesis.readthedocs.io/en/latest/
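
For what it's worth, a minimal Hypothesis test is only a few lines. sorted_copy here is a made-up stand-in for whatever the LLM produced; the properties are the part a human actually has to get right:

    from collections import Counter
    from hypothesis import given, strategies as st

    def sorted_copy(xs):
        # Stand-in for the generated code under test.
        return sorted(xs)

    @given(st.lists(st.integers()))
    def test_sorted_copy(xs):
        out = sorted_copy(xs)
        # Property 1: the output is in non-decreasing order.
        assert all(a <= b for a, b in zip(out, out[1:]))
        # Property 2: the output is a permutation of the input.
        assert Counter(out) == Counter(xs)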


Programming is special because 99% of the time you can tell immediately whether something works or not, so the risk of misinformation is very low.


Perhaps you've omitted some important context here, or you're using an extremely restricted definition of "works"? The interesting and hard question with software is not "did it compile" but rather "did it meet the never-clearly-articulated needs of the user"...

I would agree that it is a primary goal of software engineering to move as much as possible into the category of automatic verification, but we're a long, long way from 99%.


I agree with your point here.

I think that antirez is technically correct in that there is a vast amount of code that will not compile compared to the amount of code that will compile. So saying '99%' sort of makes sense.

But that doesn't capture the fact that, of the code that compiles, there is a vast amount that doesn't do what we want at runtime compared to the code that does.

And beyond that, there is a vast amount of code that does what we want only most of the time at runtime, compared to the code that does what we want 100% of the time.
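
A classic Python example of that last category (hypothetical, but representative): code that passes a one-off check yet misbehaves across calls.

    def append_log(entry, log=[]):
        # Looks fine, and a single test call passes. But the default
        # list is created once at definition time and shared across
        # calls, so entries leak between callers that omit `log`.
        log.append(entry)
        return log

    append_log("a")   # ["a"] -- as expected
    append_log("b")   # ["a", "b"] -- stale state from the first call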

The interesting thought experiment this led me to: I would be more likely to trust LLM code in C# or Rust than LLM code in assembly or Ruby.

Which makes me wonder ... can LLMs write working Idris or ATS code?


I've seen people put untested AI hallucinations up for review, with nonexistent function names, passing CI just because the calls were behind debug defines.
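
The Python analogue of that failure mode, with hypothetical names (the original was presumably behind C-style debug defines, but dynamic languages make it even easier to hit):

    DEBUG = False

    def save(record, db):
        if DEBUG:
            # Hallucinated helper: dump_record_state() doesn't exist
            # anywhere. Python resolves names only when a line runs,
            # so with DEBUG = False in CI this NameError never fires.
            dump_record_state(record)
        db.append(record)

    db = []
    save({"id": 1}, db)   # green build; the broken branch never executes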

I've seen some refer to nonexistent APIs while discussing migration to a new major version of a library: "Sure, that's easy, we should just replace this function with this new one."

Now imagine all the more subtle bugs that are harder to spot.


Hi! I'm a big fan of Redis and also the little Kilo editor you wrote.

But I have to disagree on this point, since many programs written in, e.g., C have security issues that take a long time to discover.


Doesn't seem very secure if it's entirely dependent on humans checking it manually. Humans are famously fallible.


I am glad you agree with my point that 99% of the time you cannot immediately tell if code works or not.



