> We never can have total trust in LLM output, but we can certainly sanitize it and limit its destructive range

Can we really do this reliably? LLMs are non-deterministic, right, so how do we validate the output in a deterministic way?

We can validate things like shape of data being returned, but how do we validate correctness without an independent human in the loop to verify?
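
To make the distinction concrete, here is a minimal sketch of shape validation (assuming the model returns JSON and using the jsonschema library; the schema and function names are just illustrative). It rejects malformed output, but says nothing about whether the content is right:

    # Sketch: validate the *shape* of an LLM reply, not its correctness.
    import json
    import jsonschema

    schema = {
        "type": "object",
        "properties": {
            "summary": {"type": "string"},
            "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        },
        "required": ["summary", "confidence"],
    }

    def check_shape(llm_reply: str) -> dict:
        data = json.loads(llm_reply)       # rejects non-JSON output
        jsonschema.validate(data, schema)  # rejects wrong structure or types
        return data                        # a wrong-but-well-formed answer still passes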



Put four juniors in separate rooms and give them the same task. Do you expect them to produce identical solutions?

If no? Then congrats, you are now in a position where your software development lifecycle needs to handle non-determinism.

This fanatical vibing movement is ridiculous, but this luddite stance that LLMs cannot contribute to software dev because they are «non-deterministic» is almost as ludicrous.


> Then congrats, you are now in a position where your software development lifecycle needs to handle non-determinism.

Sure, except the juniors producing wrong solutions is the juniors' problem, not mine.

If I give four LLM agents tasks and they all come back with slightly wrong solutions, that's me adding four problems to my own workload.

I'm not sure how I'm supposed to keep up with that. I'm definitely not sure it makes me overall more productive


The same way we did it with humans in the loop?

I check AI output for hallucinations and issues as I don’t fully trust it to work, but we also do PRs with humans to have another set of eyes check because humans also make mistakes.

For the soft sciences and arts I’m not sure how to validate anything from AI, but for software and hard sciences I don’t see why test suites wouldn’t continue serving the same purpose they always have.
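
A test doesn't know or care who wrote the implementation. A tiny sketch (slugify is a made-up function under test; run with pytest or directly):

    # The same tests gate human-written and LLM-written code alike.
    def slugify(title: str) -> str:
        return "-".join(title.lower().split())

    def test_slugify_basic():
        assert slugify("Hello World") == "hello-world"

    def test_slugify_is_idempotent():
        assert slugify(slugify("Hello World")) == slugify("Hello World")

    if __name__ == "__main__":
        test_slugify_basic()
        test_slugify_is_idempotent()
        print("ok")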


Famously, "it's easier to write code than to read it". That goes for humans. So why did we automate the easy part and move the effort over to the hard part?

If we need a human in the loop to check every line of code for deep logic errors... then we could just get the human to write it, no?


We’ve been automating the easy parts since the first compiler, but llms make everything weird.


Respectfully, I disagree. An LLM, in my mind, is a new kind of compiler; it just takes natural language and produces code.


It feels like we're talking about different technologies sometimes.

I find it's a slightly improved Google for vague questions. Or a Doxygen writer.

That's all the use I've found for any AI model since I first started playing with the GitHub Copilot beta.

I've been trying the newer models as they've arrived, and found they're getting more verbose, more prone to hallucinating functions that don't exist, and more prone to praising me as a god when I ask about basic assumptions ("you're cutting to the heart of the matter").

What kind of code do you write where it's somehow replacing coding itself? I spent 30 minutes trying to get Mistral to write a basic bash script yesterday.


I'm playing with open-weights models at home and yeah, they are like that ... I use Claude 3.7 at work and it is a lot better ... Sometimes it will flub things, but it can also write large amounts of code ... mostly how I want (the Pareto principle comes into play for the parts I don't want, though).

So for me, the future will tend towards this ... Currently the tech is in its early days: we have no way to steer its thinking, no way to align it to our own thought processes ... But eventually we will get to "I want X, please make it" and it will be able to do it well.


I want to point out that LLMs can be completely deterministic if the final sampler is run with 0 temperature (picking the highest probability token), no top-k, fixed seed, etc.
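
A rough sketch of why greedy decoding is deterministic (the logits here stand in for a real forward pass; nothing is model-specific):

    import numpy as np

    def greedy_next_token(logits: np.ndarray) -> int:
        # Temperature 0 / greedy: a pure function of the logits,
        # so the same logits always yield the same token.
        return int(np.argmax(logits))

    def sampled_next_token(logits: np.ndarray, temperature: float, rng) -> int:
        # Temperature > 0: draws from a distribution, so the result
        # varies run to run unless the RNG seed is fixed.
        probs = np.exp((logits - logits.max()) / temperature)
        probs /= probs.sum()
        return int(rng.choice(len(logits), p=probs))

    logits = np.array([1.0, 3.2, 0.5])
    print(greedy_next_token(logits))                                  # always 1
    print(sampled_next_token(logits, 0.8, np.random.default_rng()))   # varies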


Highest probability token can still vary nondeterministically when the computation is essentially racing GPU cores or even separate hosts against each other. Float math evaluation order can change the end result.
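
A quick illustration of the underlying issue, in plain Python (no GPU needed):

    # Floating-point addition is not associative, so summing the same partial
    # results in a different order (as a parallel reduction across GPU cores
    # may do) can change the final value.
    import random

    print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))   # False

    xs = [random.uniform(-1e10, 1e10) for _ in range(100_000)]
    print(sum(xs) == sum(reversed(xs)))             # almost always False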



