Hold on, hold on. You're missing a step here. I agree completely that an LLM's f...

bambax · 2025-02-15T20:45:28 1739652328

The problem is what one means by "works". Is it just that it runs without triggering exceptions here and there?

One has to know, and understand, what the code is supposed to be doing, to evaluate it. Or use tests.

But LLMs love to lie so they can't be trusted to write the tests, or even to report how the code they wrote passed the tests.

In my experience the way to use LLMs for coding is exactly the opposite: the user should already have very good knowledge of the problem domain as well as the language used, and just needs to have a conversation with someone on how to approach a specific implementation detail (or help with an obscure syntax quirk). Then LLMs can be very useful.

But having them directly output code for things one doesn't know, in a language one doesn't know either, hoping they will magically solve the problem by iterating in "closed loops", will result in chaos.

tptacek · 2025-02-15T21:58:20 1739656700

It clearly does not result in chaos. This is an "I believe my lying eyes" situation, where I can just see that I can get an agent-y LLM codegen setup to generate a sane-looking working app in a language I'm not fluent in.

The thing everyone thinks about with LLM codegen is hallucination. The biggest problem for LLMs with hallucination is that there are no guardrails; it can just say whatever. But an execution environment provides a ground truth: code works or it doesn't, a handler path generates an exception or it doesn't, a lint rule either compiles and generates workable output or it doesn't.

bambax · 2025-02-16T08:52:49 1739695969

> code works or it doesn't

It seems you're deliberately confusing "works" with "runs". They're different things.

danielbln · 2025-02-15T18:15:41 1739643341

That's also the problem with these conversations. Some people evaluate zero-shot promoted code oozing out of gpt-3.5, others plug Sonnet into an IDE with access to terminal, LSP, diagnostics etc. crunching through a problem in an agentic self improvement loop. Those two approaches will generate very different quality levels of code.

vlovich123 · 2025-02-15T20:52:17 1739652737

An LLM though doesn’t truly understand the goal AND it frequently gets into circular loops it can’t get out of when the solution escapes its capability rather than asking for help. Hopefully it’ll get fixed but some of this stuff is an architectural problem rather than just iterating on the transformer idea.

tptacek · 2025-02-15T22:00:24 1739656824

That's totally true, but it's also a small amount of Python code in the agent scaffolding to ensure that it bails on those kinds of loops. Meanwhile, for something like Semgrep, the status quo ante was essentially no Semgrep rules getting written at all (I believe the modal Semgrep user just subscribes to existing rule repositories). If a closed-loop LLM setup can successfully generate Semgrep rules for bug patterns even 5% of the time, that is a material win, and a win that comes at very little cost.

mcqueenjordan · 2025-02-16T02:05:55 1739671555

Yeah, I more or less agree about the closed loop part and the overall broader point the article was making in this context — that it may be a useful use case. I think it’s likely that process creates a lot of horseshit that passes through the process, but that might still be better than nothing for semgrep rules.

I only came down hard on that quote out of context because it felt somewhat standalone and I want to broadcast this “fluency paradox” point a bit louder because I keep running into people who really need to hear it.

I know you know what’s up.