No, it really isn't. Repeatedly, people try to pass off GPT's work as good without actually verifying the output. I keep seeing "look at this wonderful script GPT made for me to do X", and it does not pass code review, and is generally of extremely low quality.
In one example, a bash script was generated to count the number of SLoC changed per author; it was extremely convoluted, and after I simplified it, I noticed that the output of the simplified version differed, because the original was omitting changes that touched only a single line.
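For reference, the whole task fits in one pass of awk over the git log. A minimal sketch of my own (not the script under review), assuming "changed" means added + deleted lines per `--numstat`:

    # Count lines changed (added + deleted) per author, in a single pass.
    git log --numstat --format='author %an' |
      awk '
        /^author /       { sub(/^author /, ""); author = $0; next }  # per-commit header line
        $1 ~ /^[0-9]+$/  { changed[author] += $1 + $2 }              # numstat: added<TAB>deleted<TAB>path
        END              { for (a in changed) printf "%8d  %s\n", changed[a], a }
      '

Whether you count additions, deletions, or both is a definitional choice; the point is that it's one pipeline, not a multi-pass script.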
In another example, it took several rounds of back-and-forth during review, asking "where are you getting this code? why do you think this code works, when nothing in the docs supports that?", before it was admitted that GPT wrote it. The dev who wrote it would have been far better served by RTFM than by a multi-cycle review that ended with most of GPT's hallucinations being stripped from the PR.
Those who think an LLM's output is good have not reviewed that output strenuously enough.
> Is there some expectation that these things won't improve?
Because randomized token generation inherently lacks actual reasoning about the behavior of the code. My code generator does not.
I think fundamentally if all you do is glue together popular OSS libraries in a well-understood way, then yes, you may be replaced. But really, you probably could be replaced by a WordPress plugin at that point.
The moment you have some weird library that 4 people in the world know (which happens more than you’d expect), or, hell, even something without a lot of OSS code, what exactly is an LLM going to do? How is it supposed to predict code that’s not derived from its training set?
My experience thus far is that it starts hallucinating and it’s not really gotten any better at it.
I’ll continue using it to generate sed and awk commands, but I’ve yet to find a way to make my life easier with the “hard bits” I want help with.
> I’ll continue using it to generate sed and awk commands,
The first example I gave was an example of someone using an LLM to generate sed & awk commands, and it failed spectacularly, on everything from the basics to higher-level stuff. The emitted code did include awk, and the awk was poor quality: e.g., it stored the git log output and made several passes over it with awk, when in reality you could just `git log | awk`; it was doing `... | grep | awk`, which, if you know awk, really isn't required. The regex it used to parse the git log output with awk was wrong, resulting in the wrong output. Even trivial "sane bash"-isms it messed up: it didn't quote variables that needed to be quoted, didn't take advantage of bashisms even though the shebang required bash, etc.
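To make the grep/awk and quoting complaints concrete, these are illustrative shapes only, not lines from the reviewed script:

    # grep feeding awk is redundant; awk can do the filtering itself:
    git log | grep '^Author:' | awk '{ print $2 }'   # extra process for no reason
    git log | awk '/^Author:/ { print $2 }'          # same result, one filter

    # and unquoted expansions word-split and glob-expand:
    subject=$(git log -1 --format=%s)
    printf '%s\n' $subject      # breaks if the subject contains spaces or *
    printf '%s\n' "$subject"    # prints the subject as one line, as intended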
The task was a simple one, bordering on trivial, and any way you cut it, from "was the code correct?" to "was the code high quality?", it failed.
But it shouldn't be terribly surprising that an LLM would fail at writing decent bash: its input corpus would resemble bash found on the Internet, and IME, most bash out there fails to follow best-practice; the skill level of the authors probably follows a Pareto distribution due to the time & effort required to learn anything. GIGO, but with way more steps involved.
I have other examples, e.g., involving Kubernetes, which is also not in the category of "4 people in the world know". I asked, "how do I get the replica number from a pod in a statefulset?" (i.e., the -0, -1, etc., at the end of the pod name), and was told to query
.metadata.labels.replicaset-序号
(It's just nonsense; not only does no such label exist for what I want, it certainly doesn't exist with a Chinese name. AFAICT, that label name did not appear on the Internet at the time the LLM generated it, although it does, of course, now.) Again, simple task, wide amount of documentation & examples in the training set, and garbage output.
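For what it's worth, the ordinal isn't exposed via a label like that at all (newer Kubernetes releases do add an apps.kubernetes.io/pod-index label, IIRC, but nothing resembling the hallucinated one); the usual answer is to strip the suffix off the pod name. A minimal sketch, with web-2 as a placeholder pod name:

    # Inside the pod: the hostname is the pod name, so the ordinal is its numeric suffix.
    ORDINAL=${HOSTNAME##*-}
    echo "$ORDINAL"

    # From outside, the same trick on the pod name:
    POD_NAME=web-2
    echo "${POD_NAME##*-}"      # -> 2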
You ask what the LLM is going to do. It’s going to swallow the entire code base in context and allow any developer to join those 4 people in generating production grade code.
> Imagine code passes your rigorous review. How do you know that it wasn’t from an LLM?
That's not what I'm saying (and it's a strawman; yes, presumably some LLM code would escape review and I wouldn't know it's from an LLM, though I find that unlikely, given…). What I'm saying is: of the LLM-generated code that is reviewed, what is the quality & correctness of that code? And it's resoundingly (easily >90%) crap.
Obviously we can't sample code of unknown authorship … nor am I trying to; I'm sampling problems that I and others run through an LLM, and the output thereof.
The other facet of this point is that I believe a lot of the craze among people using the LLM is driven by them not looking closely at the output: if you're just taking code from the LLM, chucking it over the wall, and calling it a day (as was the case in one of the examples in the comment above), you perceive the LLM as useful, when in fact it is leaving behind bugs that you're either not attributing to it, or that someone else is cleaning up (again, that was the case in the above example), etc.
> What I'm saying is: of the LLM-generated code that is reviewed, what is the quality & correctness of that code? And it's resoundingly (easily >90%) crap.
What makes you so sure that none of the resoundingly non-crap code you have reviewed was produced by an LLM?
It’s like saying you only like homemade cookies, not ones from the store. But you may be gleefully chowing down on cookies that you believe are homemade, because you like them (so they must be homemade), without knowing they actually came from the store.
> I'm sampling problems that I and others run through an LLM
This is not what’s happening unless 100% of the problems they’ve sampled (even outside of this fun little exercise) have been run through an LLM.
They’re pretending that it doesn’t matter that they’re looking at untold numbers of other problems without being aware of whether those are LLM-generated or not.