I don't understand the premise. If I trust someone to write good code, I learned...

acedTrex · 2025-06-26T12:57:46 1750942666

Author here:

Essentially the premise is that in medium trust environments like very large teams or low trust environments like an open source project.

LLMs make it very difficult to make an immediate snap judgement about the quality of the dev that submitted the patch based solely on the code itself.

In the absence of being able to ascertain the type of person you are dealing with you have to fall back too "no trust" and review everything with a very fine tooth comb. Essentially there are no longer any safe "review shortcuts" and that can be painful in places that relied on those markers to grease the wheels so to speak.

Obviously if you are in an existing competent high trust team then this problem does not apply and most likely seems completely foreign as a concept.

lxgr · 2025-06-26T13:21:40 1750944100

> LLMs make it very difficult to make an immediate snap judgement about the quality [...]

That's the core of the issue. It's time to say goodbye to heuristics like "the blog post is written in eloquent, grammatical English, hence the point its author is trying to make must be true" or "the code is idiomatic and following all code styles, hence it must be modeling the world with high fidelity".

Maybe that's not the worst thing in the world. I feel like it often made people complacent.

acedTrex · 2025-06-26T13:25:55 1750944355

> Maybe that's not the worst thing in the world. I feel like it often made people complacent.

For sure, in some ways perhaps reverting to a low trust environment might improve quality in that it now forces harsher/more in depth reviews.

That however doesn't make the requirement less exhausting for people previously relying heavily on those markers to speed things up.

Will be very interesting to see how the industry standardizes around this. Right now it's a bit of the wild west. Maybe people in ten years will look back at this post and think "what do you mean you judged people based on the code itself that's ridiculous"

furyofantares · 2025-06-26T14:07:42 1750946862

I think you're unfair to the heuristics people use in your framing here.

You said "hence the point its author is trying to make must be true" and "hence it must be modeling the world with high fidelity".

But it's more like "hence the author is likely competent and likely put in a reasonable effort."

When those assumptions hold, putting in a very deep review is less likely to pay off. Maybe you are right that people have been too complacent to begin with, I don't know, but I don't think you've framed it fairly.

lxgr · 2025-06-26T15:30:11 1750951811

> But it's more like "hence the author is likely competent and likely put in a reasonable effort."

And isn't dyslexic, and is a native speaker etc. Some will gain from this shift, some will lose.

furyofantares · 2025-06-26T16:40:41 1750956041

Yes! This is part of why I bristle at such reductive takes, we can use more nuance thinking about what we are gaining and what we are losing and how to deal with it.

o11c · 2025-06-26T15:57:52 1750953472

That's not how heuristics work.

The heuristic is "this submission doesn't even follow the basic laws of grammar, therefore I can safely assume incompetence and ignore it entirely."

You still have to do verification for what passes the heuristic, but it keeps 90% of the crap away.

tempodox · 2025-06-26T15:07:36 1750950456

Anyway, “following all code styles” is just a fancy way of saying “adheres to fashion”. What meaningful conclusions can you draw from that?

rurp · 2025-06-26T15:21:19 1750951279

It's not about fashion, it's about diligence and consideration. Code formatting is totally different from say clothing fashion. Social fashions are often about being novel or surprising which is the opposite of how good code is written. Code should be as standard, clear and unsurprising as is reasonably possible. If someone is writing code in a way that's deliberately unconventional or overly fancy that's a strong signal that it isn't very good.

When someone follows standard conventions it means that they A) have a baseline level of knowledge to know about them, and B) care to write the code in a clear and approachable way for others.

tempodox · 2025-06-26T15:34:48 1750952088

> If someone is writing code in a way that's deliberately unconventional or overly fancy that's a strong signal that it isn't very good.

“unconventional” or “fancy” is in the eye of the beholder. Whose conventions are we talking about? Code is bad when it doesn't look the way you want it to? How convenient. I may find code hard to read because it's formatted “conventionally”, but I wouldn't be so entitled as to call it bad just because of that.

kiitos · 2025-06-26T15:52:08 1750953128

> “unconventional” or “fancy” is in the eye of the beholder.

Literally not: a language defines its own conventions, they're not defined in terms of individual users/readers/maintainers subjective opinions.

> Whose conventions are we talking about?

The conventions defined by the language.

> Code is bad when it doesn't look the way you want it to?

No -- when it doesn't satisfy the conventions established by the language.

> I may find code hard to read because it's formatted “conventionally”,

If you did this then you'd be wrong, and that'd be a problem with your personal evaluation process/criteria, that you would need to fix.

Capricorn2481 · 2025-06-26T18:26:08 1750962368

> a language defines its own conventions

Where are these mythical languages? I think the word you're looking for is syntax, which is entirely different. Conventions are how code is structured and expected to be read. Very few languages actually enforce or even suggest conventions, hence the many style guides. It's a standout feature of Go to have a format style, and people still don't agree with it.

And it's kinda moot when you can always override conventions. It's more accurate to say a team decides on the conventions of a language.

habinero · 2025-06-26T17:50:17 1750960217

No, they're absolutely correct that it's critical in professional and open source environments. Code is written once but read hundreds or thousands of times.

If every rando hire goes in and has a completely different style and formatting -- and then other people come in and rewrite parts in their own style -- code rapidly goes to shit.

It doesn't matter what the style is, as long as there is one and it's enforced.

Capricorn2481 · 2025-06-26T18:38:35 1750963115

> No, they're absolutely correct that it's critical in professional and open source environments. Code is written once but read hundreds or thousands of times

What you're saying is reasonable, but that's not what they said at all. They said there's one way to write cleanly and that's "Standard conventions", whatever that means. Yes, conventions so standard that I've read 10 conflicting books on what they are.

There is no agreed upon definition of "readable code". A team can have a style guide, which is great to follow, but that is just formalizing the personal preference of the people working on a project. It's not anymore divine than the opinion of a "rando."

habinero · 2025-06-26T21:53:54 1750974834

No, you misunderstood what they said. And I misspoke a little, too.

While it's true that in principle it doesn't matter what style you choose as long as there is one, in practice languages are just communities of people, and every community develops norms and standards. More recent languages often just pick a style and bake it in.

This is a good thing, because again, code is read 1000x more times than it's written. It saves everyone time and effort to just develop a typical style.

And yeah, the code might run no matter how you indent it, but it's not correct, any more than you going to a restaurant and licking the plates.

Capricorn2481 · 2025-06-28T15:55:41 1751126141

> More recent languages often just pick a style and bake it in.

Again, there's a couple examples of languages doing this, and everything else is a free for all.

> No, you misunderstood what they said.

Agree to disagree. Nothing in that comment talks about the conventions of a language, only the conventions of code. Again, I don't disagree with what you say, but the person you replied to was in a completely different argument.

eddd-ddde · 2025-06-27T16:03:37 1751040217

> In the absence of being able to ascertain the type of person you are dealing with you have to fall back too "no trust" and review everything with a very fine tooth comb.

Is that not how you review all code? I don't care who wrote the code, just because certain person wrote the code doesn't give them an instant pass to skip my review process.

sim7c00 · 2025-06-26T13:24:12 1750944252

its about the quality of the code, not the quality of the dev. you might think it's related, but it's not.

a dev can write piece of good, and piece of bad code. so per code, review the code. not the dev!

haswell · 2025-06-26T13:47:56 1750945676

> its about the quality of the code, not the quality of the dev. you might think it's related, but it's not.

I could not disagree more. The quality of the dev will always matter, and has as much to do with what code makes it into a project as the LLM that generated it.

An experienced dev will have more finely tuned evaluation skills and will accept code from an LLM accordingly.

An inexperienced or “low quality” dev may not even know what the ideal/correct solution looks like, and may be submitting code that they do not fully understand. This is especially tricky because they may still end up submitting high quality code, but not because they were capable of evaluating it as such.

You could make the argument that it shouldn’t matter who submits the code if the code is evaluated purely on its quality/correctness, but I’ve never worked in a team that doesn’t account for who the person is behind the code. If its the grizzled veteran known for rarely making mistakes, the review might look a bit different from a review for the intern’s code.

NeutralCrane · 2025-06-26T14:11:52 1750947112

> An experienced dev will have more finely tuned evaluation skills and will accept code from an LLM accordingly. An inexperienced or “low quality” dev may not even know what the ideal/correct solution looks like, and may be submitting code that they do not fully understand. This is especially tricky because they may still end up submitting high quality code, but not because they were capable of evaluating it as such.

That may be true, but the proxy for assessing the quality of the dev is the code. No one is standing over you as you code your contribution to ensure you are making the correct, pragmatic decisions. They are assessing the code you produce to determine the quality of your decisions, and over time, your reputation as a dev is made up of the assessments of the code you produced.

The point is that an LLM in no way changes this. If a dev uses an LLM in a non-pragmatic way that produces bad code, it will erode trust in them. The LLM is a tool, but trust still factors in to how the dev uses the tool.

haswell · 2025-06-26T15:23:10 1750951390

> That may be true, but the proxy for assessing the quality of the dev is the code.

Yes, the quality of the dev is a measure of the quality of the code they produce, but once a certain baseline has been established, the quality of the dev is now known independent of the code they may yet produce. i.e. if you were to make a prediction about the quality of code produced by a "high quality" dev vs. a "low quality" dev, you'd likely find that the high quality dev tends to produce high quality code more often.

So now you have a certain degree of knowledge even before you've seen the code. In practice, this becomes a factor on every dev team I've worked around.

Adding an LLM to the mix changes that assessment fundamentally.

> The point is that an LLM in no way changes this.

I think the LLM by definition changes this in numerous ways that can't be avoided. i.e. the code that was previously a proxy for "dev quality" could now fall into multiple categories:

1. Good code written by the dev (a good indicator of dev quality if they're consistently good over time)

2. Good code written by the LLM and accepted by the dev because they are experienced and recognize the code to be good

3. Good code written by the LLM and accepted by the dev because it works, but not necessarily because the dev knew it was good (no longer a good indicator of dev quality)

4. Bad code written by the LLM

5. Bad code written by the dev

#2 and #3 is where things get messy. Good code may now come into existence without it being an indicator of dev quality. It is now necessary to assess whether or not the LLM code was accepted because the dev recognized it was good code, or because the dev got things to work and essentially got lucky.

It may be true that you're still evaluating the code at the end of the day, but what you learn from that evaluation has changed. You can no longer evaluate the quality of a dev by the quality of the code they commit unless you have other ways to independently assess them beyond the code itself.

If you continued to assess dev quality without taking this into consideration, it seems likely that those assessments would become less accurate over time as more "low quality" devs produce high quality code - not because of their own skills, but because of the ongoing improvements to LLMs. That high quality code is no longer a trustworthy indicator of dev quality.

> If a dev uses an LLM in a non-pragmatic way that produces bad code, it will erode trust in them. The LLM is a tool, but trust still factors in to how the dev uses the tool.

Yes, of course. But the issue is not that a good dev might erode trust by using the LLM poorly. The issue is that inexperienced devs will make it increasingly difficult to use the same heuristics to assess dev quality across the board.

acedTrex · 2025-06-26T13:32:47 1750944767

> you might think it's related, but it's not.

In my experience they very much are related. High quality devs are far more likely to output high quality working code. They test, they validate, they think, ultimately they care.

In that case that you are reviewing a patch from someone you have limited experience with, it previously was feasible to infer the quality of the dev from the context of the patch itself and the surrounding context by which it was submitted.

LLMs make that judgement far far more difficult and when you can not make a snap judgement you have to revert your review style to very low trust in depth review.

No more greasing the wheels to expedite a process.

alganet · 2025-06-26T10:05:12 1750932312

> I learned to trust them because their code works well

There's so much more than "works well". There are many cues that exist close to code, but are not code:

I trust more if the contributor explains their change well.

I trust more if the contributor did great things in the past.

I trust more if the contributor manages granularity well (reasonable commits, not huge changes).

I trust more if the contributor picks the right problems to work on (fixing bugs before adding new features, etc).

I trust more if the contributor proves being able to maintain existing code, not just add on top of it.

I trust more if the contributor makes regular contributions.

And so on...

acedTrex · 2025-06-26T13:21:53 1750944113

Author here:

Spot on, there are so many little things that we as humans use as subtle verification steps to decide how much scrutiny various things require. LLMs are not necessarily the death of that concept but they do make it far far harder.

moffkalast · 2025-06-26T09:52:00 1750931520

It's easy to get overconfident and not test the LLM's code enough when it worked fine for a handful of times in a row, and then you miss something.

The problem is often really one of miscommunication, the task may be clear to the person working on it, but with frequent context resets it's hard to make sure the LLM also knows what the whole picture is and they tend to make dumb assumptions when there's ambiguity.

The thing that 4o does with deep research where it asks for additional info before it does anything should be standard for any code generation too tbh, it would prevent a mountain of issues.

stavros · 2025-06-26T10:43:57 1750934637

Sure, but you're still responsible for the quality of the code you commit, LLM or no.

moffkalast · 2025-06-26T11:31:21 1750937481

Of course you are, but it's sort of like how people are responsible their Tesla driving on autopilot, which then suddenly swerves into a wall and disengages two seconds before impact. The process forces you to make mistakes you wouldn't normally ever do or even consider a possibility.

JohnKemeny · 2025-06-26T14:10:32 1750947032

To add to devs and Teslas, you have journalists using LLMs writing summaries, lawyers using LLMs writing dispositions, doctors using LLMs writing their patient entries, and law enforcement using LLMs writing their forensics report.

All of these make mistakes (there are documented incidents).

And yes, we can counter with "the journalists are dumb for not verifying", "the lawyers are dumb for not checking", etc., but we should also be open for the fact that these are intelligent and professional people who make mistakes because they were mislead by those who sell LLMs.

bluefirebrand · 2025-06-26T17:16:48 1750958208

I think it's analogous to physical labour

In the past someone might have been physically healthy and strong enough to physically shovel dirt all day long

Nowadays this is rarer because we use an excavator instead. Yes, a professional dirt mover is more productive with an excavator than a shovel, but is likely not as physically fit as someone spending their days moving dirt with a shovel

I think it will be similar with AI. It is absolutely going to offload a lot of people's thinking into the LLMs and their "do it by hand" muscles will atrophy. For knowledge workers, that's our brain

I know this was a similar concern with search engines and Stack Overflow, so I am trying to temper my concern here as best I can. But I can't shake the feeling that LLMs provide a way for people to offload their thinking and go on autopilot a lot more easily than Search ever did

I'm not saying that we were better off when we had to move dirt by hand either. I'm just saying there was a physical tradeoff when people moved out of the fields and into offices. I suspect there will be a cognitive tradeoff now that we are moving away from researching solutions to problems and towards asking the AI to give us solutions to problems

acedTrex · 2025-06-26T13:46:09 1750945569

In an ideal world you would think everyone see's it this way. But we are starting to see an uptick in "I don't know the LLMs said do that."

As if that is a somehow exonerating sentence.

NeutralCrane · 2025-06-26T14:17:25 1750947445

It isn’t, and that is a sign of a bad dev you shouldn’t trust.

LLMs are a tool, just like any number of tools that are used by developers in modern software development. If a dev doesn’t use the tool properly, don’t trust them. If they do, trust them. The way to assess if they use it properly is in the code they produce.

Your premise is just fundamentally flawed. Before LLMs, the proof of a quality dev was in the pudding. After LLMs, the proof of a quality dev remains in the pudding.

acedTrex · 2025-06-26T14:22:00 1750947720

> Your premise is just fundamentally flawed. Before LLMs, the proof of a quality dev was in the pudding. After LLMs, the proof of a quality dev remains in the pudding.

Indeed it does, however what the "proof" is has changed. In terms of sitting down and doing a full, deep review, tracing every path validating every line etc. Then for sure, nothing has changed.

However, at least in my experience, pre LLM those reviews were not EVERY CASE there were many times I elided parts of a deep review because i saw markers in the code that to me showed competency, care etc. With those markers there are certain failure conditions that can be deemed very unlikely to exist and therefore the checks can be skipped. Is that ALWAYS the correct assumption? Absolutely not but the more experienced you are the less false positives you get.

LLMs make those markers MUCH harder to spot, so you have to fall back to doing a FULL indepth review no matter what. You have to eat ALL the pudding so to speak.

For people that relied on maybe tasting a bit of the pudding then assuming based on the taste the rest of the pudding probably tastes the same its rather jarring and exhausting to now have to eat all of it all the time.

NeutralCrane · 2025-06-26T14:36:21 1750948581

> However, at least in my experience, pre LLM those reviews were not EVERY CASE there were many times I elided parts of a deep review because i saw markers in the code that to me showed competency, care etc.

That was never proof in the first place.

If anything, someone basing their trust in a submission on anything other than the code itself is far more concerning and trust-damaging to me than if the submitter has used an LLM.

acedTrex · 2025-06-26T14:39:13 1750948753

> That was never proof in the first place.

I mean, it's not necessarily HARD proof but it has been a reliable enough way to figure out which corners to cut. You can of course say that no corners should ever be cut and while that is true in an ideal sense. In the real world things always get fuzzy.

Maybe the death of cutting corners is a good thing overall for output quality. Its certainly exhausting on the people tasked with doing the reviews however.

breuleux · 2025-06-26T15:14:44 1750950884

I don't know about that. Cutting corners will never die.

Ultimately I don't think the heuristics would change all that much, though. If every time you review a person's PR, almost everything is great, they are either not using AI or they are vetting what the AI writes themselves, so you can trust them as you did before. It may just take some more PRs until that's apparent. Those who submit unvetted slop will have to fix a lot of things, and you can crank up the heat on them until they do better, if they can. (The "if they can" is what I'm most worried about.)

insane_dreamer · 2025-06-26T15:34:10 1750952050

> If someone uses an LLM and produces bug-free code, I'll trust them.

Only because you already trust them to know that the code is indeed bug-free. Some cases are simple and straightforward -- this routine returns a desired value or it doesn't. Other situations are much more complex in anticipating the ways in which it might interact with other parts of the system, edge cases that are not obvious, etc. Writing code that is "bug free" in that situation requires the writer of the code to understand the implications of the code, and if the dev doesn't understand exactly what the code does because it was written by an LLM, then they won't be able to understand the implications of the code. It then falls to the reviewer to understand the implications of the code -- increasing their workload. That was the premise.

somewhereoutth · 2025-06-26T10:42:04 1750934524

Because when people use LLMs, they are getting the tool to do the work for them, not using the tool to do the work. LLMs are not calculators, nor are they the internet.

A good rule of thumb is to simply reject any work that has had involvement of an LLM, and ignore any communication written by an LLM (even for EFL speakers, I'd much rather have your "bad" English than whatever ChatGPT says for you).

I suspect that as the serious problems with LLMs become ever more apparent, this will become standard policy across the board. Certainly I hope so.

stavros · 2025-06-26T10:44:46 1750934686

Well, no, a good rule of thumb is to expect people to write good code, no matter how they do it. Why would you mandate what tool they can use to do it?

somewhereoutth · 2025-06-26T11:19:26 1750936766

Because it pertains to the quality of the output - I can't validate every line of code, or test every edge case. So if I need a certain level of quality, I have to verify the process of producing it.

This is standard for any activity where accuracy / safety is paramount - you validate the process. Hence things like maintenance logs for airplanes.

acedTrex · 2025-06-26T13:02:05 1750942925

> So if I need a certain level of quality, I have to verify the process of producing it

Precisely this, and this is hardly a unique to software requirement. Process audits are everywhere in engineering. Previously you could infer the process of producing some code by simply reading the patch and that generally would tell you quite a bit about the author itself. Using advanced and niche concepts with imply a solid process with experience backing it. Which would then imply that certain contextual bugs are unlikely so you skip looking for them.

My premise in the blog is basically that "Well now I have go do a full review no matter what the code itself tells me about the author."

badsectoracula · 2025-06-26T20:19:24 1750969164

> My premise in the blog is basically that "Well now I have go do a full review no matter what the code itself tells me about the author."

Which IMO is the correct approach - or alternatively, if you do actually trust the author, you shouldn't care if they used LLMs or not because you'd trust them to check the LLM output too.

badsectoracula · 2025-06-26T20:21:16 1750969276

The false assumption here is that humans will always write better code than LLMs, which is certainly not the case for all humans nor all LLMs.

tranchebald · 2025-06-26T16:03:41 1750953821

I’m not seeing a lot of discussion about verification or a stronger quality control process anywhere in the comments here. Is that some kind of unsolvable problem for software? I think if the standard of practice is to use author reputation as a substitute for a robust quality control process, then I wouldn’t be confident that the current practice is much better than AI code-babel.

badsectoracula · 2025-06-26T20:04:37 1750968277

> Because when people use LLMs, they are getting the tool to do the work for them, not using the tool to do the work.

You can say that for pretty much any sort of automation or anything that makes things easier for humans. I'm pretty sure people were saying that about doing math by hand around when calculators became mainstream too.

breuleux · 2025-06-26T15:28:40 1750951720

I think the main issue is people using LLMs to do things that they don't know how to do themselves. There's actually a similar problem with calculators, it's just a much smaller one: if you never learn how to add or multiply numbers by hand and use calculators for everything all the time, you may sometimes make absurd mistakes like tapping 44 * 3 instead of 44 * 37 and not bat an eye when your calculator tells you the result is a whole order of magnitude less than what you should have expected. Because you don't really understand how it works. You haven't developed the intuition.

There's nothing wrong with using LLMs to save time doing trivial stuff you know how to do yourself and can check very easily. The problem is that (very lazy) people are using them to do stuff they are themselves not competent at. They can't check, they won't learn, and the LLM is essentially their skill ceiling. This is very bad: what plus-value are you supposed to bring over something you don't understand? AGI won't have to improve from the current baseline to surpass humans if we're just going to drag ourselves down to its level.

mexicocitinluez · 2025-06-26T12:51:58 1750942318

>Because when people use LLMs, they are getting the tool to do the work for them, not using the tool to do the work.

What? How on god's green earth could you even pretend to know how all people are using these tools?

> LLMs are not calculators, nor are they the internet.

Umm, okay? How does that make them less useful?

I'm going to give you a concrete example of something I just did and let you try and do whatever mental gymnastics you have to do to tell me it wasn't useful:

Medicare requires all new patients receiving home health treatment go through a 100+ question long form. This form changes yearly, and it's my job to implement the form into our existing EMR. Well, part of that is creating a printable version. Guess what I did? I uploaded the entire pdf to Claude and asked it to create a print-friendly template using Cottle as the templating language in C#. It generated the 30 page print preview in a minute. And it took me about 10 more minutes to clean up.

> I suspect that as the serious problems with LLMs become ever more apparent, this will become standard policy across the board. Certainly I hope so.

The irony is that they're getting better by the day. That's not to say people don't use them for the wrong applications, but the idea that this tech is going to be banned is absurd.

> A good rule of thumb is to simply reject any work that has had involvement of an LLM

Do you have any idea how ridiculous this sounds to people who actually use the tools? Are you going to be able to hunt down the single React component in which I asked it to convert the MUI styles to tailwind? How could you possibly know? You can't.

sebmellen · 2025-06-26T13:11:22 1750943482

You’re being unfairly downvoted. There is a plague of well-groomed incoherency in half of the business emails I receive today. You can often tell that the author, without wrestling with the text to figure out what they want to say, is a kind of stochastic parrot.

This is okay for platitudes, but for emails that really matter, having this messy watercolor kind of writing totally destroys the clarity of the text and confuses everyone.

To your point, I’ve asked everyone on my team to refrain from writing words (not code) with ChatGPT or other tools, because the LLM invariably leads to more complicated output than the author just badly, but authentically, trying to express themselves in the text.

jimbokun · 2025-06-26T15:02:30 1750950150

I find the idea of using LLMs for emails confusing.

Surely it's less work to put the words you want to say into an email, rather than craft a prompt to get the LLM to say what you want to say, and iterate until the LLM actually says it?

fwip · 2025-06-26T15:56:18 1750953378

My own opinion, which is admittedly too harsh, is that they don't really know what they want to say. That is, the prompt they write is very short, along the lines of `ask when this will be done` or `schedule a followup`, and give the LLM output a cursory review before copy-pasting it.

jimbokun · 2025-06-26T19:50:51 1750967451

Still funny to me.

`ask when this will be done` -> ChatGPT -> paste answer into email

vs

type: "when will this be done?" Send.

sebmellen · 2025-06-27T14:44:19 1751035459

Greetings Jimbokun,

I trust this message finds you well.

I am writing to inquire about the projected completion timeline for the HackerNews initiative. In order to optimize our downstream workflows and ensure all dependencies are properly aligned, an estimated delivery date would be highly valuable.

Could you please provide an updated forecast on when we might anticipate the project's conclusion? This data will assist in calibrating our subsequent operational parameters.

Thank you for your continued focus and effort on this task. Please advise if any additional resources or support from my end could help expedite the process.

Best regards, Sebmellen

acedTrex · 2025-06-26T13:18:48 1750943928

Yep, I have come to really dislike LLMs for documentation as it just reads wrong to me and I find so often misses the point entirely. There is so much nuance tied up in documentation and much of it is in what is NOT said as much as what is said.

The LLMs struggle with both but REALLY struggle with figuring out what NOT to say.

unsignedint · 2025-06-27T20:49:36 1751057376

I definitely see where you're coming from, though I have a slightly different perspective.

I agree that LLMs often fall short when it comes to capturing the nuanced reasoning behind implementations—and when used in an autopilot fashion, things can easily go off the rails. Documentation isn't just about what is said, but also what’s not said, and that kind of judgment is something LLMs do struggle with.

That said, when there's sufficient context and structure, I think LLMs can still provide a solid starting point. It’s not about replacing careful documentation but about lowering the barrier to getting something down—especially in environments where documentation tends to be neglected.

In my experience, that neglect can stem from a few things: personal preference, time pressure, or more commonly, language barriers. For non-native speakers, even when they fully understand the material, writing clear and fluent documentation can be a daunting and time-consuming task. That alone can push it to the back burner. Add in the fact that docs need to evolve alongside the code, and it becomes a compounding issue.

So yes, if someone treats LLM output as the final product and walks away, that’s a real problem. And honestly, this ties into my broader skepticism around the “vibe coding” trend—it often feels more like “fire and forget” than responsible tool usage.

But when approached thoughtfully, even a 60–90% draft from an LLM can be incredibly useful—especially in situations where the alternative is having no documentation at all. It’s not perfect, but it can help teams get unstuck and move forward with something workable.

short_sells_poo · 2025-06-26T14:29:21 1750948161

I wonder if this is to a large degree also because when we communicate with humans, we take cues from more than just the text. The personality of the author will project into the text they write, and assuming you know this person at least a little bit, these nuances will give you extra information.

flir · 2025-06-26T12:58:28 1750942708

> A good rule of thumb is to simply reject any work that has had involvement of an LLM,

How are you going to know?

bluefirebrand · 2025-06-26T17:19:12 1750958352

That's sort of the problem isn't it? There is no real way to know so we sort of just have to assume every bit of work is involving LLMs now so we have to take a lot closer look at everything

jnxx · 2025-06-27T16:11:14 1751040674

We already treat spam and unsolicited commercial email that way.

taneq · 2025-06-26T09:55:42 1750931742

If you have a long standing, effective heuristic that “people with excellent, professional writing are more accurate and reliable than people with sloppy spelling and punctuation” then the appearance of a semi-infinite group of ‘people’ writing well presented, convincingly worded articles which nonetheless are riddled with misinformation, hidden logical flaws, and inconsistencies, you’re gonna end up trusting everyone a lot less.

It’s like if someone started bricking up tunnel entrances and painting ultra realistic versions of the classic Road Runner tunnel painting on them, all over the place. You’d have to stop and poke every underpass with a stick just to be sure.

stavros · 2025-06-26T10:02:21 1750932141

Sure, your heuristic no longer works, and that's a bit inconvenient. We'll just find new ones.

sebmellen · 2025-06-26T13:04:57 1750943097

Yeah, now you need to be able to demonstrate verbal fluency. The problem is, that inherently means a loss of “trusted anonymous” communication, which is particularly damaging to the fiber of the internet.

acedTrex · 2025-06-26T13:15:30 1750943730

Author here:

Precisely, in the age where it is very difficult to ascertain the type or quality of skills you are interacting with say in a patch review or otherwise you frankly have to "judge" someone and fallback to suspicion and full verification.

taneq · 2025-06-26T18:50:26 1750963826

Yeah I think "trust for a fluent, seemingly logically coherent anonymous responder" pretty much captures it.

oasisaimlessly · 2025-06-26T13:58:06 1750946286

"A bit inconvenient" might be the understatement of the year. If information requires say, 2x the time to validate, the utility of the internet is halved.

jnxx · 2025-06-27T16:19:07 1751041147

Too bad that the language we use is also a demonstration of social status. If I think about it, it could have a somewhat corrosive effect on that glue that keeps society in shape.

legacynl · 2025-06-27T15:55:48 1751039748

> How is this different from when they were only using their brain to produce the code?

If A = B, higher B means that A is higher too.

Now it's A + Ai = B. Now Higher B doesn't necessarily mean higher A.

Especially since the current state of Ai is pretty much stochastic, and sometimes is worse than nothing at all

JambalayaJimbo · 2025-06-28T02:46:05 1751078765

I have never been in a work environment in which I’ve been able to do more than rubber stamp PRs. Performing a deep review of each change is simply impossible with the expectations we were given.

stavros · 2025-06-28T08:09:37 1751098177

Interesting, I've never been in one where we didn't read PRs.

mexicocitinluez · 2025-06-26T12:45:01 1750941901

It's not.

What you're seeing now is people who once thought and proclaimed these tools as useless now have to start to walk back their claims with stuff like this.

It does amaze me that the people who don't use these tools seem to have the most to say about them.

acedTrex · 2025-06-26T13:38:48 1750945128

Author here:

For what it's worth I do actually use the tools albeit incredibly intentionally and sparingly.

I see quite a few workflows and tasks that they can be a value add on, mostly outside of the hotpath of actual code generation but still quite enticing. So much so in fact I'm working on my own local agentic tool with some self hosted ollama models. I like to think that i am at least somewhat in the know on the capabilities and failure points of the latest LLM tooling.

That however doesn't change my thoughts on trying to ascertain if code submitted to me deserves a full indepth review or if I can maybe cut a few corners here and there.

mexicocitinluez · 2025-06-26T14:17:12 1750947432

> That however doesn't change my thoughts on trying to ascertain if code submitted to me deserves a full indepth review or if I can maybe cut a few corners here and there.

How would you even know? Seriously, if I use Chatgpt to generate a one-off function for a feature I'm working on that searches all classes for one that inherits a specific interface and attribute, are you saying you'd be able to spot the difference?

And what does it even matter it works?

What if I use Bolt to generate a quick screen for a PoC? Or use Claude to create a print-preview with CSS of a 30 page Medicare form? Or converting a component's styles MUI to tailwind? What if all these things are correct?

This whole OS repos will ban LLM-generated code is a bit absurd.

> or what it's worth I do actually use the tools albeit incredibly intentionally and sparingly.

How sparingly? Enough to see how it's constantly improving?

acedTrex · 2025-06-26T14:29:50 1750948190

> How would you even know? Seriously, if I use Chatgpt to generate a one-off function for a feature I'm working on that searches all classes for one that inherits a specific interface and attribute, are you saying you'd be able to spot the difference?

I don't know, thats the problem. As a result, because I can't know I have to now do full in depth reviews no matter what. Which is the "judging" I tongue in cheek talk about in the blog.

> How sparingly? Enough to see how it's constantly improving?

Nearly daily, to be honest I have not noticed too much improvement year over year in regards to how they fail. They still break in the exact same dumb ways now as they did before. Sure they might generate correct syntactic code reliably now and it might even work. But they still consistently fail to grok the underlying reasoning for things existing.

But I am writing my own versions of these agentic systems to use for some rote menial stuff.

mexicocitinluez · 2025-06-26T17:21:52 1750958512

So you werent doing in depth reviews before? Are these people you know? And now you just don't trust them because they include a tool on their workflow?

globnomulous · 2025-06-26T15:43:27 1750952607

> It does amaze me that the people who don't use these tools seem to have the most to say about them.

You're kidding, right? Most people who don't use the tools and write about it are responding to the ongoing hype train -- a specific article, a specific claim, or an idea that seems to be gaining acceptance or to have gone unquestioned among LLM boosters.

I recently watched a talk by Andrei Karpathy. So much in it begged for a response. Google Glass was "all the rage" in 2013? Please. "Reading text is laborious and not fun. Looking at images is fun." You can't be serious.

Someone recently shared on HN a blog post explaining why the author doesn't use LLMs. The justification for the post? "People keep asking me."

mexicocitinluez · 2025-06-26T17:13:41 1750958021

Being asked if I'm kidding by the person comparing Google glasses to machine learning algorithms is pretty funny ngl.

And the "I don't use these tools and never will" sentiment is rampant in the tech community right now. So yes, I am serious.

Youre not talking about the blog post that completely ignored agentless uses are you? The one that came to the conclusion LLMs arent useful despite only using a subset of its features?

bluefirebrand · 2025-06-26T17:25:05 1750958705

> And the "I don't use these tools and never will" sentiment is rampant in the tech community right now

So is the "These tools are game changers and are going to make all work obsolete soon" sentiment

Don't start pretending that AI boosters aren't everywhere in tech right now

I think the major difference I'm noticing is that many of the Boosters are not people who write any code. They are executives, managers, product owners, team leads, etc. Former Engineers maybe but very often not actively writing software daily

globnomulous · 2025-06-26T19:06:36 1750964796

> I think the major difference I'm noticing is that many of the Boosters are not people who write any code.

Plenty of current, working engineers who frequent and comment on Hacker News say they use LLMs and find them useful/'game changers,' I think.

Regardless, I think I agree overall: the key distinction I see is between people who like to read and write programs and people who just want to make some specific product. The former group generally treat LLMs as an unwelcome intrusion into the work they love and value. The latter generally welcome LLMs because the people selling them promise, in essence, that with LLMs you can skip the engineering and just make the product.

I'm part of the former group. I love reading code, thinking about it, and working with it. Meeting-based programming (my term for LLM-assisted programming) sounds like hell on earth to me. I'd rather blow my brains out than continue to work as a software engineer in a world where the LLM-booster dream comes true.

bluefirebrand · 2025-06-26T20:44:57 1750970697

> I'd rather blow my brains out than continue to work as a software engineer in a world where the LLM-booster dream comes true.

I feel the same way

But please don't. I promise I won't either. There is still a place for people like you and me in this world, it's just gonna take a bit more work to find it

Deal? :)

globnomulous · 2025-06-27T14:44:07 1751035447

Sounds good, thanks!

mexicocitinluez · 2025-06-26T21:07:26 1750972046

> So is the "These tools are game changers and are going to make all work obsolete soon" sentiment

Except we aren't talking about those people, are we? The blog post wans't about that.

> Don't start pretending that AI boosters aren't everywhere in tech right now

PLEASE tell me what I said that made you feel like you need to put words in my mouth. Seriously.

> I think the major difference I'm noticing is that many of the Boosters are not people who write any code

I write code every day. I just asked Claude to convert a Medicare mandated 30 page assessment to a printable version with CSS using Cottle in C# and it did it. I'd love to know why that sort of thing isn't useful.

globnomulous · 2025-06-26T18:56:10 1750964170

> Being asked if I'm kidding by the person comparing Google glasses to machine learning algorithms is pretty funny ngl.

I didn't draw the comparison. Karpathy, one of the most prominent LLM proponents on the planet -- the guy who invented the term 'vibe-coding' -- drew the comparison.[1]

> And the "I don't use these tools and never will" sentiment is rampant in the tech community right now. So yes, I am serious.

I think you misunderstood my comment -- or my comment just wasn't clear enough: I quoted the line "It does amaze me that the people who don't use these tools seem to have the most to say about them." and then I asked "You're kidding, right?" In other words, "you can't seriously believe that the nay-sayers 'always have the most to say.'" It's a ridiculous claim. Just about every naysayer 'think piece' -- whether or not it's garbage -- is responding to an overwhelming tidal wave of pro-LLM commentary and press coverage.

> Youre not talking about the blog post that completely ignored agentless uses are you? The one that came to the conclusion LLMs arent useful despite only using a subset of its features?

I'm referring to this one[2]. It's awful, smug, self-important, sanctimonious nonsense.

[1] https://www.youtube.com/watch?si=xF5rqWueWDQsW3FC&v=LCEmiRjP...

[2] https://news.ycombinator.com/item?id=44294633

mexicocitinluez · 2025-06-26T21:02:27 1750971747

I'm so confused as to why you took that so literally. I didn't literally mean that the nay-sayers are producing more words than the evangelists. It was a hyperbolic expression. And I wasn't JUST talking about the blog posts. I'm talking about ALL comments about it.

globnomulous · 2025-06-26T23:25:44 1750980344

Sure, that's fair, though tone is difficult both to communicate and to detect in writing. I have just the literal meaning of your words. And I'm a very literal-minded person. :)

mexicocitinluez · 2025-06-27T11:02:05 1751022125

Agreed. I am, too. So I get it.