
Documentation rots a lot more quickly than the code - it doesn't need to be correct for the code to work. You are usually better off ignoring the comments (even more so the design document) and going straight to the code.

I maintain that if this is the case on your project, you're either grossly misusing the time and energy of new and junior devs, or you've gone too long since hiring a new dev and your project is stagnating because of it.

New eyes don’t have the curse of knowledge. They don’t filter out the bullshit bits. And one of the advantages of creating reusable modules is you get more new eyes on your code regularly.

This may also be a place where AI can help. Some of the review tools are already calling us out when the code doesn't match the documentation.


No, they're 100% correct. This has been my experience at every place I've worked at in SV, from startup to FAANG.

You write the code so you can scan it easily, you build tools to help, and you ask for help when you need it, but you still gotta build that mental map out.


The last time I tried AI, I tested it with a stopwatch.

The group used feature flags...

    if (a) {
       // new code
    } else {
       // old code
    }

    void testOff() {
       disableFlag(a);
       // test it still works
    }
    
    void testOn() {
        enableFlag(a);
        // test it still works
    }
However, as with any cleanup, it doesn't happen. We have thousands of these things lying around taking up space. I thought "I can give this to the AI, it won't get bored or complain."
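For what it's worth, the edit being repeated here is mechanical enough to sketch in a few lines. This is a rough illustration only, with an invented `strip_flag` helper; a regex over real code is far too fragile for production use (you'd want a proper parser), which is part of why these cleanups pile up:

```python
import re

def strip_flag(source: str, flag: str) -> str:
    # Collapse the simple `if (flag) { NEW } else { OLD }` pattern
    # from the comment above down to just the NEW branch.
    # Handles only this exact single-level shape.
    pattern = re.compile(
        r"if \(" + re.escape(flag) + r"\) \{\n(?P<new>.*?)\n\} else \{\n.*?\n\}",
        re.DOTALL,
    )
    return pattern.sub(lambda m: m.group("new"), source)

src = """if (a) {
    new_code();
} else {
    old_code();
}"""
print(strip_flag(src, "a"))  # keeps only the new-code branch
```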

I can do one flag in ~3 minutes: code edit, PR prepped and sent.

The AI can do one in 10 minutes, but I couldn't look away. It kept trying to use find/grep to search through a huge repo to find symbols (instead of the MCP service).

Then it ignored instructions and didn't clean up one or the other test, left unused fields or parameters and generally made a mess.

Finally, I needed to review and fix the results, taking another 3-5 minutes, with no guarantee that it compiled.

At that point, a task that takes me 3 minutes has taken me 15.

Sure, it made code changes, and felt "cool", but it cost the company 5x the cost of not using the AI (before considering the token cost).

Even worse, the CI/CD system couldn't keep up with my individual velocity of cleaning these up. Using an automated tool? Yeah, not going to be pleasant.

However, I need to try again, everyone's saying there was a step change in December.


I did my own experiment with Claude Code vs Cursor tab completion. The task was to convert an Excel file to a structured format. Nothing fancy at all.

Claude Code took 4 hours, with multiple prompts. At the end, it started to break the previous fixes in favor of new features. The code was spaghetti. There was no way I could fix it myself or steer Claude Code into fixing it the right way. Either it was a dead-end or a dice roll with every prompt.

Then I implemented my own version with Cursor tab completion. It took the same amount of time, 4 hours. The code had a clear object-oriented architecture, with a structure for evolution. Adding a new feature didn't require any prompts at all.

As a result, Claude Code was worse in terms of productivity: the same amount of time, worse quality output, no possibility of (or at best very high cost of) code evolution.


Are you able to share your prompts to Claude Code? I assume not, they are probably not saved - but this genuinely surprised me, it seems like exactly the type of task an LLM would excel at (no pun intended!). What model were you using OOI?

> this genuinely surprised me

Me too. After listening to all the claims about Claude Code's productivity benefits, I was surprised to get the result I got.

I'm not able to share details of my work. I was using Claude Opus 4.5, if I recall correctly.


The exact same prompt? Everything depends on the prompt, and they're different tools. These days the quality of what's built around the prompt matters as much as the code. We can't feed it a generic query.

Something similar happened to me just now. Claude whatever-is-the-latest-and-greatest, in Claude Code. I also tried out Windsurf's Arena Mode, with the same failure. To intercept the inevitable "holding it wrong" comments: we have all the AGENTS.md and RULES.md files and all the other snake oil you're told to include in the project. It has full context of the code, and even the ticket. It has very clear instructions on what to do (the kind of instructions I would trust an unpaid intern with, let alone a tool marketed as the next coming of Cyber Jesus that we're paying for), in a chat with minimal context used up already. I manually review every command it runs, because I don't trust it running shell scripts unsupervised.

I wanted it to finish up some tests that I had already prefilled; basically all the AI had to do was convert my comments into the final assertions. After a few minutes of looping, I see it finish and all tests are green.

A third of the tests were still unfilled, left as an exercise for the reader, I guess. Another third was modified beyond what I told it to do, including hardcoding some things, which made those tests quite literally useless. The last third was fine, but because of all the miscellaneous changes it made, I had to double-check those anyway. This is about the bare minimum where I would expect these tools to do good work: a simple take-a-comment, spit-out-the-`assert()`-block task.

I ended up wasting more time arguing with it than if I had just done the menial task of filling out the tests myself. It sure did generate a shit ton of code though, and ran in an impressive looking loop for 5-10 minutes! And sure, the majority of the test cases were either not implemented or hardcoded so that they wouldn't actually catch a breakage, but it was all green!!

That's ultimately where this hype is leading us. It's a genuinely useful tool in some circumstances, but we've collectively lost the plot because untold billions have poured into these systems and we now have clueless managers and executives seeing "tests green -> code good" and making decisions based on that.


What model, what harness, and about how long was your prompt to fire off this piece of work? All three matter a lot, but all are missing from your description.

Won't that show up in ROI numbers?

There is no base to compare against.

There are definite discontinuities in there. What works for a team of 5 is different to 50 is different to 500.

Even just taking fault incidence rates, assuming constant injection per dev hour...


If the LLM is able to code it, there is enough training data that you might be better off in a different language that removes the boilerplate.

There are a couple of ways to figure it out.

open a terminal (OSX/Linux) and type:

    man dup
open a browser window and search for:

    man dup
Both will bring up the man page for the function call.

To get recursive, you can try:

    man man unix
(the unix is important, otherwise it gives you manly men)


otherwise it gives you manly men

That's only just after midnight [1][2]

[1] - https://www.youtube.com/watch?v=XEjLoHdbVeE

[2] - https://unix.stackexchange.com/questions/405783/why-does-man...


I love that this situation occurred.


you may also consider gnu info

  info dup


The severity of the DoS depends on the system being attacked, and how it is configured to behave on failure.

If the system is configured to "fail open", and it's something validating access (say anti-fraud), then the DoS becomes a fraud hole and profitable to exploit. Once discovered, this runs away _really_ quickly.
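That fail-open escalation can be sketched roughly (all the names here, `check_fraud`, `fail_open`, the service signature, are invented for illustration and not from any real system):

```python
def check_fraud(txn, fraud_service, fail_open=True):
    """Hypothetical wrapper around an anti-fraud backend."""
    try:
        return fraud_service(txn)  # True = approve, False = block
    except TimeoutError:
        # Backend is down (e.g. DoS'd). Fail-open approves everything
        # unchecked: the availability bug is now a fraud hole.
        return fail_open

def downed_service(txn):
    # Simulate the anti-fraud backend being knocked over.
    raise TimeoutError("anti-fraud backend unreachable")

print(check_fraud({"amount": 10_000}, downed_service, fail_open=True))   # True
print(check_fraud({"amount": 10_000}, downed_service, fail_open=False))  # False
```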

Treating DoS as affecting availability converts the issue into a "do I want to spend $X from a shakedown, or $Y to avoid being shaken down in the first place?"

Then, "what happens when people find out I pay out on shakedowns?"


If the system "fails open" then it's not a DoS, it's a privilege escalation. What you're describing here is just a matter of threat modeling, which is up to you to perform and not a matter for CVEs. CVEs are local properties, and DoS does not deserve to be a local property that we issue CVEs for.


You're making too much sense for a computer security specialist.


> If the system is configured to "fail open", and it's something validating access (say anti-fraud),

The problem here isn't the DoS, it's the fail open design.


If the majority of your customers are good, failing closed will cost more than the fraud during the anti-fraud system's downtime.
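A back-of-envelope version of that tradeoff, with all numbers invented purely for illustration:

```python
txns_per_hour = 10_000   # hypothetical traffic during the outage
avg_value = 50.0         # hypothetical average transaction value
fraud_rate = 0.01        # "the majority of your customers are good"

# Cost of one hour of anti-fraud downtime under each policy:
fail_closed_cost = txns_per_hour * avg_value              # all revenue blocked
fail_open_cost = txns_per_hour * avg_value * fraud_rate   # only the fraud slice

print(fail_closed_cost, fail_open_cost)  # 500000.0 5000.0
```

Under these made-up numbers failing open is two orders of magnitude cheaper, which is the parent's point; the calculus flips as the fraud rate climbs.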


If that is the mindset in your company, why even bother looking for vulnerabilities?


There is _always_ fraud, and you can't stop it all. All you can do is try to minimize the cost of the fraud.

There is an "acceptable" fraud rate from a payment processor. This explains why there are different rates for "card present" and "card not present" transactions, and why things like Apple Pay and Google Pay are popular with merchants.


You are really running with scissors there. If anyone with less scrupulous morals notices, you’re an outage away from being in deep, deep shit.

The best case is having your credit card processing fees like quadruple, and the worst case is being in a regulated industry and having to explain to regulators why you knowingly allowed a ton of transactions with 0 due diligence.


The concept of due diligence recognizes the limits, past which it becomes too much, or undue.


Until any bad customer learns about the fail-open.


If bad actors learn about the fail-close, they can conceivably cause you more harm.


This is a losing money vs. losing freedom situation.


Maybe. But for a company everything is fungible.


Okay, then the “vulnerability” is de facto simply transitioning the system to an acceptable state.


> Treating DoS as affecting availability converts the issue into a "do I want to spend $X from a shakedown, or $Y to avoid being shaken down in the first place?"

But that is what security is in the real world anyway. Once you move past the imaginary realms of crypto and secure coding that some engineers daydream in, the ultimate reality is always about "do I want to spend $X dealing with consequences of ${specific kind of atack}, or $Y on trying to prevent it" - and the answer is to consider how much $X is likely to be, and how much it'll be reduced by spending $Y, and only spending while the $Y < reduction in $X.


> Treating DoS as affecting availability converts the issue into a "do I want to spend $X from a shakedown, or $Y to avoid being shaken down in the first place?"

> Then, "what happens when people find out I pay out on shakedowns?"

What do you mean? You pay someone other than whoever did the DoS. You pay your way out of a DoS by throwing more resources at the problem, both in raw capacity and in network blocking capabilities. So how is that incentivising the attacker? Or did you mean literal blackmailing?


Literal blackmailing, same as ransomware.


Also, in e.g. C code, many exploits start out as only a DoS, but can later be turned into a more dangerous attack.


If you're submitting a CVE for a primitive that seems likely to be useful for further exploitation, mark it as such. That's not the case for ReDoS or the vast majority of DoS; it's already largely the case that you'd mark something as "privesc" or "rce" if you believe it provides that capability without necessarily having a full, reliable exploit.

CVEs are at the discretion of the reporter.


Have these people done the math on how many engineers they can hire in other countries for USD$200k/yr? If you choose the timezone properly, they will even work overnight (your time) and have things ready in the morning for you.

USD$200k is 3 engineers in New Zealand.

https://www.levels.fyi/t/software-engineer/locations/new-zea...


I have a different opinion. :) DevOps is great feedback to the engineering team.

Too many alarms or alarms at unsocial hours? The engineering team should feel that pain.

Too hard to push? The engineering team should feel that pain.

Strange hard to diagnose alarms? Yep, the engineering team should feel that pain!

The feedback is very important to keeping the opex costs under control.

However, I think the author and I have different opinions on what DevOps is. DevOps isn't a full time role. It's what the engineer does to get their software into production.


This sounds very adversarial to me. I’m glad our devops team doesn’t think like you.


In my career, DevOps was never a separate organization. It was a role assumed by the code owners. SRE (is it up, is the hardware working, is the network working?) was separate, and had different metrics.

Having separate teams makes it adversarial because both orgs end up reporting into separate hierarchies with independent goals.

Think about the metrics each team is measured on. Who resolves conflicts between them? How high up the org chart is it necessary to go to resolve the conflict? Can one team make different tradeoffs on code quality vs speed from another, or is it company-wide?


If you have a “DevOps team” - they are operations and you aren’t getting any of the benefits of a DevOps mindset


Meh, real life is a bit more complicated than a manifesto.


It’s not about just a manifesto, at the startup I worked for before getting into consulting 6 years ago - cloud + app dev - it was much more affective for the team who did the work, to create their own IAC based on a standard.

What’s the difference between a “DevOps team” in 2026 than “operations” in 2001?


The difference is what they do. Assisting other teams with creating fully automated build and test pipelines. Managing infrastructure using automated systems. Identifying issues in production systems that other teams should look at, down to a level of granularity that wasn’t really possible in 2001.

> affective

You mean “effective”.


It very much was possible in 2001. In 2001 we automated updating our 15 or so Windows job runners with Perl and the Win32:: modules.

No large enterprise by 2001 was updating computers by walking around and sticking CDs/DVDs in each one, and they were definitely making sure our on-prem SQL Server and later MySQL database wasn't having issues, using dashboards and alerts.


That was very much present in 2001, except it was two separate teams: QA and sysadmins.


The only folks who like devops are those that haven’t touched anything else, or are scared to move out of that molehill. Try it once .. is my advice


> The only folks who like devops are those that haven’t touched anything else, or are scared to move out of that molehill.

IDK I've been called everything from: SysOp, SysAdmin, Network Engineer, Systems Architect, Solutions Engineer, Sales Engineer, Platform Engineer, etc. Half of those at different companies are just "DevOps" depending on the org.


I think there are different definitions of DevOps.

I see a difference between a more definite operations team (SRE) vs an engineering team having responsibility for how their service works in production (DevOps).

DevOps is something that all teams should be doing - there's no point in writing code that spends its life generating problems for customers or other teams, and having the problems arrive at the owners results in them being properly prioritized.

In smaller orgs, DevOps and SRE might be together, but it should still be a rotation instead of a fulltime role, and everyone should be doing it.

Engineers who don't do devops write code that looks like:

  if (should_never_happen) {
    log.error("owner=wombat@example.com it happened again");
  }

Whereas the one who does do devops writes code that avoids the error condition entirely (usually possible), or decides what the code should actually do in that situation (not just log).
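A sketch of the difference, with all names (`fetch_config`, the stores) invented for illustration:

```python
def fetch_config(primary, fallback, key, default=None):
    # Instead of logging "should never happen" and limping on,
    # decide what the code does in the failure case: fall back.
    value = primary.get(key)
    if value is None:
        # The "impossible" case, handled rather than logged-and-ignored.
        value = fallback.get(key, default)
    return value

print(fetch_config({}, {"timeout": 30}, "timeout"))  # 30
```

The error path becomes a designed behavior rather than a 2am page for wombat@example.com.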


It truly depends on the type of DevOps experience. I've avoided firefighting DevOps roles my career and I enjoy it. Having the space to step back and design intelligent dependent systems is satisfying.


Measurement and alerting is usually done in business metrics, not the causes. That way you catch classes of problems.

Not sure about expected loss, that's a decay rate?

But stuck jobs are caught via tasks-being-processed counts and average latency.
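As a sketch of alerting on the business metric rather than on any specific cause (function name, window, and threshold are all invented for illustration):

```python
def jobs_look_stuck(completed_per_minute, window=5, min_rate=1.0):
    # Alert when the rate of tasks actually completed drops below a
    # floor, whatever the underlying cause turns out to be. This
    # catches whole classes of problems: crashes, deadlocks, full
    # queues, dead downstream dependencies.
    recent = completed_per_minute[-window:]
    return sum(recent) / len(recent) < min_rate

print(jobs_look_stuck([120, 80, 0, 0, 0, 0, 0]))   # True  (pipeline stalled)
print(jobs_look_stuck([120, 118, 121, 119, 120]))  # False (healthy)
```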

