One of my favorite LLM uses is to feed it this essay, then ask it to assume the persona of the grug-brained developer and comment on $ISSUE_IM_CURRENTLY_DEALING_WITH. Good stress relief.
I am not very proficient with LLMs yet, but this sounds awesome! How do you do that, "feed it this essay"? Do you just start the prompt with something like "Act like the Grug Brained Developer from this essay: <url>"?
I haven't read all the comments and I'm sure someone else made a similar point, but my first thought was to flip the direction of the statement: "Waymo rides cost more than Uber or Lyft /because/ people are willing to pay more".
Usually if you’re using it, it’s because you’re forced to.
In my experience, the best strategy is to minimize your use of it — call out to binaries or shell scripts and minimize your dependence on any of the GHA world. Makes it easier to test locally too.
This is what I do. I've written 90% of the logic into a Go binary, and GitHub Actions just calls out to it at certain steps. It basically leaves GHA doing the only thing it's decent at: providing a UI for pipelines. The best part is you get unit tests, can dogfood the tool in its own pipeline, and can run stuff locally (by just running the CLI directly).
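For illustration, a minimal sketch of that shape (hypothetical names, not the actual tool); each workflow step just runs one subcommand of the binary, e.g. "./ci build":

    // ci.go: hypothetical thin CI entry point. Each GitHub Actions step just
    // runs something like "go run ./ci build" or "./ci test"; all the real
    // logic lives here, where it can be unit-tested and run locally.
    package main

    import (
        "fmt"
        "os"
        "os/exec"
    )

    func run(name string, args ...string) error {
        cmd := exec.Command(name, args...)
        cmd.Stdout = os.Stdout
        cmd.Stderr = os.Stderr
        return cmd.Run()
    }

    func main() {
        if len(os.Args) < 2 {
            fmt.Fprintln(os.Stderr, "usage: ci <build|test|release>")
            os.Exit(2)
        }
        var err error
        switch os.Args[1] {
        case "build":
            err = run("go", "build", "./...")
        case "test":
            err = run("go", "test", "./...")
        case "release":
            err = run("echo", "publish artifacts here") // placeholder step
        default:
            err = fmt.Errorf("unknown step %q", os.Args[1])
        }
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
    }

Because everything real lives in the binary, the workflow YAML shrinks to a list of one-line steps you can also run on your laptop.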
Makes migrations easier too; better to let GitHub or GitLab etc. just be the platform that hosts source code and triggers events, which you decide how to deal with. Your CI itself should be another source-controlled repo that provides the features for the application code's thin CI layer to invoke and use. That also allows you to run your CI locally in a pretty realistic manner.
I have done something similar with Jenkins and a Groovy CI library used by Jenkins pipelines. But it wasn't super simple, since a lot of it assumed Jenkins. I wonder if there is a cleaner open-source option that doesn't assume any underlying platform.
> But I just checked and, unsurprisingly, 4o seems to do reasonably well at generating Semgrep rules? Like: I have no idea if this rule is actually any good. But it looks like a Semgrep rule?
This is the thing with LLMs. When you’re not an expert, the output always looks incredible.
It’s similar to the fluency paradox: if you’re not native in a language, anyone you hear speak it at a higher level than yourself appears fluent to you, even if, for example, they’re actually just a beginner.
The problem with LLMs is that they’re very good at appearing to speak “a language” at a higher level than you, even if they totally aren’t.
I agree completely that an LLM's first attempt to write a Semgrep rule is as likely as not to be horseshit. That's true of everything an LLM generates. But I'm talking about closed-loop LLM code generation. Unlike legal arguments and medical diagnoses, you can hook an LLM up to an execution environment and let it see what happens when the code it generates runs. It then iterates until it has something that works.
Which, when you think about it, is how a lot of human-generated code gets written too.
So my thesis here does not depend on LLMs getting things right the first time, or without assistance.
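Concretely, the loop I'm describing is small. Here's a sketch with everything model-specific stubbed out (the generator callback and "./run-checks.sh" are placeholders for your LLM call and your ground-truth check, not real APIs):

    // Hypothetical closed-loop sketch. The generator callback stands in for
    // whatever LLM call you use, and "./run-checks.sh" stands in for the
    // ground-truth check (compiler, test suite, linter); neither is a real API.
    package main

    import (
        "fmt"
        "os"
        "os/exec"
    )

    type generator func(task, lastFailure string) (string, error)

    // check writes the candidate to disk, runs the check command against it,
    // and returns the combined output plus whether the command succeeded.
    func check(code string) (string, bool) {
        if err := os.WriteFile("candidate.txt", []byte(code), 0o644); err != nil {
            return err.Error(), false
        }
        out, err := exec.Command("./run-checks.sh", "candidate.txt").CombinedOutput()
        return string(out), err == nil
    }

    func closedLoop(task string, gen generator, maxAttempts int) (string, error) {
        failure := ""
        for i := 0; i < maxAttempts; i++ {
            code, err := gen(task, failure)
            if err != nil {
                return "", err
            }
            out, ok := check(code)
            if ok {
                return code, nil // the execution environment says it works
            }
            failure = out // the next attempt sees what actually happened
        }
        return "", fmt.Errorf("gave up after %d attempts", maxAttempts)
    }

    func main() {
        // toy generator that returns a fixed string; a real one calls your model
        gen := func(task, lastFailure string) (string, error) {
            return "rules: []", nil
        }
        code, err := closedLoop("write a Semgrep rule for pattern X", gen, 5)
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        fmt.Println(code)
    }

The check can be anything that gives a yes/no answer plus useful failure output; for the Semgrep case it would be whatever validates and runs the generated rule.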
The problem is what one means by "works". Is it just that it runs without triggering exceptions here and there?
One has to know, and understand, what the code is supposed to be doing, to evaluate it. Or use tests.
But LLMs love to lie so they can't be trusted to write the tests, or even to report how the code they wrote passed the tests.
In my experience the way to use LLMs for coding is exactly the opposite: the user should already have very good knowledge of the problem domain as well as the language used, and just needs to have a conversation with someone about how to approach a specific implementation detail (or get help with an obscure syntax quirk). Then LLMs can be very useful.
But having them directly output code for things one doesn't know, in a language one doesn't know either, hoping they will magically solve the problem by iterating in "closed loops", will result in chaos.
It clearly does not result in chaos. This is an "I believe my lying eyes" situation, where I can just see that I can get an agent-y LLM codegen setup to generate a sane-looking working app in a language I'm not fluent in.
The thing everyone thinks about with LLM codegen is hallucination. The biggest problem for LLMs with hallucination is that there are no guardrails; it can just say whatever. But an execution environment provides a ground truth: code works or it doesn't, a handler path generates an exception or it doesn't, a lint rule either compiles and generates workable output or it doesn't.
That's also the problem with these conversations. Some people evaluate zero-shot prompted code oozing out of gpt-3.5; others plug Sonnet into an IDE with access to the terminal, LSP, diagnostics, etc., crunching through a problem in an agentic self-improvement loop. Those two approaches will generate very different quality levels of code.
An LLM, though, doesn't truly understand the goal, AND it frequently gets into circular loops it can't get out of when the solution escapes its capability, rather than asking for help. Hopefully that will get fixed, but some of this is an architectural problem rather than something that just iterating on the transformer idea will solve.
That's totally true, but it's also a small amount of Python code in the agent scaffolding to ensure that it bails on those kinds of loops. Meanwhile, for something like Semgrep, the status quo ante was essentially no Semgrep rules getting written at all (I believe the modal Semgrep user just subscribes to existing rule repositories). If a closed-loop LLM setup can successfully generate Semgrep rules for bug patterns even 5% of the time, that is a material win, and a win that comes at very little cost.
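To be concrete about how small that scaffolding is, the bail-out can be as simple as this (sketched here in Go rather than Python, names hypothetical): abort when the attempt budget runs out or when the model starts handing back a candidate it has already produced.

    // Sketch of a loop guard (hypothetical): abort when the attempt budget is
    // exhausted or when the model hands back a candidate it already produced.
    package main

    import (
        "crypto/sha256"
        "fmt"
    )

    func shouldBail(candidate string, seen map[[32]byte]bool, attempt, maxAttempts int) (string, bool) {
        if attempt >= maxAttempts {
            return "attempt budget exhausted", true
        }
        h := sha256.Sum256([]byte(candidate))
        if seen[h] {
            return "model is repeating itself", true
        }
        seen[h] = true
        return "", false
    }

    func main() {
        seen := map[[32]byte]bool{}
        for attempt, candidate := range []string{"rule-a", "rule-b", "rule-a"} {
            if reason, bail := shouldBail(candidate, seen, attempt, 10); bail {
                fmt.Println("bailing:", reason)
                return
            }
            fmt.Println("keeping attempt", attempt, "in the loop")
        }
    }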
Yeah, I more or less agree about the closed-loop part and the overall broader point the article was making in this context — that it may be a useful use case. I think it’s likely that process creates a lot of horseshit that slips through, but that might still be better than nothing for Semgrep rules.
I only came down hard on that quote out of context because it felt somewhat standalone and I want to broadcast this “fluency paradox” point a bit louder because I keep running into people who really need to hear it.
It’s just not that big of a mystery. It’s not an excuse; it’s just true. Also, they’re not especially selling reliability as much as they’re selling small geo-distributed deployments.
Based on the ol' joke about outfitting custom planes, "If you want to do anything to a plane... /anything/..., it's 250k. New coffee machine? 250k. Rotate the sofa? 250k." -- $149,072 for a soap dispenser might well be a screaming deal.
Note that although this is a really good transcript (remember that awful auto-generated transcript for an interview (podcast maybe? I don't recall all the details) with tptacek way back? Not like that), it wasn't actually written by a Rust or C++ programmer (or if it was, they weren't paying attention), so e.g. it says "mute" because that's how you pronounce the Rust keyword "mut", just as C++ people often pronounce their "char" keyword "car".
That does seem to be auto-generated as well: "Graden" instead of Graydon, "Rust Go-ish" without a comma, "Steve, of adding" with an excess "of", etc. I would say it is very good nonetheless, but those errors don't really feel human.
Most Japanese people do not use this term, and I'm fairly certain most Japanese people don't even really know the word. This is one of those "Big in Japan" things, except, uh, "Big outside Japan".
Source: I live in Japan and have asked Japanese people around me if they know about this concept (which is popular in the USA). The usual reaction: へ〜、全然知らない。("Huh? Never heard of it.")
The topic came up again, and maybe this has been changing lately, so I'll downgrade my comment above. I still think it got popular in the U.S. first and then propagated back to Japan, but ¯\_(ツ)_/¯.
1. Queues are actually used a lot, esp. at high scale, and you just don't hear about it.
2. Hardware/compute advances are outpacing user growth: 1 billion users was a unicorn 10 years ago and 1 billion users is still a unicorn today, but serving (for the sake of argument) 100 million users on a single large box is much more plausible today than it was 10 years ago. (These numbers are made up; keep the proportions and adjust as you see fit.)
3. Given (2), if you can get away with stuffing your queue into e.g. Redis or an RDBMS, you probably should: it simplifies deployment and architecture, lets you query queue state alongside the rest of your data, etc. However, depending on your requirements for scale, reliability, and failure (in)dependence, it may not be advisable (minimal sketch below). I think this is also correlated with a broader understanding that (a) if you can get away with out-of-order task processing, you should, (b) architectural simplicity was underrated in the 2010s industry-wide, and (c) YAGNI.
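To make (3) concrete, the minimal version of "stuffing your queue into Redis" is just a list plus a blocking pop. A sketch using the go-redis client (the address, key, and payload are made up; assumes a Redis you can reach):

    // Minimal "queue in Redis" sketch: producers LPUSH onto a list and a worker
    // BRPOPs from the other end. The address, key, and payload are made up.
    package main

    import (
        "context"
        "fmt"
        "log"

        "github.com/redis/go-redis/v9"
    )

    func main() {
        ctx := context.Background()
        rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

        // producer side: enqueue a job payload
        if err := rdb.LPush(ctx, "jobs", `{"task":"resize-image","id":42}`).Err(); err != nil {
            log.Fatal(err)
        }

        // worker side: block until a job is available, then process it
        vals, err := rdb.BRPop(ctx, 0, "jobs").Result() // vals is [key, value]
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println("got job:", vals[1])
    }

There is no acknowledgement, retry, or dead-lettering here; needing those is exactly the kind of requirement that starts to push you toward a dedicated queue.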