More

simonw · 2026-01-20T23:04:45 1768950285

At first glance this looks like a credible set of calculations to me. Here's the conclusion:

> So, if I wanted to analogize the energy usage of my use of coding agents, it’s something like running the dishwasher an extra time each day, keeping an extra refrigerator, or skipping one drive to the grocery store in favor of biking there.

That's for someone spending about $15-$20 in a day on Claude Code, estimated at the equivalent of 4,400 "typical queries" to an LLM.

Aurornis · 2026-01-21T01:09:35 1768957775

Comparing it to running a refrigerator or the dishwasher is very relatable, as most people have at least one refrigerator without a second thought.

This is for someone using a lot of LLM tokens relative to the average customer of these companies.

nasmorn · 2026-01-21T15:58:41 1769011121

It also means I could offset my Claude usage with a single solar panel which costs basically nothing. A battery as well if I wanna code late

simonw · 2026-01-20T22:58:37 1768949917

This is a neat brute-force search system - it uses goroutines, one for each of the 1,200 books in the corpus, and has each one do a regex search against the in-memory text for that book.

Here's a neat trick I picked up from the source code:

    indices := fdr.rgx.FindAllStringSubmatchIndex(text, -1)

    for _, pair := range indices {
        start := pair[0]
        end := pair[1]
        leftStart := max(0, start-CONTEXT_LENGTH)
        rightEnd := min(end+CONTEXT_LENGTH, len(text))

        // TODO: this doesn't work with Unicode
        if start > 0 && isLetter(text[start-1]) {
            continue
        }

        if end < len(text) && isLetter(text[end]) {
            continue
        }

An earlier comment explains this:

    // The '\b' word boundary regex pattern is very slow. So we don't use it here and
    // instead filter for word boundaries inside `findConcordance`.
    // TODO: case-insensitive matching - (?i) flag (but it's slow)
    pattern := regexp.QuoteMeta(keyword)

So instead of `\bWORD\b` it does the simplest possible match and then checks to see if the character one index before the match and or one index after the matches are also letters. If they are it skips the match.

never_inline · 2026-01-21T08:30:45 1768984245

Spinning 1K goroutines per request doesn't feel right to me for some reason.

Isn't trigram search supposed to be better?

https://swtch.com/~rsc/regexp/regexp4.html

simonw · 2026-01-20T18:32:58 1768933978

Thanks for this, that was a really informative comment.

polyglotfacto · 2026-01-20T18:40:52 1768934452

You're welcome; big fan of your blog and a former Django dev myself.

Just made some last edits above so not sure which version you saw. I toned it down a bit and clarified some stuff...

simonw · 2026-01-20T13:24:56 1768915496

Anthropic is a dictionary word already: https://www.merriam-webster.com/dictionary/anthropic

  of or relating to human beings
  or the period of their existence
  on earth

verdverm · 2026-01-20T14:46:40 1768920400

Anthro is the root from which many words come: https://www.etymonline.com/word/anthro-

Anthropocene (time period), Anthropology (study of), Anthropomorphic (giving human attributes), Anthropocentric (centered on humans)

"Anthropic" is and adjective used with multiple of these

1. Of or relating to humans or the era of human life; anthropocene. 2. Concerned primarily with humans; anthropocentric.

simonw · 2026-01-20T02:56:02 1768877762

One of the big open questions for me right now concerns how library dependencies are used.

Most of the big ones are things like skia, harfbuzz, wgpu - all totally reasonable IMO.

The two that stand out for me as more notable are html5ever for parsing HTML and taffy for handling CSS grids and flexbox - that's vendored with an explanation of some minor changes here: https://github.com/wilsonzlin/fastrender/blob/19bf1036105d4e...

Taffy a solid library choice, but it's probably the most robust ammunition for anyone who wants to argue that this shouldn't count as a "from scratch" rendering engine.

I don't think it detracts much if at all from FastRender as an example of what an army of coding agents can help a single engineer achieve in a few weeks of work.

sealeck · 2026-01-20T03:01:55 1768878115

I think the other question is how far away this is from a "working" browser. It isn't impossible to render a meaningful subset of HTML (especially when you use external libraries to handle a lot of this). The real difficulty is doing this (a) quickly, (b) correctly and (c) securely. All of those are very hard problems, and also quite tricky to verify.

I think this kind of approach is interesting, but it's a bit sad that Cursor didn't discuss how they close the feedback loop: testing/verification. As generating code becomes cheaper, I think effort will shift to how we can more cheaply and reliably determine whether an arbitrary piece of code meets a desired specification. For example did they use https://web-platform-tests.org/, fuzz testing (e.g. feed in random webpages and inform the LLM when the fuzzer finds crashes), etc? I would imagine truly scaling long-running autonomous coding would have an emphasis on this.

Of course Cursor may well have done this, but it wasn't super deeply discussed in their blog post.

I really enjoy reading your blog and it would be super cool to see you look at approaches people have to ensuring that LLM-produced code is reliable/correct.

simonw · 2026-01-20T03:09:11 1768878551

Yeah, I'm hoping they publish a lot more about this project! It deserves way more then the few sentences they've shared about it so far.

cousinbryce · 2026-01-20T16:00:31 1768924831

I’m interested to see how much more they know about the project

polyglotfacto · 2026-01-21T13:44:45 1769003085

I think the current approach is simply not scalable to a working browser ever.

To leverage AI to build a working browser you would imo need the following:

- A team of humans with some good ideas on how to improve on existing web engines.

- A clear architectural story written not by agents but by humans. Architecture does not mean high-level diagrams only. At each level of abstraction, you need humans to decide what makes sense and only use the agent to bang out slight variations.

- A modular and human-overseen agentic loop approach: one agent can keep running to try to fix a specific CSS feature(like grid), with a human expert reviewing the work at some interval(not sure how fine-grained it should be). This is actually very similar to running an open-source project: you have code owners and a modular review process, not just an army of contributor committing whatever they want. And a "judge agent" is not the same thing as a human code owner as reviewer.

Example on how not to do it: https://github.com/wilsonzlin/fastrender/blob/19bf1036105d4e...

This rendering loop architecture makes zero sense, and it does not implement web standards.

> in the HTML Standard, requestAnimationFrame is part of the frame rendering steps (“update the rendering”), which occur after running a task and performing a microtask checkpoint

> requestAnimationFrame callbacks run on the frame schedule, not as normal tasks.

This is BS: "update the rendering" is specified as just another task, which means it needs to be followed by a microtask checkpoint. See https://html.spec.whatwg.org/multipage/#event-loop-processin...

Following the spec doesn't mean you cannot optimize rendering tasks in some way vs other tasks in your implementation, but the above is not that, it's classic AI bs.

Understanding Web standards and translating them into an implementation requires human judgement.

Don't use an agent to draft your architecture; an expert in web standards with a interest in agentic coding is what is required.

Message to Cursor CEO: next time, instead of lighting up those millions on fire, reach out to me first: https://github.com/gterzian

ontouchstart · 2026-01-22T11:57:48 1769083068

How much effort would it take GenAI to write a browser/engine from scratch for GenAI to consume (and generate) all the web artifacts generated by human and GenAI? (This only needs to work in headless CI.)

How much effort would it take for a group of humans to do it?

mwcampbell · 2026-01-20T09:54:23 1768902863

I was gratified to learn that the project used my own AccessKit for accessibility (or at least attempted to; I haven't verified if it actually works at all; I doubt it)... then horrified to learn that it used a version that's over 2 years old.

embedding-shape · 2026-01-20T10:17:09 1768904229

For me, the biggest open question is currently "How autonomous is 'autonomous'?" because the commits make it clear there were multiple actors involved in contributing to the repository, and the timing/merges make it seem like a human might have been involved with choosing what to merge (but hard to know 100%) and also making smaller commits of their own. I'm really curious to understand what exactly "It ran uninterrupted for one week" means, which was one of Cursor's claims.

I've reached out to the engineer who seemed to have run the experiment, who hopefully can shed some more light on it and (hopefully) my update to https://news.ycombinator.com/item?id=46646777 will include the replies and more investigations.

shubhamjain · 2026-01-20T05:07:14 1768885634

Why attempt something that has abundant number of libraries to pick and choose? To me, however impressive it is, 'browser build from scratch' simply overstates it. Why not attempt something like a 3D game where it's hard to find open source code to use?

Banditoz · 2026-01-20T05:10:46 1768885846

Is something like a 3D game engine even hard to find source code for? There's gotta lots of examples/implementations scattered around.

XenophileJKO · 2026-01-20T08:47:54 1768898874

There are a lot of examples out there. Funny that you mention this. I literally just last night started a "play" project having Claude Code build a 3D web assembly/webgl game using no frameworka. It did it, but it isn't fun yet.

I think the current models are at a capability level that could create a decent 3D game. The challenges are creating graphic assets and debugging/Qa. The debugging problem is you need to figure out a good harness to let the model understand when something is working, or how it is failing.

cheevly · 2026-01-20T07:45:18 1768895118

Assets are very hard to produce and largely unsolved by AI at the moment.

fulafel · 2026-01-20T20:09:01 1768939741

There's AI based 3d asset generation tools around. For example https://www.meshy.ai/ https://hyper3d.ai/ https://www.sloyd.ai/

qingcharles · 2026-01-20T16:47:54 1768927674

This is definitely correct. I had a dream about a new video game the other day, woke up and Gemini one-shotted the game, but the characters are janky as hell because it has made them from whole cloth.

What it should have been willing to do is go off and look for free external assets on the Web that it could download and integrate.

fulafel · 2026-01-20T20:04:51 1768939491

There's many open source ones around.

Also graphics acceleration makes it hard to do from scratch rather than using using the 3D APIs but I guess you could in principle go bare iron on hardware that has published specs such as AMD, or just do software only rendering.

janoelze · 2026-01-20T03:18:26 1768879106

Any views on the nature of "maintainability" shifting now? If a fleet of agents demonstrated the ability to bootstrap a project like that, would that be enough indication to you that orchestration would be able to carry the code base forward? I've seen fully llm'd codebases hit a certain critical weight where agents struggled to maintain coherent feature development, keeping patterns aligned, as well as spiralling into quick fixes.

simonw · 2026-01-20T03:26:58 1768879618

Almost no idea at all. Coding agents are messing with all 25+ years of my existing intuitions about what features cost to build and maintain.

Features that I'd normally never have considered building because they weren't worth the added time and complexity are now just a few well-structured prompts away.

But how much will it cost to maintain those features in the future? So far the answer appears to be a whole lot less than I would previously budget for, but I don't have any code more than a few months old that was built ~100% by coding agents, so it's way too early to judge how maintenance is going to work over a longer time period.

htrp · 2026-01-20T18:14:41 1768932881

I'm seeing a lot of duplication in our AI coded repos that is getting to the point of being problematic to maintain.

visarga · 2026-01-20T12:26:29 1768911989

> But how much will it cost to maintain those features in the future?

Very little if they have good specs and tests.

brianjeong · 2026-01-20T04:11:51 1768882311

I think there's a somewhat valid perspective that the Nth+1 model can simply clean up the previous models mess.

Essentially a bet that the rate of model improvement is going to be faster than the rate of decay from bad coding.

Now this hurts me personally to see as someone who actually enjoys having quality code but I don't see why it doesn't have a decent chance of holding

Deevian · 2026-01-20T17:02:03 1768928523

They demonstrated the ability to bootstrap... "something". There's no maintainability to the output of the experiment.

teaearlgraycold · 2026-01-20T05:21:02 1768886462

It looks like JS execution is outsourced to QuickJS?

simonw · 2026-01-20T13:07:57 1768914477

No, it has its own JS implementation: https://github.com/wilsonzlin/fastrender/tree/main/vendor/ec...

See also: https://news.ycombinator.com/item?id=46650998

simonw · 2026-01-19T23:51:38 1768866698

I went looking for a single Markdown file I could dump into an LLM to "teach" it the language and found this one:

https://github.com/jordanhubbard/nanolang/blob/main/MEMORY.m...

Optimistically I dumped the whole thing into Claude Opus 4.5 as a system prompt to see if it could generate a one-shot program from it:

  llm -m claude-opus-4.5 \
    -s https://raw.githubusercontent.com/jordanhubbard/nanolang/refs/heads/main/MEMORY.md \
    'Build me a mandelbrot fractal CLI tool in this language' 
   > /tmp/fractal.nano

Here's the transcript for that. The code didn't work: https://gist.github.com/simonw/7847f022566d11629ec2139f1d109...

So I fired up Claude Code inside a checkout of the nanolang and told it how to run the compiler and let it fix the problems... which DID work. Here's that transcript:

https://gisthost.github.io/?9696da6882cb6596be6a9d5196e8a7a5...

And the finished code, with its output in a comment: https://gist.github.com/simonw/e7f3577adcfd392ab7fa23b1295d0...

So yeah, a good LLM can definitely figure out how to use this thing given access to the existing documentation and the ability to run that compiler.

e12e · 2026-01-20T02:22:21 1768875741

Oh, wow. I thought the control flow from the readme was a little annoying with the prefix -notation for bigger/smaller than;

    # Control flow
    if (> x 0) {
      (println "positive")
    } else {
      (println "negative or zero")
    }

But that's nothing compared to the scream for a case/switch-statement in the Mandelbrot example...

    # Gradient: " .:-=+*#%@"
        let gradient: string = " .:-=+*#%@"
        let gradient_len: int = 10
        let idx: int = (/ (* iter gradient_len) max_iter)
        if (>= idx gradient_len) {
            return "@"
        } else {
            if (== idx 0) {
                return " "
            } else {
                if (== idx 1) {
                    return "."
                } else {
                    if (== idx 2) {
                        return ":"
                    } else {
                        if (== idx 3) {
                            return "-"
                        } else {
                            if (== idx 4) {
                                return "="
                            } else {
                                if (== idx 5) {
                                    return "+"
                                } else {
                                    if (== idx 6) {
                                        return "*"
                                    } else {
                                        if (== idx 7) {
                                            return "#"
                                        } else {
                                            if (== idx 8) {
                                                return "%"
                                            } else {
                                                return "@"
                                            }
                                        }

antonvs · 2026-01-20T06:08:43 1768889323

> scream for a case/switch-statement

Maybe I’m missing some context, but all that actually should be needed in the top-level else block is ‘gradient[idx]’. Pretty much anything else is going to be longer, harder to read, and less efficient.

e12e · 2026-01-20T08:30:53 1768897853

True, with early return - there's no need to actually nest with else.

Logically this still would be a case/switch though...

vidarh · 2026-01-20T09:30:01 1768901401

The point was that logically it would be an array lookup by index.

There's no need for any conditional construct here whatsoever.

You'll note it has already constructed a string in the right order to do that, but then copped out with the if-else.

e12e · 2026-01-20T13:09:06 1768914546

True enough. On that note, I had a look at the language reference - there's arrays - but also this:

    (char_at s index)        # Get ASCII value at index (0-based)
    (string_from_char code)  # Create string from ASCII value

So, you can pluck a character... From an UTF-8 string? What if the rendering used multibyte characters?

vidarh · 2026-01-20T17:14:08 1768929248

Well, we can see the string, and we can see that is uses plain ASCII.

e12e · 2026-01-20T17:24:42 1768929882

In this case, sure. But what if we shifted to rendering with emojis or whatnot. What would the first ASCII character of the string be?

vidarh · 2026-01-21T14:40:03 1769006403

If you anticipate that need, you just store the gradient as an array of strings, and you still then only need a trivial lookup.

e12e · 2026-01-22T01:30:57 1769045457

I was more commenting on the language design here; the idea of indexing into a UTF-8 string and returning an ASCII character. What does the index count? Bytes? There doesn't seem to be a way to get UTF-8 characters from strings?

Ed: There seems to be an UTF-8 library:

https://github.com/jordanhubbard/nanolang/tree/main/modules/...

kamaal · 2026-01-20T02:29:02 1768876142

If you are planning to write so many if else statements. You might as well write Prolog.

mortarion · 2026-01-20T10:02:41 1768903361

I mean for all intents and purposes this language is designed for use by LLM's, not humans, and the AI probably won't complain that a switch-case statement is missing. ;)

nodja · 2026-01-19T23:59:22 1768867162

I think you need to either feed it all of ./docs or give your agent access to those files so it can read them as reference. The MEMORY.md file you posted mentions ./docs/CANONICAL_STYLE.md and ./docs/LLM_CORE_SUBSET.md and they in turn mention indirectly other features and files inside the docs folder.

simonw · 2026-01-20T00:14:34 1768868074

Yeah, I think you're right about that.

The thing that really unlocked it was Claude being able to run a file listing against nanolang/examples and then start picking through the examples that were most relevant to figuring out the syntax: https://gisthost.github.io/?9696da6882cb6596be6a9d5196e8a7a5...

hahahahhaah · 2026-01-20T01:07:50 1768871270

But are you losing horsepower of the LLM available to problem solving on a given task by doing so?

simonw · 2026-01-20T02:32:29 1768876349

Maybe a little, but Claude has 200,000 tokens these days and GPT-5.2 has 400,000 - there's a lot of space.

hahahahhaah · 2026-01-20T12:03:51 1768910631

True. You would know this better but are you also burning "attention" by giving it a new language? Rather than use its familiar Python pathways it needs to attend more to generate the unseen language. It needs to KV across from the language spec to the language to the goal. Rather than just speak the Python or JS it is uses to speaking.

simonw · 2026-01-19T23:26:32 1768865192

I hadn't heard of Halvar Flake but evidently he's a well respected figure in security - https://ringzer0.training/advisory-board-thomas-dullien-halv... mentions "After working at Google Project Zero, he cofounded startup optimyze, which was acquired by Elastic Security in 2021"

His co-founder on optimyze was Sean Heelan, the author of the OP.

tptacek · 2026-01-19T23:31:23 1768865483

Yes, Halvar Flake is pretty well respected in exploit dev circles.

0xbadcafebee · 2026-01-20T06:19:53 1768889993

Sure he can write exploits, but can he cool a beer really fast?

simonw · 2026-01-19T23:22:47 1768864967

My hunch is that the dumbasses submitting those reports were't actually using coding agent harnesses at all - they were pasting blocks of code into ChatGPT or other non-agent-harness tools and asking for vulnerabilities and reporting what came back.

An "agent harness" here is software that directly writes and executes code to test that it works. A vulnerability reported by such an agent harness with included proof-of-concept code that has been demonstrated to work is a different thing from an "exploit" that was reported by having a long context model spit out a bunch of random ideas based purely on reading the code.

I'm confident you can still find dumbasses who can mess up at using coding agent harnesses and create invalid, time wasting bug reports. Dumbasses are gonna dumbass.

staticassertion · 2026-01-21T02:43:30 1768963410

I strongly suspect the same thing - that they weren't using agents at all in the reports we've seen, let alone agents with instructions on how to verify a viable attack, a threat model, etc.

simonw · 2026-01-19T22:27:19 1768861639

Why can't they both be true?

The quality of output you see from any LLM system is filtered through the human who acts on those results.

A dumbass pasting LLM generated "reports" into an issue system doesn't disprove the efforts of a subject-matter expert who knows how to get good results from LLMs and has the necessary taste to only share the credible issues it helps them find.

protocolture · 2026-01-19T23:05:54 1768863954

Theres no filtering mentioned in the OP article. It claims GPT only created working useful exploits. If it can do that, it could also submit those exploits as perfectly as bug reports?

moyix · 2026-01-19T23:20:06 1768864806

There is filtering mentioned, it's just not done by a human:

> I have written up the verification process I used for the experiments here, but the summary is: an exploit tends to involve building a capability to allow you to do something you shouldn’t be able to do. If, after running the exploit, you can do that thing, then you’ve won. For example, some of the experiments involved writing an exploit to spawn a shell from the Javascript process. To verify this the verification harness starts a listener on a particular local port, runs the Javascript interpreter and then pipes a command into it to run a command line utility that connects to that local port. As the Javascript interpreter has no ability to do any sort of network connections, or spawning of another process in normal execution, you know that if you receive the connect back then the exploit works as the shell that it started has run the command line utility you sent to it.

It is more work to build such "perfect" verifiers, and they don't apply to every vulnerability type (how do you write a Python script to detect a logic bug in an arbitrary application?), but for bugs like these where the exploit goal is very clear (exec code or write arbitrary content to a file) they work extremely well.

simonw · 2026-01-19T23:17:59 1768864679

The OP is the filtering expert.

anonymous908213 · 2026-01-19T22:48:31 1768862911

They can't both be true if we're talking about the premise of the article, which is the subject of the headline and expounded upon prominently in the body:

  The Industrialisation of Intrusion

  By ‘industrialisation’ I mean that the ability of an organisation to complete a task will be limited by the number of tokens they can throw at that task. In order for a task to be ‘industrialised’ in this way it needs two things:

  An LLM-based agent must be able to search the solution space. It must have an environment in which to operate, appropriate tools, and not require human assistance. The ability to do true ‘search’, and cover more of the solution space as more tokens are spent also requires some baseline capability from the model to process information, react to it, and make sensible decisions that move the search forward. It looks like Opus 4.5 and GPT-5.2 possess this in my experiments. It will be interesting to see how they do against a much larger space, like v8 or Firefox.
  The agent must have some way to verify its solution. The verifier needs to be accurate, fast and again not involve a human.

"The results are contigent upon the human" and "this does the thing without a human involved" are incompatible. Given what we've seen from incompetent humans using the tools to spam bug bounty programs with absolute garbage, it seems the premise of the article is clearly factually incorrect. They cite their own experiment as evidence for not needing human expertise, but it is likely that their expertise was in fact involved in designing the experiment[1]. They also cite OpenAI's own claims as their other piece of evidence for this theory, which is worth about as much as a scrap of toilet paper given the extremely strong economic incentives OpenAI has to exaggerate the capabilities of their software.

[1] If their experiment even demonstrates what it purports to demonstrate. For anyone to give this article any credence, the exploit really needs to be independently verified that it is what they say it is and that it was achieved the way they say it was achieved.

adw · 2026-01-19T23:44:53 1768866293

What this is saying is "you need an objective criterion you can use as a success metric" (aka a verifiable reward in RL terms). "Design of verifiers" is a specific form of domain expertise.

This applies to exploits, but it applies _extremely_ generally.

The increased interest in TLA+, Lean, etc comes from the same place; these are languages which are well suited to expressing deterministic success criteria, and it appears that (for a very wide range of problems across the whole of software) given a clear enough, verifiable enough objective, you can point the money cannon at it until the problem is solved.

The economic consequences of that are going to be very interesting indeed.

IanCal · 2026-01-19T23:34:20 1768865660

A few points:

1. I think you have mixed up assistance and expertise. They talk about not needing a human in the loop for verification and to continue search but not about initial starts. Those are quite different. One well specified task can be attempted many times, and the skill sets are overlapping but not identical.

2. The article is about where they may get to rather than just what they are capable of now.

3. There’s no conflict between the idea that 10 parallel agents of the top models can mostly have one that successfully exploits a vulnerability - gated on an actual test that the exploit works - with feedback and iteration BUT random models pointed at arbitrary code without a good spec and without the ability to run code, and just run once, will generate lower quality results.

GaggiX · 2026-01-19T22:58:24 1768863504

After setting the environment and the verifier you can spawn as many agents as you want until the conditions are met, this is only possible because they run without human assistance, that's the "industrialisation".

simonw · 2026-01-19T23:19:41 1768864781

My expectation is that any organization that attempts this will need subject matter experts to both setup and run the swarm of exploit finding agents for them.

simonw · 2026-01-19T22:19:25 1768861165

> In the hardest task I challenged GPT-5.2 it to figure out how to write a specified string to a specified path on disk, while the following protections were enabled: address space layout randomisation, non-executable memory, full RELRO, fine-grained CFI on the QuickJS binary, hardware-enforced shadow-stack, a seccomp sandbox to prevent shell execution, and a build of QuickJS where I had stripped all functionality in it for accessing the operating system and file system. To write a file you need to chain multiple function calls, but the shadow-stack prevents ROP and the sandbox prevents simply spawning a shell process to solve the problem. GPT-5.2 came up with a clever solution involving chaining 7 function calls through glibc’s exit handler mechanism.

Yikes.

ahartmetz · 2026-01-20T12:02:10 1768910530

Maybe we can remove mitigations. Every exploit you see is: First, find a vulnerability (the difficult part). Then, drill through five layers of ultimately ineffective "mitigations" (the tedious but almost always doable part).

Probabilistic mitigations work against probabilistic attacks, I guess - but exploit writers aren't random, they are directed, and they find the weaknesses.

GaggiX · 2026-01-20T12:33:46 1768912426

The vulnerability was found by Opus:

"This is true by definition as the QuickJS vulnerability was previously unknown until I found it (or, more correctly: my Opus 4.5 vulnerability discovery agent found it)."

atomic128 · 2026-01-20T18:21:23 1768933283

Number 6, explained 3 years ago:

https://github.com/nobodyisnobody/docs/blob/main/code.execut...

Original publication in 2017:

https://m101.github.io/binholic/2017/05/20/notes-on-abusing-...

ahartmetz · 2026-01-20T13:30:47 1768915847

Makes little difference, whoever or whatever finds the initial exploit will also do the busywork of working around mitigations. (Techniques to work around mitigations are initially not busywork, but as soon as somehow has found a working principle, it seems to me that it becomes busywork)

staticassertion · 2026-01-20T21:40:29 1768945229

Most mitigations just flat out do not attempt to help against "arbitrary read/write". The LLM didn't just find "a vuln" and then work through the mitigations, it found the most powerful possible vulnerability.

Lots of vulnerabilites get stopped dead by these mitigations. You almost always need multiple vulnerabilities tied together, which relies on a level of vulnerability density that's tractable. This is not just busywork.

ahartmetz · 2026-01-21T10:13:12 1768990392

Maybe I've been fooled by survivorship bias? You don't read much about the the vulnerabilities that ultimately weren't exploitable.

Reports about the ones that are exploitable usually read to me like after finding an entry, the attacker reaches into the well-stocked toolbox of post-entry techniques (return-oriented programming, nop slides, return to libc...) to do the rest of the work.

titzer · 2026-01-20T18:25:14 1768933514

There are so many holes at the bottom of the machine code stack. In the future we'll question why we didn't move to WASM as the universal executable format sooner. Instead, we'll try a dozen incomplete hardware mitigations first to try to mitigate backwards crap like overwriting the execution stack.

shakna · 2026-01-20T20:23:56 1768940636

Escaping the sandbox has been plenty doable over the years. [0]

WASM adds a layer, but the first thing anyone will do is look for a way to escape it. And unless all software faults and hardware faults magically disappear, it'll still be a constant source of bugs.

Pitching a sandbox against ingenuity will always fail at some point, there is no panacea.

[0] https://instatunnel.substack.com/p/the-wasm-breach-escaping-...

verall · 2026-01-20T19:12:33 1768936353

> In the future we'll question why we didn't move to WASM as the universal executable format sooner

I hope not, my laptop is slow enough as it is.

rvz · 2026-01-19T23:48:42 1768866522

Tells you all you need to know around how extremely weak a C executable like QuickJS is for LLMs to exploit. (If you as an infosec researcher prompt them correctly to find and exploit vulnerabilities).

> Leak a libc Pointer via Use-After-Free. The exploit uses the vulnerability to leak a pointer to libc.

I doubt Rust would save you here unless the binary has very limited calls to libc, but would be much harder for a UaF to happen in Rust code.

cookiengineer · 2026-01-20T00:18:53 1768868333

The reason I value Go so much is because you have a fat dependency free binary that's just a bunch of syscalls when you use CGO_ENABLED=0.

Combine that with a minimal docker container and you don't even need a shell or anything but the kernel in those images.

akoboldfrying · 2026-01-20T00:30:06 1768869006

Why would statically linking a library reduce the number of vulnerabilities in it?

AFAICT, static linking just means the set of vulnerabilities you get landed with won't change over time.

cookiengineer · 2026-01-20T00:44:08 1768869848

> Why would statically linking a library reduce the number of vulnerabilities in it?

I use pure go implementations only, and that implies that there's no statically linked C ABI in my binaries. That's what disabling CGO means.

akoboldfrying · 2026-01-20T01:37:31 1768873051

What I mean is: There will be bugs* in that pure Go implementation, and static linking means you're baking them in forever. Why is this preferable to dynamic linking?

* It's likely that C implementations will have bugs related to dynamic memory allocation that are absent from the Go implementation, because Go is GCed while C is not. But it would be very surprising if there were no bugs at all in the Go implementation.

tptacek · 2026-01-20T01:54:45 1768874085

They're prioritizing memory corruption vulnerabilities, is the point of going to extremes to ensure there's no compiled C in their binaries.

cookiengineer · 2026-01-20T05:38:05 1768887485

It would be nice if there was something similar to the ebpf verifier, but for static C, so that loop mistakes, out of boundary mistakes and avoidable satisfiability problems are caught right in the compile step.

The reason I'm so avoidant to using C libraries at all cost is that the ecosystem doesn't prioritize maintenance or other forms of code quality in its distribution. If you have to go to great lengths of having e.g. header only libraries, then what's the point of using C99/C++ at all? Back when conan came out I had hopes for it, but meanwhile I gave up on the ecosystem.

Don't get me wrong, Rust is great for its use cases, too. I just chose the mutex hell as a personal preference over the wrapping hell.

supriyo-biswas · 2026-01-20T08:44:35 1768898675

I believe this is fil-c[1].

[1] https://fil-c.org/

saagarjha · 2026-01-20T09:31:37 1768901497

What do you consider to be a loop mistake?

cookiengineer · 2026-01-21T06:12:33 1768975953

Everything that is a "too clever" state management in an iterative loop.

Examples that come to mind: queues that are manipulated inside a loop, slice calls that forget to do length-- of the variable they set in the begin statement, char arrays that are overflowing because the loop doesn't check the length at the correct position in the code, conditions that are re-set inside the loop, like a min/max boundary that is set by an outer loop.

This kind of stuff. I guess you could argue these are memory safety issues. I've seen so crappy loop statements that the devs didn't bother to test it because they still believed they were "smart code", even after sending the devs a PoC that exploited their naive parser assumptions.

In Go I try to write clear, concise and "dumb" code so that a future me can still read it after years of not touching it. That's what I understand under Go's maintainability idiom, I suppose.

underdeserver · 2026-01-20T05:32:23 1768887143

You can have memory corruption in pure Go code, too.

staticassertion · 2026-01-21T03:19:40 1768965580

And in Rust (yes, safe Rust can have memory safety vulnerabilities). Who cares? They basically don't happen in practice.

tptacek · 2026-01-20T05:35:40 1768887340

Uh huh. That's where all the Go memory corruption vulnerabilities come from!

cookiengineer · 2026-01-20T05:39:26 1768887566

Nobody claimed otherwise. You're interacting with a kernel that invented its own programming language based on macros, after all, instead of relying on a compiler for that.

What could go wrong with this, right?

/s

jerf · 2026-01-20T15:04:44 1768921484

About a year ago I had some code I had been working on for about a year subject to a pretty heavy-duty security review by a reputable review company. When they asked what language I implemented it in and I told them "Go", they joked that half their job was done right there.

While Go isn't perfect and you can certainly write some logic bugs that sufficiently clever use of a more strongly-typed language might let you avoid (though don't underestimate what sufficiently clever use of what Go already has can do for you either when wielded with skill), it has a number of characteristics that keep it somewhat safer than a lot of other languages.

First, it's memory safe in general, which obviously out of the gate helps a lot. You can argue about some super, super fringe cases with unprotected concurrent access to maps, but you're still definitely talking about something on the order of .1% to .01% of the surface area of C.

Next, many of the things that people complain about Go on Hacker News actually contribute to general safety in the code. One of the biggest ones is that it lacks any ability to take an string and simply convert it to a type, which has been the source of catastrophic vulnerabilities in Ruby [1] and Java (Log4Shell), among others. While I use this general technique quite frequently, you have to build your own mechanism for it (not a big deal, we're talking ~50 lines of code or so tops) and that mechanism won't be able to use any class (using general terminology, Go doesn't have "classes" but user-defined types fill in here) that wasn't explicitly registered, which sharply contains the blast radius of any exploit. Plus a lot of the exploits come from excessively clever encoding of the class names; generally when I simply name them and simply do a single lookup in a single map there isn't a lot of exploit wiggle room.

In general though it lacks a lot of the features that get people in trouble that aren't related to memory unsafety. Dynamic languages as a class start out behind the eight-ball on this front because all that dynamicness makes it difficult to tell exactly what some code might do with some input; goodness help you if there's a path to the local equivalent of "eval".

Go isn't entirely unique in this. Rust largely shares the same characteristics, there's some others that may qualify. But some other languages you might expect to don't; for instance, at least until recently Java had a serious problem with being able to get references to arbitrary classes via strings, leading to Log4Shell, even though Java is a static language. (I believe they've fixed that since then but a lot of code still has to have the flag to flip that feature back on because they depend on it in some fundamental libraries quite often.) Go turns out to be a relatively safe security language to write in compared to the landscape of general programming languages in common use. I add "in common use" and highlight it here because I don't think it's anywhere near optimal in the general landscape of languages that exist, nor the landscape of languages that ought to exist and don't yet. For instance in the latter case I'd expect capabilities to be built in to the lowest layer of a language, which would further do great, great damage to the ability to exploit such code. However no such language is in common use at this time. Pragmatically when I need to write something very secure today, Go is surprisingly high on my short list; theoretically I'm quite dissatisfied.

[1]: https://blog.trailofbits.com/2025/08/20/marshal-madness-a-br...

Imustaskforhelp · 2026-01-20T21:07:16 1768943236

I love golang a lot and I feel like in this context of QuickJS it would be interesting to see what a port of QuickJS with Golang might look like security wise & a comparison to rust in the amount of security as well.

Of course Golang and rust are apples to oranges comparison but still, if someone experienced in golang were to say port to QuickJS to golang and same for rust, aside from some performance cost which can arise from Golang's GC, what would be the security analysis of both?

Also Offtopic but I love how golang has a library for literally everything mostly but its language development ie runtime for interpreted langs/JIT's or transpilation efforts etc. do feel less than rust.

Like For python there's probably a library which can call rust code from Python, I wish if there was something like this for golang and I had found such a project (https://github.com/go-python/gopy) but it still just feels a little less targeted than rust within python which has libraries like polars and other more mature libraries

jerf · 2026-01-21T19:12:01 1769022721

If you want to see what a JS interpreter in Go would look like, you can look at https://pkg.go.dev/github.com/robertkrimen/otto and https://github.com/dop251/goja . Of course they aren't "ports", but I feel like having two fairly complete intepreters is probably enough to prove the point. Arguably even a "port" would require enough changes that it wouldn't really be a "port" anyhow.

(The quickjs package in the sibling comment is the original compiled into C. It will probably have all the security quirks of the original as a result.)

0xjnml · 2026-01-21T11:41:30 1768995690

See https://pkg.go.dev/modernc.org/quickjs

eru · 2026-01-20T06:36:33 1768890993

Yes, you can have docker container images that only contain the actual binary you want to run.

But if you are using a VM, you don't even need the Linux kernel: some systems let you compiler your program to run directly on the hypervisor.

See eg https://github.com/hermit-os/hermit-rs or https://mirage.io/

pizlonator · 2026-01-20T04:53:52 1768884832

Yeah Fil-C to the rescue

(I’m not trying to be facetious or troll or whatever. Stuff like this is what motivated me to do it.)

tptacek · 2026-01-20T00:36:13 1768869373

"C executables" are most of the frontier of exploit development, which is why this is a meaningful model problem.

0xDEAFBEAD · 2026-01-20T05:04:40 1768885480

Can we fight fire with fire, and use LLMs to rewrite all the C in Rust?

saagarjha · 2026-01-20T09:33:44 1768901624

Usually rewriting something in Rust requires nontrivial choices on the part of the translator that I’m not sure are currently within the reach of LLMs.

koakuma-chan · 2026-01-20T21:34:21 1768944861

I heard this before, that apparently there are things you cannot implement in Rust. Like, apparently you cannot implement certain data structures in Rust. I think this is bullshit. Rust supports raw pointers, etc. You can implement whatever you want in Rust.

staticassertion · 2026-01-20T21:43:21 1768945401

Presumably they are saying that you'd end up using a lot of `unsafe`. Of course, that's still much better than C, but I assume that their point isn't "You can't do it in Rust" it's "You can't translate directly to safe rust from C".

koakuma-chan · 2026-01-20T21:49:07 1768945747

> Of course, that's still much better than C

Exactly. "can't translate to safe Rust" is not a good faith argument.

koakuma-chan · 2026-01-20T21:51:16 1768945876

If anything, writing unsafe code in Rust is also fun. It has many primitives like `MaybeUninit` that make it fun.

johnisgood · 2026-01-20T21:44:19 1768945459

Yes, you are looking for https://rcoh.me/posts/rust-linked-list-basically-impossible/.

saagarjha · 2026-01-20T23:46:45 1768952805

That’s not what I said. I am saying that translating C code to Rust usually involves a human in the loop because it requires non-trivial decisions to produce a good result.

0xbadcafebee · 2026-01-20T05:57:30 1768888650

Sure, but the LLMs will just chain 14 functions instead of 7. If all C code is rewritten in Rust tomorrow that still leaves all the other bug classes. Eliminating a bug class might have made human attacks harder, but now with LLMs the "hardness" factor is purely how much token money you have.

adrianN · 2026-01-20T06:49:41 1768891781

Llms are not magic. Fixing a large class of exploits makes exploitation harder.

0xbadcafebee · 2026-01-20T23:38:12 1768952292

They kind of are magic, that's the point. You can just tell them to look at every other bug class, and keep them churning on it until they find something. You can fast-forward through years of exploit research in a week. The "difficulty" of different bug classes is almost gone. (I think people underestimate just how many exploits are out there in other classes because they've been hyperfocused on the low-hanging fruit)

lelanthran · 2026-01-20T10:38:00 1768905480

> Tells you all you need to know around how extremely weak a C executable like QuickJS is for LLMs to exploit. (If you as an infosec researcher prompt them correctly to find and exploit vulnerabilities).

Wouldn't GP's approach work with any other executable using libc? Python, Node, Rust, etc?

I fail to see what is specific to either C or QuickJS in the GP's approach.

vsgherzi · 2026-01-20T04:36:02 1768883762

Wouldn’t the idea be to not have the uaf to begin with? I’d argue it saves you very much by making the uaf way harder to write. Forcing unsafe and such.

cookiengineer · 2026-01-20T00:16:54 1768868214

> glibc's exit handler

> Yikes.

Yep.

arthurcolle · 2026-01-20T00:42:14 1768869734

Life, uh, finds a way

bryanrasmussen · 2026-01-20T07:54:20 1768895660

to self-destruct! heavy metal air guitar

jdefr89 · 2026-01-20T17:34:56 1768930496

Most modern kill chains involve chaining together that many bugs... I know because it's my job and its become demoralizing.

catoc · 2026-01-20T10:41:31 1768905691

So much for ‘stochastic parrots’

moron4hire · 2026-01-20T14:24:08 1768919048

> The exploits generated do not demonstrate novel, generic breaks in any of the protection mechanisms.

titzer · 2026-01-20T19:14:29 1768936469

> The sentences output by the model do not demonstrate words with novel characters.