More

beacon294 · 2026-02-01T11:49:31 1769946571

My codex just uses python to write files around the sandbox when I ask it to patch a sdk outside its path.

Sharlin · 2026-02-01T11:58:11 1769947091

It's definitely not a sandbox if you can just "use python to write files" outside of it o_O

chongli · 2026-02-01T13:41:35 1769953295

Hence the article’s security theatre remark.

I’m not sure why everyone seems to have forgotten about Unix permissions, proper sandboxing, jails, VMs etc when building agents.

Even just running the agent as a different user with minimal permissions and jailed into its home directory would be simple and easy enough.

embedding-shape · 2026-02-01T14:05:44 1769954744

I'm just guessing, but seems the people who write these agent CLIs haven't found a good heuristic for allowing/disallowing/asking the user about permissions for commands, so instead of trying to sit down and actually figure it out, someone had the bright idea to let the LLM also manage that allowing/disallowing themselves. How that ever made sense, will probably forever be lost on me.

`chroot` is literally the first thing I used when I first installed a local agent, by intuition (later moved on to a container-wrapper), and now I'm reading about people who are giving these agents direct access to reply to their emails and more.

Majromax · 2026-02-01T16:44:55 1769964295

> I'm just guessing, but seems the people who write these agent CLIs haven't found a good heuristic for allowing/disallowing/asking the user about permissions for commands, so instead of trying to sit down and actually figure it out, someone had the bright idea to let the LLM also manage that allowing/disallowing themselves. How that ever made sense, will probably forever be lost on me.

I don't think there is such a good heuristic. The user wants the agent to do the right thing and not to do the wrong thing, but the capabilities needed are identical.

> `chroot` is literally the first thing I used when I first installed a local agent, by intuition (later moved on to a container-wrapper), and now I'm reading about people who are giving these agents direct access to reply to their emails and more.

That's a good, safe, and sane default for project-focused agent use, but it seems like those playing it risky are using agents for general-purpose assistance and automation. The access required to do so chafes against strict sandboxing.

valleyer · 2026-02-01T14:13:17 1769955197

Here's OpenAI's docs page on how they sandbox Codex: https://developers.openai.com/codex/security/

Here's the macOS kernel-enforced sandbox profile that gets applied to processes spawned by the LLM: https://github.com/openai/codex/blob/main/codex-rs/core/src/...

I think skepticism is healthy here, but there's no need to just guess.

chongli · 2026-02-01T14:56:40 1769957800

That still doesn't seem ideal. Run the LLM itself in a kernel-enforced sandbox, lest it find ways to exploit vulnerabilities in its own code.

valleyer · 2026-02-01T15:09:26 1769958566

The LLM inference itself doesn't "run code" per se (it's just doing tensor math), and besides, it runs on OpenAI's servers, not your machine.

chongli · 2026-02-01T15:50:58 1769961058

There still needs to be a harness running on your local machine to spawn the processes in their sandboxes. I consider that "part of the LLM" even if it isn't doing any inference.

valleyer · 2026-02-01T16:14:20 1769962460

If that part were running sandboxed, then it would be impossible for it to contact the OpenAI servers (to get the LLM's responses), or to spawn an unsandboxed process (for situations where the LLM requests it from the user).

chongli · 2026-02-01T16:24:05 1769963045

That's obviously not true. You can do anything you want with a sandbox. Open a socket to the OpenAI servers and then pass that off to the sandbox and let the sandboxed process communicate over that socket. Now it can talk to OpenAI's servers but it can't open connections to any other servers or do anything else.

The startup process which sets up the original socket would have to be privileged, of course, but only for the purpose of setting up the initial connection. The running LLM harness process would not have any ability to break out of the sandbox after that.

As for spawning unsandboxed processes, that would require a much more sophisticated system whereby the harness uses an API to request permission from the user to spawn the process. We already have APIs like this for requesting extra permissions from users on Android and iOS, so it's not in-principle impossible either.

In practice I think such requests would be a security nightmare and best avoided, since essentially it would be like a prisoner asking the guard to let him out of jail and the guard just handing the prisoner the keys. That unsandboxed process could do literally anything it has permissions to do as a non-sandboxed user.

valleyer · 2026-02-01T17:24:47 1769966687

You are essentially describing the system that Codex (and, I presume, Claude Code et al.) already implements.

chongli · 2026-02-01T19:08:58 1769972938

The devil is in the details. How much of the code running on my machine is confined to the sandbox vs how much is used in the boostrap phase? I haven't looked but I would hope it can survive some security audits.

ZeroGravitas · 2026-02-01T15:27:41 1769959661

If I'm following this it means you need to audit all code that the llm writes though as anything you run from another terminal window will be run as you with full permissions.

MillionOClock · 2026-02-01T18:12:36 1769969556

The thing is that on macOS at least, Codex does have the ability use an actual sandbox that I believe prevents certain write operations and network access.

valleyer · 2026-02-01T13:49:03 1769953743

Is it asking you permission to run that python command? If so, then that's expected: commands that you approve get to run without the sandbox.

The point is that Codex can (by default) run commands on its own, without approval (e.g., running `make` on the project it's working on), but they're subject to the imposed OS sandbox.

This is controlled by the `--sandbox` and `--ask-for-approval` arguments to `codex`.

beacon294 · 2026-01-07T21:41:49 1767822109

These are called flat because they are defined in flat files.

Apparently researchers call non-hierarchical state machines "flat machines" / "flat agents". Oh well!

I think editing post content locks after some time or edit count.

Well-formatted examples:

* https://github.com/memgrafter/flatagents/tree/main/sdk/pytho...

* https://github.com/memgrafter/research-crawler-flatagents

* https://github.com/memgrafter/claude-skills-flatagents

beacon294 · 2025-12-26T20:34:36 1766781276

Probably for any case where an actual human is doing it. On an image you obviously want to do it at bake time, so I feel default off with a flag would have been a better design decision for pip.

I just read the thread and use Python, I can't comment on the % speedup attributed to uv that comes from this optimization.

Epa095 · 2025-12-26T20:54:02 1766782442

Images are a good example where doing it at install-time is probably the best yeah, since every run of the image starts 'fresh', losing the compilation which happened last time the image got started.

If it was a optional toggle it would probably become best practice to activate compilation in dockerfiles.

zahlman · 2025-12-27T06:29:21 1766816961

> On an image you obviously want to do it at bake time

It seems like tons of people are creating container images with an installer tool and having it do a bunch of installations, rather than creating the image with the relevant Python packages already in place. Hard to understand why.

For that matter, a pre-baked Python install could do much more interesting things to improve import times than just leaving a forest of `.pyc` files in `__pycache__` folders all over the place.

beacon294 · 2025-12-26T14:17:57 1766758677

Can you use search? Anything else missing? I use cerebras glm 4.6 thinking on aider and looking to switch some usages to claude code or opencode.

beacon294 · 2025-12-14T18:23:22 1765736602

It's confusing but Kimi K2 Thinking is not the same.

beacon294 · 2025-12-05T13:17:14 1764940634

> By pathologizing them, we(society) loose touch for what they mean in our life. It also makes discourse hard because the (this is causing me to truly not be able to function) gets mixed in with the (this is a way that my brain behaves, but I can mostly live a life).

As I recently learned, ADHD executive processing issues, rsd, and demand avoidance absolutely are a pathology and if you don't even know you have them it is like being hit by a truck when the requirements of your workplace (and your life) change under your feet.

There are situations in which I will use my accommodations in the future, but it has not been an everyday need for me.

Think of dyslexia. My dear friend is an all star aerospace engineer but he couldn't read his tests in college, so he used the extended test proctoring. In the workplace he needs to receive a report, then read it and meet after he has spent appropriate time on it. This is an accommodation. It is required.

beacon294 · 2025-12-05T13:08:18 1764940098

If text game interfaces (with llms) are also in there, you can add mine:

https://loreblendr.ai/app

beacon294 · 2025-12-05T13:05:36 1764939936

I built a free + freemium character card app for iOS: https://loreblendr.ai/app These cards are super versatile prompts mediums and haven't been fully creatively explored.

beacon294 · 2025-11-10T16:35:17 1762792517

I built a free (eventually free-mium) character card app for iOS: https://loreblendr.ai/app

These cards are super versatile prompts mediums and haven't been fully creatively explored.

beacon294 · 2025-11-07T03:50:39 1762487439

I built a free (eventually free-mium) character card app for iOS: https://loreblendr.ai/app