>Context transfer between [sub]agents is also poor
That's the main point of sub-agents, as far as I can tell. They get their own context, so it's much cheaper: you divide the work into chunks and let a sub-agent handle each one. That actually ties in nicely with the article's earlier emphasis on careful context management.
>Before you reach for a frontier model, ask yourself: does this actually need a trillion-parameter model?
>Most tasks don't. This repo helps you figure out which ones.
About a year ago I was testing Gemini 2.5 Pro and Gemini 2.5 Flash for agentic coding. I found they could both do the same task, but Gemini Pro was way slower and more expensive.
This blew my mind because I'd previously been obsessed with "best/smartest model", and suddenly realized what I actually wanted was "fastest/dumbest/cheapest model that can handle my task!"
I haven't tested it extensively, but when I used it with Claude Code it was reasonably fast (though actual Claude was way faster); when I tried to use the API directly, it was super slow.
My guess is that they're filtering the traffic and prioritizing certain types of it. With my own script, I ran into a rate limit after 7 requests!
Claude Code spends most of its time poking around the files. By default it has no knowledge of the project (no file index, etc.), unless they've changed that recently.
When I was using it a lot, I created a startup hook that just dumped a file listing into the context, or the actual full code on very small repos.
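Roughly the shape of it (a sketch, not my actual hook; the details of registering a SessionStart hook in Claude Code's settings are from memory and may have changed, and the skip list and threshold are arbitrary):

    #!/usr/bin/env python3
    # Startup-hook sketch: register as a SessionStart hook in Claude Code's
    # settings; what it prints to stdout should land in the session context.
    import os

    SKIP = {".git", "node_modules", "__pycache__", ".venv"}
    FULL_DUMP_LIMIT = 25  # below this many files, dump full contents

    paths = []
    for root, dirs, files in os.walk("."):
        dirs[:] = [d for d in dirs if d not in SKIP]
        paths.extend(os.path.join(root, f) for f in files)
    paths.sort()

    if len(paths) <= FULL_DUMP_LIMIT:
        # Very small repo: hand over the actual code up front.
        for p in paths:
            print(f"===== {p} =====")
            try:
                with open(p, encoding="utf-8", errors="replace") as fh:
                    print(fh.read())
            except OSError:
                pass
    else:
        # Otherwise just the listing, so the agent doesn't have to poke around.
        print("\n".join(paths))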
I also got some gains from using a custom edit tool I made which can edit multiple chunks in multiple files simultaneously. It was about 3x faster. I had some edge cases where it broke though, so I ended up disabling it.
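The core idea is simple to sketch, though (this isn't my actual tool, just the shape of it): take a batch of exact-match replacements spanning several files and apply them in one call, so you pay one tool round-trip instead of one per edit.

    # Batched multi-file edit: apply many exact-match replacements in one call.
    from dataclasses import dataclass

    @dataclass
    class Edit:
        path: str
        old: str  # exact text to find (must occur exactly once)
        new: str  # replacement text

    def apply_edits(edits: list[Edit]) -> None:
        # Group by file so each file is read and written only once.
        by_file: dict[str, list[Edit]] = {}
        for e in edits:
            by_file.setdefault(e.path, []).append(e)
        for path, file_edits in by_file.items():
            with open(path, encoding="utf-8") as fh:
                text = fh.read()
            for e in file_edits:
                # Requiring exactly one match is where tools like this get
                # brittle: whitespace drift or duplicated snippets break it.
                if text.count(e.old) != 1:
                    raise ValueError(f"{path}: need exactly one match for {e.old!r}")
                text = text.replace(e.old, e.new)
            with open(path, "w", encoding="utf-8") as fh:
                fh.write(text)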
What's the difference between resetting a container and resetting a VPS?
On my local machine I run it under its own user, so I can access its files but it can't access mine. I'm not a security expert though, so I'd love to hear whether that's actually solid.
On my $3 VPS it has root, because that's the whole point (it's my sysadmin). If it blows the box up, I wanna say "I'm down $3", but it's not even that, since I can just restore it from a backup.
This is basically identical to the ChatGPT/GPT-3 situation ;) You know OpenAI themselves keep saying "we still don't understand why ChatGPT is so popular... GPT was already available via API for years!"
ChatGPT is quite different from GPT-3. Using GPT-3 directly to have a nice dialogue simply doesn't work for most purposes. Making it usable for a broad audience took quite some effort, including RLHF, which was not a trivial extension.
I have a separate removable SSD I can boot from to work with Claude in a dedicated environment. It's nice being able to offload environment setup and whatnot to the agent. That environment has WiFi credentials for an isolated LAN, and I'm much more permissive with Claude on that system. I even automatically allow WebSearch, but not WebFetch (a much larger injection surface). It still can't do anything requiring sudo.
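For anyone who wants to replicate that split, the allow/deny lists go in Claude Code's settings file; something like this (schema from memory, so check the current docs):

    {
      "permissions": {
        "allow": ["WebSearch"],
        "deny": ["WebFetch"]
      }
    }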
They are not. Many people are doing this; I don't think there's enough data to say "most," but there are at least anecdotal discussions of people buying Mac minis for the purpose. I know someone who's running it on a spare Mac mini (but it has Internet access and some credentials, so...).