Hacker News | new | past | comments | ask | show | jobs | submit | remify's comments

Sub-agents also help a lot in that regard. Have one agent do the planning, have an implementation agent write the code, and have another one do the review. Clear responsibilities help a lot.

There's also the blue team / red team approach, which works.

The idea is always the same: help the LLM reason properly with fewer and clearer instructions.


This sounds very promising. Any link to more details?


A huge part of getting autonomy as a human is demonstrating that you can be trusted to police your own decisions up to a point that other people can reason about. Some people get more autonomy than others because they can be trusted with more things.

All of these models are kinda toys as long as you have to manually send a minder in to deal with their bullshit. If we can do it via agents, then the vendors can bake it in, and they haven't. Which is just another judgement call about how much autonomy you give to someone who clearly isn't policing their own decisions and thus is untrustworthy.

If we're at the start of the Trough of Disillusionment now, which maybe we are and maybe we aren't, that'll be part of the rebound that typically follows the trough. But the trough is also typically the end of the mountains of VC cash, so the cost per use goes up, which can trigger aftershocks.


This approach sounds clean in theory, but in production you're building a black box. When your planning agent hands off to an implementation agent and that hands off to a review agent — where did the bug originate? Which agent's context was polluted? Good luck tracing that. I went the opposite direction: single agent per task, strict quality gates between steps, full execution logs. No sub-agents. Every decision is traceable to one context window. The governance layer (PR gates, staged rollouts, acceptance criteria) does the work that people expect sub-agents to do — but with actual observability.

After 6 months in production and 1100+ learned patterns: fewer moving parts, better debugging, more reliable output. Built a full production crawler this way — 26 extractors, 405 tests — without sub-agents. Orchestrator acts as gatekeeper that redispatches uncompleted work.
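The "orchestrator as gatekeeper" loop described above can be sketched in a few lines. All names here (Task, run_agent, the acceptance checks) are hypothetical stand-ins, not the poster's actual code; the point is the shape: one agent per task, a hard gate between steps, every attempt logged.

```python
# Hedged sketch of quality gates + redispatch. run_agent() stands in for a
# real single-agent run; gate() encodes explicit acceptance criteria.
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    attempts: int = 0
    log: list = field(default_factory=list)

def run_agent(task: Task) -> dict:
    # Placeholder for dispatching the task to one agent in one context.
    return {"status": "done", "tests_passed": True}

def gate(result: dict) -> bool:
    # Acceptance criteria are explicit and checkable, not vibes.
    return result["status"] == "done" and result["tests_passed"]

def orchestrate(task: Task, max_attempts: int = 3) -> bool:
    while task.attempts < max_attempts:
        task.attempts += 1
        result = run_agent(task)
        task.log.append(result)   # every decision stays traceable
        if gate(result):
            return True           # gate passed: work is accepted
        # gate failed: redispatch the same task; nothing moves forward
    return False
```

The observability claim falls out of the structure: since there is exactly one agent per task and the log is kept per task, any bad output maps back to one context window.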


> Every decision is traceable to one context window

There are no models that can do all the mentioned steps in a single usable context window. This is why subagents or multi-agent orchestrators exist in the first place.


You're right that no model handles everything in one context window — that's exactly why I built context rotation. Each task runs in a single agent context (one responsibility, clear scope), and when the window fills up, the system automatically rotates: writes a structured handover, clears, and resumes in a fresh window.

The key distinction: sub-agents run within a parent context with shared state (black box). My approach uses independent parallel agents (separate terminals, separate context windows) that report back to an orchestrator. Large tasks get split into smaller dispatches upfront — each scoped to fit a single context window. The orchestrator can dispatch research to 3 agents in parallel, collect their outputs, then dispatch a synthesis task to a single agent that merges the findings.

So it's not "one context window for everything" — it's right-sized tasks with full observability per agent, and a governance layer managing the sequence and merging results.


That sounds interesting. I do hate how there's no observability into subagents and you just get a summary.

How do they report back to the orchestrator? Tmux?


Yes, tmux. The setup is a 2x2 grid:

  T0 (orchestrator) | T1 (Track A)
  T2 (Track B)      | T3 (Track C)

When a worker finishes, it writes a structured report to a shared unified_reports/ directory. A file watcher (receipt processor) detects it, parses the report into a structured NDJSON receipt (status, files changed, open items, git ref), and delivers it to T0's pane.

T0 then reviews the receipt, runs a quality advisory (automated pass/warn/hold verdict), and decides: close open items, complete the PR, or redispatch. Everything is filesystem-based — no API, no database, no shared memory between agents. Each terminal has its own context window, its own Claude Code (or Codex/Gemini) session, and the only communication channel is structured files on disk.

The receipt ledger is append-only NDJSON, so you can always trace: which agent did what, when, on which dispatch, with which git commit.

I open-sourced the setup recently if you want to dig into the details.


Since the phases are sequential, what’s the benefit of a sub agent vs just sequential prompts to the same agent? Just orchestration?


Context pollution, I think. Just because something is sequential in a context file doesn’t mean it’ll happen sequentially, but if you use subagents there is a separation of concerns. I also feel like one bloated context window feels a little sloppy in the execution (and costs more in tokens).

YMMV, I’m still figuring this stuff out


This runs counter to the advice in the fine article: one long continuous session building context.


I think claude-code is doing this in the background now.


Funnily enough, it's quite the opposite: front ends with a focus on UX are pretty well protected from generative AI.


I haven't heard this perspective. I'm kind of surprised the LLMs can't generate coherent frontend framework-ized code, if that's the implication.


Both of you are right. They can generate the code quite well, but well-considered UX is another thing entirely.


That's fair. Just after I posted, I realized I was framing this in a black-and-white manner. It leaves the reader to decide what it means for it to "work" or not. That might be a useful thing for people (including me) to consider when talking about this stuff. Where's the bar? Is the benefit worth the cost?


Strangely, yeah. LLMs are absolute trash at generating good UX and UI.


Agreed. That’s the one area where I think my experience will still have value (for a while anyway): translating customer requests into workable UI/UX, before handing off to the LLM.


Thanks for the input. I was planning to do something like that with a CLI, though.


That's the thing that bothers me here. They loaded the doc, so of course it works, but as your project grows you won't be able to fit all your documentation in there (at least with current context handling).

Skills are still very much relevant on big and diverse projects.


At work we use this feature. A lot of the time we need to do some kind of PDF reporting. We build the reports as HTML pages and print them to PDF.

Works fine.
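A minimal sketch of the HTML side of this workflow. Only the HTML-generation step is shown; the final print-to-PDF step could be done with a headless browser (e.g. `chromium --headless --print-to-pdf=report.pdf report.html`), which is an assumption on my part, not necessarily the poster's exact tooling.

```python
# Build a report as an HTML page; CSS @media print rules control the PDF
# layout (page breaks, fonts) when the page is printed.
from string import Template

REPORT = Template("""<!doctype html>
<html><head><style>
  @media print { .page-break { page-break-after: always; } }
  body { font-family: sans-serif; }
</style></head>
<body><h1>$title</h1><table>$rows</table></body></html>""")

def render_report(title, rows):
    body = "".join(f"<tr><td>{k}</td><td>{v}</td></tr>" for k, v in rows)
    return REPORT.substitute(title=title, rows=body)

html = render_report("Monthly report", [("Revenue", "42"), ("Orders", "7")])
```

The nice part of this approach is that the report stays debuggable in any browser before it ever becomes a PDF.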


ORMs are mostly useless: they make easy queries easier and hard queries a lot harder.
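A small illustration of the trade-off, using a made-up schema: the "easy" filter is trivial in any ORM, while a window-function query is one readable block of SQL but awkward to express in most ORM query builders.

```python
# Plain sqlite3 (stdlib) for both the easy and the hard case.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount INT)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("eu", 10), ("eu", 30), ("us", 20)])

# The "easy" query an ORM handles fine, roughly query(Sales).filter_by(...):
eu_rows = con.execute("SELECT * FROM sales WHERE region = 'eu'").fetchall()

# The "hard" query: rank sales within each region. In many ORMs you end up
# dropping to raw SQL for this anyway.
ranked = con.execute("""
    SELECT region, amount,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS r
    FROM sales
""").fetchall()
ranks = {(region, amount): r for region, amount, r in ranked}
```

(Window functions need SQLite 3.25+, which ships with any recent Python.)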


It is, and there's not much you can do about it.


Poker is "solved" for low-stakes, high-volume play, which is exactly what a bot would be good at.


Chatcontrol isn't there yet.


No, of course not, but it will be.

There's another vote on the 17th of October and most countries are in favour now :( And if it fails again I'm sure they will keep trying like they have been until they can finally push it through.

Notably, in this iteration the politicians are making an exemption for themselves and their staff (including police, etc.).

But I think Google thinks the time is right now because it will be a prerequisite for this.


I've got an AMD Ryzen 9 365 processor in my new laptop and I really like it. Great battery life and good performance when needed; it's comparable to the M3 version (not the Max).

