I use it (with jj, but it should be the same with git). It tells Claude to commit after every Write tool use. The steps are a bit too small, but I usually just squash them afterwards. I haven't yet found a good automatic heuristic for when to tell Claude to commit (or to auto-commit directly, but I like that Claude writes the commit message).
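For reference, a hook like this can be wired up roughly as below. This is a hedged sketch, not the poster's exact setup: the `.claude/settings.json` path and the `PostToolUse`/`matcher` schema are taken from Claude Code's hooks feature, and the commit message is just an example.

```shell
# Sketch: commit via jj after every Write tool use (schema per Claude Code hooks).
mkdir -p .claude
cat > .claude/settings.json <<'EOF'
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write",
        "hooks": [
          { "type": "command", "command": "jj commit -m 'checkpoint after Write'" }
        ]
      }
    ]
  }
}
EOF
# Later, the resulting micro-commits can be folded together, e.g. with `jj squash`.
```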
I don't understand what you mean by "outdated form factors". Are you saying that the laptop is an outdated form factor? What "market realities" are you noticing? Really interested in your viewpoint and would be grateful for some clarification.
Traditional laptops have their place, but I think most people would be better served by other form factors.
For instance, a good number of people use their laptops basically like a desktop and dock them to an external screen 90% of the time. For that specific use case, a tablet form factor will have better thermals and lends itself better to a separate, better keyboard and pointing device. The other 10% of the time will still be a decent experience with either the detachable keyboard or simply bringing along an external keyboard if the work sessions are expected to be long enough.
People more on the go, but needing a powerful setup on occasion, now have access to devices that can expand the screen real estate beyond the traditional 15" limitation. Lenovo has been pushing the envelope on that front, and the build quality isn't bad either.
Gaming laptops are better served by Steam Deck/ROG Ally-style form factors, etc.
The market is decently diversified, and the form factors I'm describing are, as far as I know, selling in better numbers than people clinging to ThinkPads and MacBooks would expect.
> "Are you saying that the laptop is an outdated form factor?"
Yes, that's the gist of it. Classic laptops gave way to an acceptable interstage, the T-hinge convertible (with many great examples especially from IBM/Lenovo, HP, and Fujitsu), which was then superseded by the best of both worlds: the detachable. The latter chassis design, taken to its logical conclusion, is the best form factor for a modular, ultramobile-to-mobile general-purpose computing platform, i.e. it can technically be implemented as anything from a UMPC (i.e. a smartphone-sized and -styled slab) to something with a footprint of at most 14 inches (example: HP's discontinued ZBook X2 G4 mobile workstation). Anything bigger I consider an antithesis to the form factor and therefore would not buy, but that's obviously in the eye of the beholder.
One possible unrealistic "dream" design for me is, as weird as it sounds, a cross between a Nintendo Switch/Lenovo Legion Go (complete with detachable controller options!) and an improved Panasonic Toughbook G2, reworked as a professional-grade, maintenance-friendly mobile workstation (or a scaled-down, more maintenance-friendly and otherwise improved HP ZBook X2 G4 with ECC memory).
> "What 'market realities' are you noticing?"
Well, the above-mentioned design is unrealistic, as it would amount to an expensive general-purpose machine that needs a long-term support infrastructure. Not many companies on the market are in a position to deliver on that promise for at least three continental zones (say, the Americas, the Eurozone, and major parts of Asia), or willing to do so.
Furthermore, the comment was a reflection on what is available on the market for the foreseeable future. I'm eyeing such a small mobile workstation for a) 2D graphics work and b) analysis of historical and archival data. I would even be willing to put up with a classic laptop if I could get an ECC-equipped model. But none of those machines are mobile; they're all 16-inch+ brutes. No thanks.
So I have to look for other machines. An ECC machine? Fuck, most likely some mini-PC in addition to something mobile without ECC memory. Keeping that in mind, what are the options that come closest to the above ideal? Essentially only overspecialized, maintenance-averse gaming machines with pathetic battery life and a support quality somewhere between questionable and utterly unacceptable (Lenovo's consumer division, OneXPlayer, Asus).
A Panasonic Toughbook G2 10-incher could be an acceptable alternative, but I'm not gonna fork over Panasonic-money for a non-ECC ruggedized machine without a DCI-P3 screen and a digitizer that's even worse than an Apple Pencil (I think they use either Microsoft's Pen Protocol or Wacom's AES tech).
Everything else is locked-down garbage with some sort of Fisher-Price OS, e.g. everything Apple, Samsung's Galaxy Tab Active5 Pro, etc.
If you're looking at tablets, you might be interested in devices supported by PostmarketOS[0]. You might not find one that you like and that's well-supported but it's worth a look IMO.
Also, just to add: I've noticed that `jj` comes much more easily and intuitively to newbies I've mentored. Just yesterday I told a friend to commit his changes and he wanted to just run `git commit` (without remembering to do `git add` first). That made me realize we should just install Jujutsu for him, and he's been committing very diligently since. Can recommend trying this with anyone you mentor/teach.
That's also my opinion, that jj should be easier for juniors to pick up. However, I felt like there's a lack of learning material targeted at people without prior VCS experience. That's why I wrote "Jujutsu for everyone": https://jj-for-everyone.github.io/.
The compatibility with git is the whole reason it's so popular (just run `jj git init --colocate` in your git repo).
You can use it without forcing your collaborators to switch from git, and you can use it with git forges as well.
I don't think you need `--colocate` any more, and maybe you don't even need `git`? I tried `jj init` in a git repo the other day and it did create a colocated jj repo, as far as I could see.
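To illustrate both invocations discussed above, here is a sketch against a throwaway repo. It assumes `jj` is installed; the exact behavior varies by jj version (older releases need the explicit `--colocate` flag, newer ones reportedly colocate automatically when a `.git` directory is present), so the fallback below is a guess, not gospel.

```shell
# Sketch: put jj on top of an existing git repo (colocated).
tmp=$(mktemp -d) && cd "$tmp"
git init -q .
if command -v jj >/dev/null 2>&1; then
    # Older jj needs the flag; newer jj may colocate on its own.
    jj git init --colocate 2>/dev/null || jj git init
    jj st   # jj and git now share the same working copy
else
    echo "jj not installed; commands shown for illustration only"
fi
```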
Would you elaborate a bit on how you use subagents? I tend to use them sporadically, for example for it to research something or to analyse the code base a bit. But I'm not yet letting it run for long.
Sure. First of all, although I do spend a lot of time interacting with Claude Code in chat format, that is not what I am talking about here. I have set up Claude Code with very specific instructions for the use of agents, which I'll get to in a second.
Now, there are a lot of collections of subagent definitions out there. I rolled my own, then later found others that worked better. I'm currently using this curated collection: https://github.com/VoltAgent/awesome-claude-code-subagents
CLAUDE.md has instructions to list `.agents/agents/**/*.md` to find the available agents, and to check the frontmatter YAML for a one-line description of what each does. These agents are really just (1) role definitions that prompt the LLM to bias its thinking in a particular way ("You are a senior Rust engineer with deep expertise in ..." -- this actually works really well), and (2) a bunch of rules and guidelines for that role, e.g. in the Rust case to use the thiserror and strum crates to avoid boilerplate in Error enums, rules for how to satisfy the linter, etc. Basic project guidelines as they relate to Rust dev.
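The discovery step can be sketched in plain shell. The agent file below is a hypothetical example I made up for illustration (the real definitions come from the linked collection); only the frontmatter-plus-description convention is from the comment above.

```shell
# Hypothetical agent definition with YAML frontmatter, as described above.
mkdir -p .agents/agents/rust
cat > .agents/agents/rust/rust-engineer.md <<'EOF'
---
name: rust-engineer
description: Senior Rust engineer; uses thiserror/strum, satisfies the linter.
---
You are a senior Rust engineer with deep expertise in ...
EOF

# What "list the agents and check the frontmatter" amounts to:
grep -r '^description:' .agents/agents --include='*.md'
```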
Secondly, my CLAUDE.md for the project has very specific instructions about how the top-level agent should operate, with callouts to specific procedure files to follow. These live in `.agent/action/**/*.md`. For example, I have a git-commit.md protocol definition file, and instructions in CLAUDE.md that say "when the user prompts with 'commit' or 'git commit', load the git-commit action and follow the directions contained within precisely." Within git-commit.md there is a clear workflow specification in text or pseudocode. The [bracketed text] consists of my in-line comments to you and is not in the original file:
"""
You are tasked with committing the currently staged changes to the currently active branch of this git repository. You are not authorized to make any changes beyond what has already been staged for commit. You are to follow these procedures exactly.
1. Check that the output of `git diff --staged` is not empty. If it is empty, report to the user that there are no currently staged changes and await further instructions from the user.
2. Stash any unstaged changes, so that the worktree only contains the changes that are to be committed.
3. Run `./check.sh` [a bash script that runs the full CI test suite locally] and verify that no warnings or errors are generated with just the currently staged changes applied.
- If the check script doesn't pass, summarize the errors and ask the user if they wish to launch the rust-engineer agent to fix these issues. Then follow the directions given by the user.
4. Run `git diff --staged | cat` and summarize the changes in a git commit message written in the style of the Linux kernel mailing list [I find this to be much better than Claude's default commit message summaries].
5. Display the output of `git diff --staged --stat` and your suggested git commit message to the user and await feedback. For each response by the user, address any concerns brought up and then generate a new commit message, as needed or instructed, and explicitly ask again for further feedback or confirmation to continue.
6. Only when the user has explicitly given permission to proceed with the commit, without any accompanying actionable feedback, should you proceed to making the commit. Execute `git commit` with the exact text for the commit message that the user approved.
7. Unstash the non-staged changes that were previously stashed in step 2.
8. Report completion to the user.
You are not authorized to deviate from these instructions in any way.
"""
This one doesn't employ subagents very much, and it is implicitly interactive, but it is smaller and easier to explain. It is, essentially, a call center script for the main agent to follow. In my experience, it does a very good job of following these instructions. This particular one addresses a pet peeve of mine: I hate the auto-commit anti-feature of basically all coding assistants. I'm old-school and want a nice, cleanly curated git history with comprehensible commits that take some refining to get right. It's not just OCD -- my workflow involves being able to git bisect effectively to find bugs, which requires a good git history.
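The mechanical parts of that protocol (steps 1-3 and 7) can be sketched as plain shell against a throwaway repo. This is a non-authoritative reconstruction: `check.sh` here is a trivial stand-in for the real local CI script, and the interactive steps 4-6 are elided as a comment.

```shell
# Sketch of git-commit.md steps 1-3 and 7 in a throwaway repo.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q .
git config user.email demo@example.com && git config user.name demo
printf '#!/bin/sh\necho all checks passed\n' > check.sh && chmod +x check.sh
git add check.sh && git commit -q -m "ci: add local check script"

echo 'fn main() {}' > main.rs && git add main.rs   # the staged change
echo scratch > notes.txt                           # unstaged; must not leak in

# 1. Refuse to proceed with nothing staged.
[ -n "$(git diff --staged)" ] || { echo "nothing staged"; exit 1; }
# 2. Stash unstaged changes so the worktree matches the staged state.
git stash push -q --include-untracked --keep-index
# 3. Run the local CI suite against exactly the staged changes.
./check.sh
# ... steps 4-6: summarize the diff, draft a message, await user approval ...
git commit -q -m "main: add entry point"
# 7. Restore the previously stashed unstaged changes.
git stash pop -q
```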
I also have a task.md workflow that I'm actively iterating on; it's the one I let run autonomously for half an hour to an hour, and I'm often surprised to find very good results (but sometimes very terrible results) at the end of it. I'm not going to release this one because, frankly, I'm starting to realize there might be a product around this and I may move on that (although this is already a crowded space). But I don't mind outlining in broad strokes how it works (hand-summarized, very briefly):
"""
You are a senior software engineer in a leadership role, directing junior engineers and research specialists (your subagents) to perform the task specified by the user.
1. If PLAN.md exists, read its contents and skip to step 4.
2. Without making any tool calls, consider the task as given and extrapolate the underlying intent of the user.
[A bunch of rules and conditions related to this first part -- clarify the intent of the user without polluting the context window too much]
3. Call the software-architect agent with the reformulated user prompt, and with clear instructions to investigate how the request would be implemented on the current code base. The agent is to fill its context window with the portions of the codebase and developer documentation in this repo relevant to its task. It should then generate and report a plan of action.
[Elided steps involving iterating on that plan of action with the user, and various subagents to call out to in order to make sure the plan is appropriately sequenced in terms of dependent parts, chunked into small development steps, etc. The plan of action is saved in PLAN.md in the root of the repository.]
4. While there are unfinished todos in the PLAN.md document, repeat the following steps:
a) Call rust-engineer to implement the next todo and/or verify completion of the todo.
b) Call each of the following agents with instructions to focus on the current changes in the workspace. If any actionable items are found in the generated report that are within the scope of the requested task, call rust-engineer to address these items and then repeat:
- rust-nit-checker [checks for things I find Claude gets consistently wrong in Rust code]
- test-completeness-checker [checks for missing edge cases or functionality not tested]
- code-smell-checker [a variant of the software architect agent that reports when things are generally sus]
- [... a handful of other custom agents; I'm constantly adjusting this list]
- dirty-file-checker [reports any test files or other files accidentally left and visible to git]
c) Repeat from step a until you run through the entire list of agents without any actionable, in-scope issues identified in any of the reports & rust-engineer still reports the task as fully implemented.
d) Run git-commit-auto agent [A variation of the earlier git commit script that is non-interactive.]
e) Mark the current todo as done in PLAN.md
5. If there are any unfinished todos in PLAN.md, return to step 4. Otherwise, call the software-architect agent with the original task description as approved by the user, and request it to assess whether the task is complete, and if not, to generate a new PLAN.md document.
6. If a new PLAN.md document is generated, return to step 4. Otherwise, report completion to the user.
"""
That's my current task workflow, albeit with a number of items and agent definitions elided. I have lots of ideas for expanding it further, but I'm basically taking an iterative and incremental approach: every time Claude fumbles the ball in an embarrassing way (which does happen!), I add or tweak a rule to avoid that outcome. There are a couple of key points:
1) Using Rust is a superpower. With guidance to the agent about which crates to use, and with very strict linting tools and code-checking subagents (e.g. no unsafe code blocks, no #[allow(...)] directives to override the linter, an entire subagent dedicated to finding and calling out string-based typing and error handling, etc.), this process produces good code that largely works and does what it was requested to do. You don't have to load the whole project into context to avoid pointer or use-after-free issues, and other things that cause vibe-coded projects to fail at a certain complexity. I don't see this working in a dynamic language, for example, even though LLMs are honestly not as good at Rust as they are at more prominent languages.
2) The key part of the task workflow is the long list of analysts to run against the changes, and the assumption, which works well in practice, that you can just keep iterating and fixing reported issues (with some of the elided secret sauce having to do with subagents that evaluate whether an issue is in scope and needs to be fixed or can be safely ignored, and keeping an eye out for deviations from the requested task). This eventual-completeness assumption does work pretty well.
3) At some point the main agent's context window gets poisoned, or it reaches the full context window and compacts. Either way, this kills any chance of simply continuing. In the first case (poisoning), it loses track of the task and ends up caught in some yak-shaving rabbit hole. It's usually obvious when you check in that this is going on, and I just nuke it and start over. In the latter case (full context window), the auto-compaction also pretty thoroughly destroys the workflow, but it usually results in the agent asking a variation on "I see you are in the middle of ... What do you want to do next?" before taking any bad action on the repo itself. Clearing the now-poisoned context window with "/reset" and then providing just "task: continue" gets it back on track. I have a todo item to automate this, but the Claude Code API doesn't make it easy.
4) You have to be very explicit about what can and cannot be done by the main agent. It is trained and fine-tuned to be an interactive, helpful assistant. You are using it to delegate autonomous tasks. That requires explicit and repeated instructions. This is made somewhat easier by the fact that subagents are not given access to the user -- they simply run and generate reports for the calling agent. So I try to pack as much as I can in the subagents and make the main agent's role very well defined and clear. It does mean that you have to manage out of band communication between agents (e.g. the PLAN.md document) to conserve context tokens.
If you try this out, please let me know how it goes :)
I tried this tonight as my first time using anything like Claude code, and having a week or so of copilot agentic mode experience.
It's the right path, I'm very smitten with seeing the sub agents working together. Blew through the Pro quota really fast.
I was a skeptic and am no more. Gonna see what it takes to run something basic in a home lab, and how the performance is. Even if it's incredibly slow on a beefy home system, just checking in on it should be low enough friction for it to noodle on some hobby projects.
Yeah it was a "HOLY SHIT" moment for me when I first started experimenting with subagents. A step-change improvement in productivity for sure. They combine well together with Claude Code's built-in todo tool, and together really start to deliver on the promised goal of automating development. Watching it delegate to subagents and then seeing the flow of information back and forth is amazing.
One thing I forgot to mention -- I run Claude within a simple sandboxed dev container like this: https://github.com/maaku/agents/tree/main/.devcontainer This allows me to safely run with '--dangerously-skip-permissions', which basically gives Claude free rein within the Docker container in which it is running. This is what lets you run without user interaction.
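A launcher for that setup might look like the sketch below. The devcontainer CLI invocations follow its standard interface, but the script name and the argument handling are assumptions of mine, not from the linked repo.

```shell
# Illustrative launcher: run Claude inside the sandboxed devcontainer.
cat > run-sandboxed-claude.sh <<'EOF'
#!/bin/sh
# Bring up the container defined in .devcontainer/ and run Claude inside it.
# Skipping permission prompts is only safe because the container is the sandbox.
devcontainer up --workspace-folder .
exec devcontainer exec --workspace-folder . \
    claude --dangerously-skip-permissions -p "$1"
EOF
chmod +x run-sandboxed-claude.sh
```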
When you say "run something basic in a home lab" do you mean local inference? Qwen3-Coder is probably the model to use if you want to go that route. Avoid gpt-oss as they used synthetic data in their training and it is unlikely to perform well.
I'm investigating this as well, as I need local inference for some sensitive data. But honestly, the Anthropic models work so well that I justified getting myself the unlimited/max plan, and I mostly use that. I suspect I overbought -- at $200/mo I have yet to ever be rate limited, even with these long-running instances. I stay within the ToS and only run 1-2 sessions at a time, though.
I just recently stumbled upon your tdd-guard when looking for inspiration for Claude hooks. I've been so impressed with how much it let me improve my workflow and quality. Then I was somewhat disappointed that almost no one seems to talk about this potential and how they're using hooks. Yours was the only interesting project I found in this regard, and I hope to give it a spin this weekend.
You don't happen to have a short video where you go into a bit more detail on how you use it though?
I don't have a detailed video beyond the short demo on the repo, but I'll look into recording something more comprehensive or cover it in a blog post. Happy to ping you when it's ready!
In the meantime: I simply set it up and go about my work. The only thing I really do is just nudge the agent into making architectural simplifications and make sure that it follows the testing strategies that I like: dependency injection, test helpers, test data factories and such. Things that I would do regardless of the hook.
I like to give my tests the same attention and care that I give production code. They should be meaningful and resilient. The code base contains plenty of examples but I will look into putting something together.
I spent my summer holiday on this because I truly believe in the potential of hooks in agentic coding. I'm equally surprised that this space hasn't been explored more.
I'm currently working on making the validation faster and more customizable, plus adding reporters to support more languages.
I think there is an Amazon-backed VS Code fork that is also exploring this space. I think they market it as spec-driven development.
Have you tried opencode? I haven't really, but it can use your Anthropic subscription and also switch to most other models. It also looks quite nice IMO.