More

jmvldz · 2025-08-21T22:00:22 1755813622

I don’t have much in the way of tests right now but I am building with Typescript and Rust so that catches many basic bugs.

I don’t find the issue to be breaking other parts of the app, more-so that new features don’t work as advertised by Claude.

One of my takeaways here is that I should give Claude an integration test harness and tell it that it must finish running that successfully before committing any code.

jmvldz · 2025-08-21T20:57:03 1755809823

Haha fair enough. Fixed!

pipes · 2025-08-22T06:25:24 1755843924

Ha! Thanks :)

jmvldz · 2025-08-21T20:53:15 1755809595

I'm trying to prototype extremely quickly and I'm working on my project alone so yes, often I accept PRs without looking too closely at the code if my local testing succeeds.

I'm using Typescript and Rust and I think it's critical to use strict typing with LLMs to catch simple bugs.

I've worked at Uber as an infra engineer and at Gem as an engineering manager so I do consider myself an "actual professional developer". The critical bit is the context of the project I'm working on. If I were at a tech company building software, I'd be much more reticent to ship AI generated PRs whole cloth.

DrewADesign · 2025-08-21T22:07:42 1755814062

Well, prototyping is indeed a whole different ball of wax.

jmvldz · 2025-08-21T20:45:32 1755809132

I don't have a ton of tests. From what I've seen, Claude will often just update the tests to no-op so tests passing isn't trustworthy.

My workflow is often to plan with ChatGPT and what I was getting at here is ChatGPT can often hallucinate features of 3rd party libraries. I usually dump the plan from ChatGPT straight into Claude Code and only look at the details when I'm testing.

That said, I've become more careful in auditing the plans so I don't run in to issues like this.

CuriouslyC · 2025-08-21T21:18:27 1755811107

Tell Claude to use a code review sub agent after every significant change set, tell them to run the tests and evaluate the change set, don't tell Claude it wrote the code, and give them strict review instructions. Works like a charm.

jmvldz · 2025-08-21T21:57:19 1755813439

Interesting. I had not thought about a code review sub agent. I will give that a shot.

cpursley · 2025-08-21T22:44:55 1755816295

Any tips on writing productive review sub agent instruction?

CuriouslyC · 2025-08-21T22:54:23 1755816863

Yes. Go on ChatGPT, explain what you're doing (claude code, trying to get it to be more rigorous with itself and reduce defects) then click deep research and tell it you'd like it to look up code review best practices, AI code review, smells/patterns to look out for in AI code, etc. Then have it take the result of that and generate a XML structured document with a flowchart of the code review best practices it discovered, cribbing from an established schema for element names/attributes when possible, and put it in fenced xml blocks in your subagent. You can also tell claude code to do deep research, you just have to be a little specific about what it should go after.

jmvldz · 2025-08-21T20:42:58 1755808978

Ooh, that's a good title for another post! And yes, I agree with you.

Initially I would barely read any of the code generated and as my project has grown in size, I have approached the limits of that approach.

Often because Claude Code makes very poor architectural choices.

nchmy · 2025-08-21T21:01:48 1755810108

Welcome to vibe/agentic engineering

jmvldz · 2025-08-21T20:41:27 1755808887

100% yes. QA'ing a bunch of LLM generated code feels like a mental flood. Losing that mental rest is a great way to put it.

CuriouslyC · 2025-08-21T21:16:27 1755810987

MCP up Playwright, have a detailed spec, and tell claude to generate a detailed test plan for every story in the spec, then keep iterating on a test -> fix -> ... loop until every single component has been fully tested. If you get claude to write all the components (usually by subfolder) out to todos, there's a good chance it'll go >1 hour before it tries to stop, and if you have an anti-stopping hook it can go quite a bit longer.

samrus · 2025-08-21T22:37:28 1755815848

Youve got to be doing the most unoriginal work on the planet if this doesnt produce a bowl of disfunctional spaghetti

CuriouslyC · 2025-08-21T22:56:29 1755816989

Every sentence you will ever write in your entire life will be made from a finite set of letters. The magic is in how you arrange them.

If you have a really detailed, well thought out spec, you do TDD and you have regular code review and refactor loops, agentic coding stays manageable.

cruffle_duffle · 2025-08-21T23:48:47 1755820127

It takes an incredibly detailed spec to get an LLM to not go completely off the rails and even then. The amount of time writing that spec can take more time than just doing it by hand.

There is way too much babysitting with these things.

I’m sure somehow somebody makes it work but I’m incredibly skeptical that you can let an LLM run unsupervised and only review its output as a PR.

andrekandre · 2025-08-23T03:11:41 1755918701

  > The amount of time writing that spec can take more time than just doing it by hand.

one thing about doing it by hand is you also notice holes/deficiencies in the spec and can go back and update it, make the product better, but just throwing it to an llm 'til its perfect-to-spec probably means its just going to be average quality at best...

tho tbh most software isn't really 'stunning' imo so maybe thats fine as far as most businesses are concerned... (sad face)

cpursley · 2025-08-21T22:42:26 1755816146

Can you elaborate on what you mean by anti stopping hook? Sometime I take breaks, go on walks, etc and it would be cool of Claude tried different things and even branches etc that I could review when back.

CuriouslyC · 2025-08-21T23:20:07 1755818407

Basically, all LLMs are "lazy" to some degree and are looking for ways to terminate responses early to conform to their training distribution. As a result, sometimes an agent will want to stop and phone home even if you have multiple rows of all caps saying DO NOT STOP UNTIL YOUR ENTIRE TODO LIST IS COMPLETE (seriously). Claude code has a hook for when the main agent and subagents try to stop, and you can reject their stop attempt with a message. They can still override that message and stop but the change of turn and the fresh "DO NOT STOP ..." that's at the front of context seem to keep it revving for a long time.

rstuart4133 · 2025-08-21T22:51:58 1755816718

Another way of saying thing is only an AI reviewer could cope with the flood of code an AI can produce.

But AI reviewers can do little beyond checking coding standards.

jmvldz · 2025-08-21T20:39:40 1755808780

"It isn't like programming. It is its own thing."

You articulated what I was wrestling with in the post perfectly.

jmvldz · 2025-08-20T22:32:38 1755729158

Ah interesting. I wonder if it's similar with Devin users.

jmvldz · 2025-05-08T19:06:11 1746731171

I agree the branching sounds super cool!

jmvldz · 2025-05-08T18:18:48 1746728328

Coding agents are the future and it's anyone's game right now.

The main reason I think there is such a proliferation is it's not clear what the best interface to coding agents will be. Is it in Slack and Linear? Is it on the CLI? Is it a web interface with a code editor? Is it VS Code or Zed?

Just like everyone has their favored IDE, in a few years time, I think everyone will have their favored interaction pattern for coding agents.

Product managers might like Devin because they don't need to setup an environment. Software engineers might still prefer Cursor because they want to edit the code and run tests on their own.

Cursor has a concept of a shadow workspace and I think we're going to see this across all coding agents. You kick off an async task in whatever IDE you use and it presents the results of the agent in an easy to review way a bit later.

As for Void, I think being open source is valuable on it's own. My understanding is Microsoft could enforce license restrictions at some point down the road to make Cursor difficult to use with certain extensions.

Another YC backed open source VS Code is Continue: https://www.continue.dev/

(Caveat: I am a YC founder building in this space: https://www.engines.dev/)

handfuloflight · 2025-05-08T21:28:38 1746739718

When can we expect a release from Engines?