Hacker News | new | past | comments | ask | show | jobs | submit | EastLondonCoder's comments | login

I don’t use plan.md docs either, but I recognise the underlying idea: you need a way to keep agent output constrained by reality.

My workflow is more like scaffold -> thin vertical slices -> machine-checkable semantics -> repeat.

Concrete example: I built and shipped a live ticketing system for my club (Kolibri Tickets). It’s not a toy: real payments (Stripe), email delivery, ticket verification at the door, frontend + backend, migrations, idempotency edges, etc. It’s running and taking money.

The reason this works with AI isn’t that the model “codes fast”. It’s that the workflow moves the bottleneck from “typing” to “verification”, and then engineers the verification loop:

  - keep the spine runnable early (end-to-end scaffold)

  - add one thin slice at a time (don’t let it touch 15 files speculatively)

  - force checkable artifacts (tests/fixtures/types/state-machine semantics where it matters)

  - treat refactors as normal, because the harness makes them safe

If you run it open-loop (prompt -> giant diff -> read/debug), you get the “illusion of velocity” people complain about. If you run it closed-loop (scaffold + constraints + verifiers), you can actually ship faster because you’re not paying the integration cost repeatedly.

Plan docs are one way to create shared state and prevent drift. A runnable scaffold + verification harness is another.
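To make "force checkable artifacts" concrete at a payments seam, here's a minimal idempotency sketch. All names (`process_payment_event`, the in-memory `ledger`) are hypothetical stand-ins; a real system would key on Stripe event IDs and a database unique constraint:

```python
# Sketch: processing the same payment event twice must produce exactly
# one side effect. In-memory stand-ins for what would be DB state.

processed_events = set()   # stands in for a unique-constraint column
ledger = []                # stands in for persisted charges/bookings

def process_payment_event(event_id: str, amount: int) -> None:
    """Record a payment at most once per event id."""
    if event_id in processed_events:
        return  # duplicate webhook delivery: safely ignored
    processed_events.add(event_id)
    ledger.append((event_id, amount))

# The checkable artifact: a test the agent must keep green.
def test_duplicate_event_is_a_no_op():
    process_payment_event("evt_123", 2500)
    process_payment_event("evt_123", 2500)  # simulated retry
    assert len(ledger) == 1
```

The point isn't this particular implementation; it's that the invariant lives in a test the model can run, not in your head.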


Now that code is cheap, I've made sure my side project has unit/integration tests (I'll enforce 100% coverage), Playwright tests, static typing (it's in Python), and scripts for all tasks. I'll learn mutation testing too (yes, it's overkill). Now my agent works up to an hour in loops and emits concise code I don't have to edit much.
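One way to make the 100% bar enforced rather than aspirational is coverage.py's `fail_under` setting, roughly like this (a sketch assuming pytest-cov / coverage.py):

```toml
# pyproject.toml (sketch): the test run fails if coverage drops below the bar
[tool.coverage.report]
fail_under = 100
show_missing = true
```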

Totally get it, and I think we’re describing the same control loop from different angles.

Where I differ slightly is: “100% coverage” can turn into productivity theatre. It’s a metric that’s easy to optimize while missing the thing you actually care about: do we have machine-checkable invariants at the points where drift is expensive?

The harness that’s paid off for me (on a live payments system) is:

  - thin vertical slice first (end-to-end runnable, even if ugly)

  - tests at the seams (payments, emails, ticket verification / idempotency)

  - state-machine semantics where concurrency/ordering matters

  - unit tests as supporting beams, not wallpaper

Then refactors become routine, because the tests will make breakage explicit.
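By "state-machine semantics" I mean making the legal transitions explicit so ordering bugs fail loudly instead of corrupting state. A minimal sketch, with illustrative states rather than the real ticket model:

```python
# Explicit ticket lifecycle: illegal transitions raise instead of
# silently corrupting state. States and names are illustrative.

TRANSITIONS = {
    "reserved":   {"paid", "expired"},
    "paid":       {"checked_in", "refunded"},
    "checked_in": set(),   # terminal
    "expired":    set(),   # terminal
    "refunded":   set(),   # terminal
}

def transition(state: str, new_state: str) -> str:
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state

# A double check-in is now a hard error a test can pin down:
s = transition("reserved", "paid")
s = transition(s, "checked_in")
```

Once this table exists, concurrency bugs show up as `ValueError`s in tests rather than as support emails.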

So yes: “code is cheap” -> increase verification. Just be careful not to replace engineering judgement with an easily gamed proxy.


I think the deluge of projects on Show HN points to something real: it's possible today to ship projects as a one-man shop that would have required a team just a year or so ago.

Personally I have noticed strange effects: where I previously would have reached for a software package to make something or solve an issue, it's now often faster for me to write a specific program just for my use case. Just this weekend I needed a reel with a specific look to post on Instagram, but instead of trying to use something like After Effects, I could quickly cobble together a program that used CSS transforms to output a series of images I could tie together with ffmpeg.
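The shape of that pattern (emit frames, stitch with ffmpeg) looks roughly like this. In this sketch I write plain binary PPM frames with the stdlib only, as a stand-in for the CSS-rendered frames; the ffmpeg line at the end is the standard image-sequence invocation:

```python
# Sketch: render a series of frames to disk, then stitch with ffmpeg.
# Each frame here is just a colour fade written as a binary PPM (P6);
# the real version rendered CSS transforms in a browser instead.

import pathlib

W, H, FRAMES = 64, 64, 10
out = pathlib.Path("frames")
out.mkdir(exist_ok=True)

for i in range(FRAMES):
    shade = int(255 * i / (FRAMES - 1))          # simple linear fade
    header = f"P6 {W} {H} 255\n".encode()
    pixels = bytes([shade, 0, 255 - shade]) * (W * H)
    (out / f"frame_{i:04d}.ppm").write_bytes(header + pixels)

# Then, outside Python:
#   ffmpeg -framerate 25 -i frames/frame_%04d.ppm -pix_fmt yuv420p reel.mp4
```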

About a month ago I was unhappy with the commercial ticketing systems; they were both expensive and opaque, so I made my own. Obviously for a case like that you need discipline and testing when you take people's money, so there was a lot of focus on end-to-end testing.

I have a few more examples like this, but to make this work you need to approach using LLMs with a certain amount of rigour. The hardest part is to prevent drift in the model. There are a number of things you can do to keep the model grounded in reality.

When the tool doesn’t have a reproducer, it’ll happily invent a story and you’ll debug the story. If you ground the root cause in, for example, a test, the model gets enough context to actually solve the problem.
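Concretely, "ground the root cause in a test" means writing the minimal failing reproducer first and handing that to the model. A toy sketch with a hypothetical bug (a price parser that drops the decimal part):

```python
# Sketch: pin the bug with a minimal reproducer before asking the model
# to fix it. Both functions are illustrative, not from a real codebase.

def parse_price(text: str) -> int:
    """Parse '12.50 kr' into öre. Buggy: truncates the decimals."""
    return int(float(text.split()[0])) * 100     # 12.50 -> 1200, wrong

def parse_price_fixed(text: str) -> int:
    """What the model should converge on once the test pins the bug."""
    return round(float(text.split()[0]) * 100)   # 12.50 -> 1250

# The reproducer: fails against parse_price, passes against the fix.
def test_parse_price_keeps_decimals():
    assert parse_price_fixed("12.50 kr") == 1250
```

With the failing test in hand, the model is debugging reality instead of its own story.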

Another issue is that you need to read and understand code quickly, but it's no different from working with other developers. When tests are passing I usually open a PR to myself and then review as I normally would.

A prerequisite is tight specs, but those can also be generated if you are experienced enough. You need enough domain intuition to know what ‘done’ means and what to measure.

Personally I think the bottleneck will move from getting into a flow state to write solutions, to analysing the problem space and verifying the results.


> I think the deluge of projects on Show HN points to something real: it's possible today to ship projects as a one-man shop that would have required a team just a year or so ago.

Lots of these projects have a lifespan of a week and will never be maintained. When you pour blood and sweat into a project you get attached to it; when you vibe-code it in an afternoon and it's not an instant hit, you move on to the next one.


I lived in London for 15 years. Southbank and the Barbican were some of my favourite places there.

The Barbican is particularly interesting since it's part of the City of London, and whereas the City mostly contains bad neoclassical designs that feel dystopian and inhuman, the Barbican feels like a breath of fresh air.

It has a human-centric design, and it uses water and greenery to temper the concrete.

It's interesting that crowds in connection to or within the Southbank Centre also always feel lively. I'm uncertain why; perhaps the concrete makes a counterpoint to humanness and makes us focus on the people in the vicinity.

Perhaps it's the cultural programming. But the end result for me was that whenever I was around these blocks of concrete I was almost always in a good mood.


This matches my experience, especially "don’t draw the owl" and the harness-engineering idea.

The failure mode I kept hitting wasn’t just "it makes mistakes", it was drift: it can stay locally plausible while slowly walking away from the real constraints of the repo. The output still sounds confident, so you don’t notice until you run into reality (tests, runtime behaviour, perf, ops, UX).

What ended up working for me was treating chat as where I shape the plan (tradeoffs, invariants, failure modes) and treating the agent as something that does narrow, reviewable diffs against that plan. The human job stays very boring: run it, verify it, and decide what’s actually acceptable. That separation is what made it click for me.

Once I got that loop stable, it stopped being a toy and started being a lever. I’ve shipped real features this way across a few projects (a git-like tool for heavy media projects, a ticketing/payment flow with real users, a local-first genealogy tool, and a small CMS/publishing pipeline). The common thread is the same: small diffs, fast verification, and continuously tightening the harness so the agent can’t drift unnoticed.


>The failure mode I kept hitting wasn’t just "it makes mistakes", it was drift: it can stay locally plausible while slowly walking away from the real constraints of the repo. The output still sounds confident, so you don’t notice until you run into reality (tests, runtime behaviour, perf, ops, UX).

Yeah, I would get patterns where initial prototypes were promising, then we developed something that was 90% of the way to the design goals, and then as we tried to push in the last 10%, drift would start breaking down, or even just forgetting, the 90%.

So I would get to 90% and then basically start a new project with that as the baseline to add to.


No harm meant, but your writing is very reminiscent of an LLM. It is great actually, there is just something about it - "it wasn't.. it was", "it stopped being.. and started". Claude and ChatGPT seem to love these juxtapositions. The triplets on every other sentence. I think you are a couple em-dashes away from being accused of being a bot.

These patterns seem to be picking up speed in the general population; makes the human race seem quite easily hackable.


>makes the human race seem quite easily hackable.

If the human race were not hackable then society would not exist; we'd be the unchanging crocodiles of the last few hundred million years.

Have you ever found yourself speaking a meme? Had a catchy tune repeating in your head? Started spouting nation-state-level propaganda? Found yourself in a crowd trying to burn a witch at the stake?

Hacking the flow of human thought isn't that hard, especially across populations. Hacking any one particular human's thoughts is harder unless you have a lot of information on them.


How do I hack the human population to give me money, and simultaneously, hack law enforcement to not arrest me?


> How do I hack the human population to give me money

Make something popular or become famous.

> hack law enforcement to not arrest me

Don't become famous with illegal stuff.

The hack is that we live in a society that makes people think they need a lot of money, and at the same time allows individuals to accumulate obscene amounts of wealth and influence, with many people being OK with that.


> Don't become famous with illegal stuff.

Is this still a constraint? It would seem that money beyond a certain point allows one to wash one's reputation.


For sure. But becoming famous for robbing a bank is not the way.


It worked for most of our current political elites.


This is the most common answer from people who are rocking and rolling with AI tools, but I cannot help but wonder how this is different from how we should have built software all along. I know I have been (after 10+ years…)


I think you are right; the secret is that there is no secret. The projects I have been involved with that were most successful used these techniques. I also think experience helps, because you develop a sense that very quickly knows when the model wants to go in a wonky direction and what a good spec looks like.

With where the models are right now you still need a human in the loop to make sure you end up with code that you (and your organisation) actually understand. The bottleneck has gone from writing code to reading code.


> The bottleneck has gone from writing code to reading code.

This has always been the bottleneck. Reviewing code is much harder and gets worse results than writing it, which is why reviewing AI code is not very efficient. The time required to understand code far outstrips the time to type it.

Most devs don’t do thorough reviews. Check that the variable names seem OK, make sure there are no obvious typos, ask for a comment, and call it good. For a trusted teammate this is actually OK and why they’re so valuable! For an AI, it’s a slot machine, and trusting it is equivalent to letting your coworkers/users do your job so you can personally move faster.


This is what I experienced as well.

These are some tricks I use now.

1. Write a generic prompt about the project and software versions and keep it in the folder. (I think this is getting pushed as SKILLS.md now.)

2. In the prompt, add instructions to comment on changes; since our main job is to validate and fix any issues, it makes review easier.

3. Find the best model for the specific workflow. For example, these days I find that Gemini Pro is good for HTML UI stuff, while Claude Sonnet is good for Python code. (This is why subagents are getting popular.)
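For trick 1, the context file can stay small. A sketch of what mine look like (the filename convention varies by tool: SKILLS.md, AGENTS.md, CLAUDE.md; the stack details below are made up for illustration):

```markdown
# Project context (read before any change)

- Stack: Python 3.12, FastAPI, Postgres 16, Stripe for payments
- Run `./scripts/test.sh` after every change; all changes need a green run
- Comment every non-obvious change so review is fast
- Never touch migrations without asking first
```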


Would love to hear more about your genealogy app.


Haha, almost, we add an extra word and some signifiers.

"Vi ses i röken och dimman!" ("See you in the smoke and the fog!")

It actually means something specific. We tend to use a smoke machine a lot on our nights; one time the police showed up because they thought the place was on fire. The symbols at the end signify the electricity of the nights, and the headphones are of course a reference to our social media headphone walks.

This is a fixed catchphrase we use in all our communications.


Totally fair. It’s a real monthly music night, not software.

The Show HN part is the site + media (so people can see the scale/atmosphere), and the thing we’re trying to share is the operating model: how you get strangers to show up alone, feel safe, and come back, without big budgets.


As the other poster commented, Kolibri is hummingbird in Swedish. The name was inspired by two things: a feeling of lightness and ease connected with carefree summer nights, and also the intros of all versions of Pacific by 808 State.


Great to hear from you! There are many different people around Kolibri who know you from Berlin, like Henrik Maneuever, so it was extra fun when you visited us.


I'll gladly come and play sometime ;)


Thank you so much, Maria & Jonatan


We do agree with this; we both prefer attending small gigs ourselves for that exact reason. Also, all bands have to start somewhere: it takes many small gigs to build an audience and develop their craft. Writing and producing songs is one thing, but there is no substitute for the experience of seeing what moves people listening live. The majority of big stadium bands started with endless small non-paying gigs; this is the foundation of the music business.

