Using models to go from spec to program is one use case, but it's not the whole story. I'm not hand-writing specs; I use LLMs to iteratively develop the spec, the validation harness, and then the implementation. I'm hands-on with the agents, and hands-off otherwise, thanks to the workflow style we call Attractor.
In practice, we try to close the loop with agents: plan -> generate -> run tests/validators -> fix -> repeat. What I mainly contribute is taste and deciding what to do next: what to build, what "done" means, and how to decompose the work so models can execute. With a strong definition of done and a good harness, the system can often converge with minimal human input. For debugging, we also have a system that ingests app logs plus agent traces (via CXDB).
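Roughly, the loop has this shape (a minimal sketch in Python; the agent and validator calls are hypothetical stubs, not our actual harness or CXDB):

```python
# Minimal sketch of the plan -> generate -> run validators -> fix -> repeat loop.
import subprocess
from dataclasses import dataclass

@dataclass
class Report:
    passed: bool
    failures: str

def run_validators() -> Report:
    # The "definition of done" here is just the test suite; in practice it
    # would also include linters, schema checks, and deploy validations.
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return Report(passed=result.returncode == 0,
                  failures=result.stdout + result.stderr)

def ask_agent(prompt: str) -> None:
    # Hypothetical stand-in: hand the prompt to whatever coding agent you
    # use and let it edit the working tree.
    raise NotImplementedError("wire up your coding agent here")

def converge(spec: str, max_rounds: int = 10) -> None:
    ask_agent(f"Plan and implement this spec:\n{spec}")   # plan -> generate
    for _ in range(max_rounds):
        report = run_validators()                         # run tests/validators
        if report.passed:
            return                                        # converged
        ask_agent(f"These checks failed, fix them:\n{report.failures}")  # fix -> repeat
    raise RuntimeError("did not converge; time for a human to step in")
```

The human contribution lives mostly outside this loop: writing the spec, deciding what "passed" should actually require, and stepping in when it fails to converge.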
The more reps you get, the better your intuition for where models work and where you need tighter specs. You also have to keep updating your priors with each new model release or harness change.
This might not have been a clear answer, but I am happy to keep clarifying as needed!
> What do you show to new folks when they join your team?
I think this is an interesting question because we have not fully figured out the best way to onboard people to our codebases. Each person is responsible for multiple codebases (yay microservices!), and no one else commits to a repository while they have dibs. We also have conventions for how agents write documentation around deployments and validations.
In theory, when a new person joins the team or is handed a repository, they can throw some tokens at the codebase, interrogate it, and ask questions about how things are implemented.
> But what is the result of your work?
The end result is a final, working codebase. The specs and sprint plans are also committed to the repository for posterity, so agents in a fresh session can see what work has been completed and the trajectory we are moving toward.
>But it does reduce by an order of magnitude the amount of money you need to spend on programming a solution that would work better
Could you share any data on this? Are there any case studies you could reference or at least personal experience? One order of magnitude is 10x improvement in cost, right?
I'm not sure it's a perfect example, but it is at least a very realistic one from a company that really doesn't have time or energy for hype or fluff:
We are currently sunsetting our use of Webflow for content management and hosting, and are replacing it with our own solution which Cursor & Claude Opus helped us build in around 10 days:
And the big advantage for us is twofold. First, our content marketers now have a "Cursor-light" experience when creating landing pages: from their point of view it is a "text-to-landing-page" LLM-powered tool with a chat interface, so no more fumbling around in the Webflow WYSIWYG interface.
Second, from the software engineering department's point of view, the results of the content marketers' work are simply changes/PRs in a git repository, which we can work on in the IDE of our choice; again, no fumbling around in the Webflow WYSIWYG interface.
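To make the "changes land as PRs" part concrete, the glue on the engineering side is roughly this shape (a hypothetical sketch using git and the GitHub CLI; the paths, branch names, and helper are assumptions, not our actual tool):

```python
# Hypothetical sketch: write LLM-generated landing page HTML into the repo
# and open a PR for the engineering team to review. Assumes git and the
# GitHub CLI (gh) are installed and the repo is already cloned.
import pathlib
import subprocess

def open_landing_page_pr(slug: str, html: str) -> None:
    branch = f"landing/{slug}"
    path = pathlib.Path("pages") / f"{slug}.html"   # assumed repo layout

    subprocess.run(["git", "checkout", "-b", branch], check=True)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    subprocess.run(["git", "add", str(path)], check=True)
    subprocess.run(["git", "commit", "-m", f"Add landing page: {slug}"], check=True)
    subprocess.run(["git", "push", "-u", "origin", branch], check=True)
    subprocess.run(
        ["gh", "pr", "create",
         "--title", f"Landing page: {slug}",
         "--body", "Generated from the marketers' chat interface."],
        check=True,
    )
```

The point is less the specific commands than the design choice: the marketers' output is just files in a branch, so the usual review, CI, and rollback machinery applies.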
Looks like the same framework they used to build the ChatGPT desktop app (Electron).
edit - from another comment:
> Hi! Romain here, I work on Codex at OpenAI. We totally hear you. The team actually built the app in Electron specifically so we can support Windows and Linux as well. We shipped macOS first, but Windows is coming very soon. Appreciate you calling this out. Stay tuned!
>I think that really high quality code can be created via coding agents. Not in one prompt, but instead an orchestration of planning, implementing, validating, and reviewing.
Do you have any advice to share (or resources)? Have you experienced it yourself?
>This project leverages the Gemini APIs to provide AI capabilities. For details on the terms of service governing the Gemini API, please refer to the terms for the access mechanism you are using:
Click "Gemini API", then scroll down:
>When you use Unpaid Services, including, for example, Google AI Studio and the unpaid quota on Gemini API, Google uses the content you submit to the Services and any generated responses to provide, improve, and develop Google products and services and machine learning technologies, including Google's enterprise features, products, and services, consistent with our Privacy Policy.
>To help with quality and improve our products, human reviewers may read, annotate, and process your API input and output. Google takes steps to protect your privacy as part of this process. This includes disconnecting this data from your Google Account, API key, and Cloud project before reviewers see or annotate it. Do not submit sensitive, confidential, or personal information to the Unpaid Services.
From that document I read that it in fact does, but it doesn't release memory if the app starts consuming less. It does memory ballooning though, so the VM only consumes as much RAM as the maximum amount the app has requested.
It's a bit more complicated. For the purposes of the GDPR, legal obligations within the EU (where we might assume relevant protections are in place) might be considered differently than, e.g., legal obligations towards the Chinese Communist Party or the NSA.