Hacker News | solomatov's comments

You aren't supposed to read code, but do you from time to time, just to evaluate what is going on?

No. But I do ask questions (in $CODING_AGENT) to always have a good mental model of everything that I’m working on.

Is it essentially using LLMs as a compiler for your specs?

What do you do if the model isn't able to fulfill the spec? How do you troubleshoot what is going on?


Using models to go from spec to program is one use case, but it’s not the whole story. I’m not hand-writing specs; I use LLMs to iteratively develop the spec, the validation harness, and then the implementation. I’m hands-on with the agents, and hands-off with our workflow style, which we call Attractor.

In practice, we try to close the loop with agents: plan -> generate -> run tests/validators -> fix -> repeat. What I mainly contribute is taste and deciding what to do next: what to build, what "done" means, and how to decompose the work so models can execute. With a strong definition of done and a good harness, the system can often converge with minimal human input. For debugging, we also have a system that ingests app logs plus agent traces (via CXDB).
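
To make that concrete, here is a minimal sketch of such a closed loop in Go. runAgent and runValidators are hypothetical stand-ins for whatever agent and test harness you actually use, not the Attractor/CXDB tooling described above:

```go
// Hypothetical sketch of a plan -> generate -> run validators -> fix loop.
package main

import "fmt"

// runAgent asks a coding agent to attempt the task, given feedback from the
// previous attempt. Stub: a real system would invoke your agent here.
func runAgent(task, feedback string) string {
	return fmt.Sprintf("patch for %q (feedback: %q)", task, feedback)
}

// runValidators runs tests/validators and returns whether they passed,
// plus a summary of failures. Stub for a real test harness.
func runValidators(patch string) (bool, string) {
	return true, ""
}

func main() {
	task := "implement the spec in SPEC.md"
	feedback := ""
	const maxIterations = 5

	for i := 0; i < maxIterations; i++ {
		patch := runAgent(task, feedback)    // generate
		ok, failures := runValidators(patch) // run tests/validators
		if ok {
			fmt.Println("converged:", patch)
			return
		}
		feedback = failures // feed failures back so the next attempt can fix them
	}
	fmt.Println("did not converge; hand back to a human")
}
```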

The more reps you get, the better your intuition for where models work and where you need tighter specs. You also have to keep updating your priors with each new model release or harness change.

This might not have been a clear answer, but I am happy to keep clarifying as needed!


But what is the result of your work? What do you commit to the repo? What do you show to new folks when they join your team?

> What do you show to new folks when they join your team?

I think this is an interesting question because we have not fully figured out the best way to onboard people to our codebases. Each person is responsible for multiple codebases (yay microservices!), and no one else commits to a repository while they have dibs. We also have conventions for how agents write documentation around deployments and validations.

In theory, when a new person joins the team or is handed a repository, they can throw some tokens at the codebase, interrogate it, and ask questions about how things are implemented.

> But what is the result of your work?

The end result is a final, working codebase. The specs and sprint plans are also committed to the repository for posterity, so agents in a fresh session can see what work has been completed and the trajectory we are moving toward.


>But it does reduce by an order of magnitude the amount of money you need to spend on programming a solution that would work better

Could you share any data on this? Are there any case studies you could reference or at least personal experience? One order of magnitude is 10x improvement in cost, right?


I’m not sure it’s a perfect example, but at least it’s a very realistic example from a company that really doesn’t have time and energy for hype or fluff:

We are currently sunsetting our use of Webflow for content management and hosting, and are replacing it with our own solution which Cursor & Claude Opus helped us build in around 10 days:

https://dx-tooling.org/sitebuilder/

https://github.com/dx-tooling/sitebuilder-webapp


Thanks for the link.

So, basically you made a replacement for webflow for your use case in 10 days, right?


That's fair to say, yes, with the important caveat that it isn't a 1:1 replacement of Webflow, which is exactly the point.

I’m not sure the world needed yet another CMS

It doesn't. The person is saying they built just the functionality they needed. Probably 25% of a CMS. That's the point.

Exactly.

And the big advantage for us is twofold: our content marketers now have a "Cursor-light" experience when creating landing pages, as from their point of view this is a "text-to-landing-page" LLM-powered tool with a chat interface; no fumbling around in the Webflow WYSIWYG interface anymore.

And from the software engineering department's point of view, the results of the content marketers' work are simply changes/PRs in a git repository, which we can work on in the IDE of our choice; again, no fumbling around in the Webflow WYSIWYG interface anymore.


This is the benefit few understand properly: the storage layer is where you get a lot of the value.

Is it open source? Do they disclose which framework they use for the GUI? Is it Electron or Tauri?

lol ofc not

looks like the same framework they used to build chatgpt desktop (electron)

edit - from another comment:

> Hi! Romain here, I work on Codex at OpenAI. We totally hear you. The team actually built the app in Electron specifically so we can support Windows and Linux as well. We shipped macOS first, but Windows is coming very soon. Appreciate you calling this out. Stay tuned!


>I think that really high quality code can be created via coding agents. Not in one prompt, but instead an orchestration of planning, implementing, validating, and reviewing.

Do you have any advice to share (or resources)? Have you experienced it yourself?



This all sounds interesting, but how effective are they? Does anyone have experience with any of them?


Yes, agentic search over vector embeddings. It can be very effective.
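
For anyone unfamiliar with the pattern: the retrieval half is just nearest-neighbour search over embeddings, and the "agentic" part is letting the model decide when to call it and with what query. A toy sketch in Go, where embed() is a hypothetical stand-in for a real embedding model:

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type doc struct {
	text string
	vec  []float64
}

// embed is a hypothetical stand-in for a call to a real embedding model.
func embed(text string) []float64 {
	return []float64{float64(len(text)), 1, 0}
}

// cosine computes cosine similarity between two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na)*math.Sqrt(nb) + 1e-9)
}

// search returns the k documents whose embeddings are closest to the query.
// In an agentic setup this function is exposed to the model as a tool, and
// the model decides when to call it and how to reformulate the query.
func search(corpus []doc, query string, k int) []doc {
	q := embed(query)
	sort.Slice(corpus, func(i, j int) bool {
		return cosine(corpus[i].vec, q) > cosine(corpus[j].vec, q)
	})
	if k > len(corpus) {
		k = len(corpus)
	}
	return corpus[:k]
}

func main() {
	corpus := []doc{
		{text: "how the auth middleware is wired up", vec: embed("how the auth middleware is wired up")},
		{text: "deployment runbook for the staging cluster", vec: embed("deployment runbook for the staging cluster")},
	}
	for _, d := range search(corpus, "where is authentication handled?", 1) {
		fmt.Println(d.text)
	}
}
```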


It's a very well-known pattern. But what about the others? There's a lot of very interesting stuff there.


Tool Use Steering via Prompting. I’ve seen that work well also, but I don’t know if I’d quite call it an architectural pattern.
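
For readers who haven't seen the term: it usually just means spelling out in the system prompt when each tool should (and should not) be used. A hypothetical example in Go; the tool names and rules are illustrative only:

```go
package main

import "fmt"

// Hypothetical system prompt that steers tool selection; nothing here is
// taken from any particular product.
const systemPrompt = `You have two tools: vector_search and grep.
- Use vector_search for conceptual questions ("where is auth handled?").
- Use grep when the user supplies an exact identifier or error string.
- Call at most three tools before answering.`

func main() {
	// A real client would send this as the system message of a chat request.
	fmt.Println(systemPrompt)
}
```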


I’m eager to tackle issues and PRs.


I couldn't find any mention of whether they train their models on your source code. Maybe someone else was able to?


Yes, they do. Scroll to the bottom of the GitHub README:

>This project leverages the Gemini APIs to provide AI capabilities. For details on the terms of service governing the Gemini API, please refer to the terms for the access mechanism you are using:

Click Gemini API, scroll

>When you use Unpaid Services, including, for example, Google AI Studio and the unpaid quota on Gemini API, Google uses the content you submit to the Services and any generated responses to provide, improve, and develop Google products and services and machine learning technologies, including Google's enterprise features, products, and services, consistent with our Privacy Policy.

>To help with quality and improve our products, human reviewers may read, annotate, and process your API input and output. Google takes steps to protect your privacy as part of this process. This includes disconnecting this data from your Google Account, API key, and Cloud project before reviewers see or annotate it. Do not submit sensitive, confidential, or personal information to the Unpaid Services.


There must be thousands of keys in those logs.


If you use it for free: Yes

If you pay for the API: No


Is anyone aware of any good LLM orchestration libraries for Go, like LangChain for Python and TypeScript?


https://github.com/diveagents/dive

Dive orchestrates multi-agent workflows in Go. Take a look and let me know what you think.
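
And in case it helps to see what "orchestration" amounts to if you roll your own: here is a minimal, hypothetical sketch in plain Go using only the standard library. This is not Dive's actual API; callLLM stands in for an HTTP call to whatever model endpoint you use.

```go
package main

import (
	"context"
	"fmt"
)

// step is one stage of a pipeline: it turns the previous output into a prompt.
type step struct {
	name   string
	prompt func(input string) string
}

// callLLM is a hypothetical stand-in; a real implementation would POST to a
// model endpoint (hosted API, local server, etc.).
func callLLM(ctx context.Context, prompt string) (string, error) {
	return "response to: " + prompt, nil
}

// runPipeline feeds each step's output into the next one.
func runPipeline(ctx context.Context, steps []step, input string) (string, error) {
	out := input
	for _, s := range steps {
		resp, err := callLLM(ctx, s.prompt(out))
		if err != nil {
			return "", fmt.Errorf("step %s: %w", s.name, err)
		}
		out = resp
	}
	return out, nil
}

func main() {
	steps := []step{
		{name: "summarize", prompt: func(in string) string { return "Summarize: " + in }},
		{name: "critique", prompt: func(in string) string { return "Critique this summary: " + in }},
	}
	result, err := runPipeline(context.Background(), steps, "long document text ...")
	if err != nil {
		panic(err)
	}
	fmt.Println(result)
}
```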


I prefer MacBook to iPad most of the time. The only use case for iPad for me where it shines is when I need to use a pencil.


Does anyone know whether they have optimized memory management, i.e. the virtual machine not consuming more RAM than required?



From that document I read that it in fact does, but it doesn't release memory if the app starts consuming less. It does do memory ballooning though, so the VM only consumes as much RAM as the maximum amount the app has requested.


Not a lawyer, but my understanding is that it's not, since a legal obligation is a lawful basis for processing personal data.


That excuse only holds in the EU against an EU court, the ICJ, or the ICC. The EU doesn't recognise legal holds of foreign jurisdictions.


Do you have any references to share?


It's a bit more complicated. For the purposes of the GDPR, legal obligations within the EU (where we might assume relevant protections are in place) might be treated differently than, e.g., legal obligations towards the Chinese Communist Party or the NSA.

