
I tried doing some vibe coding on a greenfield project (using gemini 2.5 pro + cline). On one hand - super impressive, a major productivity booster (even compared to using a non-integrated LLM chat interface).

I noticed that LLMs need a very heavy hand in guiding the architecture, otherwise they'll add architectural tech debt. One easy example is that I noticed them breaking abstractions (putting things where they don't belong). Unfortunately, there's not that much self-retrospection on these aspects if you ask about the quality of the code or if there are any better ways of doing it. Of course, if you pick up that something is in the wrong spot and prompt better, they'll pick up on it immediately.

I also ended up blowing through $15 of LLM tokens in a single evening. (Previously, as a heavy LLM user including coding tasks, I was averaging maybe $20 a month.)



> I also ended up blowing through $15 of LLM tokens in a single evening.

This is a feature, not a bug. LLMs are going to be the next "OMG my AWS bill" phenomenon.


Cline very visibly displays the ongoing cost of the task. Light edits are about 10 cents, and heavy stuff can run a couple of bucks. It's just that the tab accumulates faster than I expect.
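The arithmetic is easy enough to sanity-check yourself. A quick sketch (the per-million-token rates here are hypothetical, not any provider's actual pricing):

```python
def task_cost(input_tokens: int, output_tokens: int,
              in_per_million: float, out_per_million: float) -> float:
    """Dollar cost of one request at per-million-token rates."""
    return (input_tokens / 1e6) * in_per_million \
         + (output_tokens / 1e6) * out_per_million

# A "light edit" that sends 60k tokens of context and gets 2k tokens back,
# at hypothetical rates of $1.25/M input and $10/M output:
print(round(task_cost(60_000, 2_000, 1.25, 10.0), 3))  # 0.095, i.e. ~10 cents
```

Most of the bill is the repeatedly re-sent context, not the edit itself, which is why the tab accumulates faster than intuition suggests.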


> Light edits are about 10 cents

Some well-paid developers will excuse this with, "Well if it saved me 5 minutes, it's worth an order of magnitude more than 10 cents".

Which is true, however there's a big caveat: Time saved isn't time gained.

You can "Save" 1,000 hours every night, but you don't actuall get those 1,000 hours back.


> You can "Save" 1,000 hours every night, but you don't actually get those 1,000 hours back.

What do you mean?

If I have some task that requires 1000 hours, and I'm able to shave it down to one hour, then I did just "save" 999 hours -- just in the same way that if something costs $5 and I pay $4, I saved $1.


My point is that saving 1,000 hours each day doesn't actually give you 1,000 hours a day to do things with.

You still get your 24 hours, no matter how much time you save.

What actually matters is the value of what is delivered, not how much time it actually saves you. Justifying costs by "time saved" is a good way to eat up your money on time-saving devices.


If I "save 1000 hours" then that could be distributed over 41.666 days, so no task would need to be performed during that period because "I saved 1000 hours".

You could also say you saved 41.666 people an entire 24 hour day, by "saving 1000 hours", or some other fractional way.

How you're trying to explain it as "saving 1000 hours each day" is really not making any sense without further context.

And I'm sure if I hadn't written this comment I would be saving 1000 hours on a stupid comment thread.


You're overthinking it.

It's like those coupon booklets they used to sell. "Over $10,000 of savings!"

Yes but how much money do I have to spend in order to save $10,000?

There was this funny commercial in the 90s for some muffler repair chain that was having a promotion: "Save Fifty Dollars..."

The theme was "What will you do with the fifty dollars you saved?" And it was people going to Disneyland or a fancy dinner date.

The people (actors) believed they were receiving $50. They acted as if it was free money. Meanwhile there was zero talk about whether their cars needed muffler repair at all.


> Meanwhile there was zero talk about whether their cars needed muffler repair at all.

It's called "Thinking past the sale". It's a common sales tactic.


I think one issue is that you won't always be able to invoice those extra 999 hours to your customer. Sometimes you'll still only be able to get paid for 1 hour, depending on the task and contract.

But the llm bill will always invoice you for all the saved work regardless.


    hourly_rate / 12 = five_min_rate

    if light_edit_cost < five_min_rate then savings = true
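In runnable form (the hourly rate is whatever you assume, not a claim about anyone's salary):

```python
def edit_saves_money(light_edit_cost: float, hourly_rate: float,
                     minutes_saved: float = 5.0) -> bool:
    """True if the edit costs less than the developer time it saves."""
    time_value = hourly_rate * (minutes_saved / 60.0)
    return light_edit_cost < time_value

# At an assumed $120/hr, five minutes is worth $10, so a 10-cent edit wins:
print(edit_saves_money(0.10, 120.0))  # True
```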


(From a company's perspective, this is true.) As a developer, you may not be paid by the task -- if I finish something early, I start work on the next thing.


So many people seem to be missing your point that I’m honestly wondering if you’re being trolled here.


Huh? What happens if you stop using your washing machine and go back to hand washing everything?


If you earn more than me, then if you value "time saved" then you should pay me to take my washing off me. Because then you can save even more of your valuable time!

The more of my washing you can take off me, the more of your time you can save by then using a washing machine or laundry service!

Saving an hour of my time is a waste, when saving an hour of your time is worth so much more. So it makes economic sense for you to pay me, to take my washing off me!

( Does that better illustrate my point? )


> Cline very visibly displays the ongoing cost of the task

LLMs are now being positioned as "let them work autonomously in the background" which means no one will be watching the cost in real time.

Perhaps I can set limits on how much money each task is worth, but very few would estimate that properly.
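Nothing stops a client from enforcing a hard cap itself, though. A minimal sketch of a per-task budget guard (all names here are hypothetical, not any real tool's API):

```python
class BudgetExceeded(RuntimeError):
    pass

class CostGuard:
    """Aborts a background task once accumulated spend crosses a hard limit."""

    def __init__(self, limit_usd: float):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        # Record the spend for one API call, then enforce the cap.
        self.spent_usd += cost_usd
        if self.spent_usd > self.limit_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f}, limit is ${self.limit_usd:.2f}"
            )

# guard = CostGuard(limit_usd=2.00)
# guard.charge(0.10)  # fine
# guard.charge(3.00)  # raises BudgetExceeded
```

The hard part, as noted, is picking `limit_usd` sensibly per task, not wiring up the check.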


> LLMs are now being positioned as "let them work autonomously in the background"

The only people who believe this level of AI marketing are the people who haven't yet used the tools.

> which means no one will be watching the cost in real time.

Maybe some day there's an agentic coding tool that goes off into the weeds and runs for days doing meaningless tasks until someone catches it and does a Ctrl-C, but the tools I've used are more likely to stop short of the goal than to continue crunching indefinitely.

Regardless, it seems like a common experience for first-timers to try a light task and then realize they've spent $3, instantly setting expectations for how easy it is to run up a large bill if you're not careful.


Especially at companies (hence this github one), where the employees don't care about cost because it's the boss' credit card.


I think that models are gonna commoditize, if they haven't already. The cost of switching over is rather small, especially when you have good evals on what you want done.

Also there's no way you can build a business without providing value in this space. Buyers are not that dumb.


They are already quite commoditized. Commoditization doesn't mean "cheap", and it doesn't mean you won't spend $15 a night like the GP did.


> I also ended up blowing through $15 of LLM tokens in a single evening.

Consider using Aider, and aggressively managing the context (via /add, /drop and /clear).
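A frugal session looks something like this (`/add`, `/drop`, and `/clear` are aider's actual in-chat commands; the file names are made up):

```text
$ aider
> /add src/billing.py        # pull only the file being edited into context
> make the late-fee calculation configurable
> /drop src/billing.py       # remove it once the edit lands
> /clear                     # wipe chat history so it isn't re-sent each turn
```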

https://aider.chat/


I, too, recommend aider whenever these discussions crop up; it converted me from the "AI tools suck" side of this discussion to the "you're using the wrong tool" side.

I'd also recommend creating little `README`s in your codebase that are mainly written with aider as the intended audience. In them, I'll explain architecture, what code makes (non-)sense to write in this directory, and so on. Has the side-effect of being helpful for humans, too.

Nowadays when I'm editing with aider, I'll include the project README (which contains a project overview + pointers to other README's), and whatever README is most relevant to the scope of my session. It's super productive.

I've yet to find a model that beats the cost-effectiveness of Sonnet 3.7. I've tried the latest deepseek models, and while I love the price (nearly 50x cheaper?), they're just far too error-prone compared to Sonnet 3.7. They generate solid plans / architecture discussions, but, unlike Sonnet, the code they generate is often confidently off-the-mark.


Have you tried Gemini 2.5? It's cheaper and scores higher on the Aider leaderboard.


I haven’t yet, I’ll give it a shot!


It's so good!


Why create READMEs and not just comments in the code?


I’d generally prefer comments in code. The READMEs are relatively sparse and contain information that would be a bit too high-level for module or class-level comments. If commentary is specific to a module, class, or method, the documentation belongs there. My rule of thumb is: if the commentary helps you navigate and understand rules that apply to entire sets of modules rooted at `foo/`, it generally belongs in `foo/README`.

For example “this module contains logic defining routes for serving an HTTP API. We don’t write any logic that interacts directly with db models in these modules. Rather, these modules make calls to services in `/services`, which make such calls.”

It wouldn’t make sense to duplicate this comment across every router sub-module. And it’s not obvious from looking at any one module that this rule is applied across all modules, without such guidance.

These little bits of scaffolding really help narrow down the scope of the code that LLMs eventually try to write.
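As a concrete sketch (the paths and rule are made up, echoing the example above), such a `routes/README` might read:

```markdown
# routes/

Modules here only define HTTP routes and request/response handling.
Never interact with db models directly from these modules; call into
services in `/services`, which own all model access.
```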


Comments are not easy for the LLM to refer to, ironically: https://taoofmac.com/space/blog/2025/05/13/2230


There is a better way than just READMEs: https://taoofmac.com/space/blog/2025/05/13/2230


I like this a lot! My root README ends up looking a lot like your SPEC.md, and I also have a file that’s pretty similar to your TODO.md.

My experience agrees that separating the README and the TODO is super helpful for managing context.


My tool Plandex[1] allows you to switch between automatic and manual context management. It can be useful to begin a task with automatic context while scoping it out and making the high level plan, then switch to the more 'aider-style' manual context management once the relevant files are clearly established.

1 - https://github.com/plandex-ai/plandex

Also, a bit more on auto vs. manual context management in the docs: https://docs.plandex.ai/core-concepts/context-management


I loathe using AI in a greenfield project. There are simply too many possible paths, so it seems to randomly switch between approaches.

In a brownfield code base, I can often provide it reference files to pattern match against. So much easier to get great results when it can anchor itself in the rest of your code base.


The trick for greenfield projects is to use it to help you design detailed specs and a tentative implementation plan. Just bounce some ideas off of it, as with a somewhat smarter rubber duck, and hone the design until you arrive at something you're happy with. Then feed the detailed implementation plan step by step to another model or session.

This is a popular workflow I first read about here[1].

This has been the most useful use case for LLMs for me. Actually getting them to implement the spec correctly is the hard part, and you'll have to take the reins and course correct often.

[1]: https://harper.blog/2025/02/16/my-llm-codegen-workflow-atm/


Here’s my workflow, it takes that a few steps further: https://taoofmac.com/space/blog/2025/05/13/2230


This seems like a good flow! I end up adding a "spec" and "todo" file for each feature[1]. This allows me to flesh out some of the architectural/technical decisions in advance and keep the LLM on the rails when the context gets very long.

[1] https://notes.jessmart.in/My+Writings/Pair+Programming+with+...


Yeah, I limit context by regularly trimming the TODOs. I like having 5-6 in one file because it sometimes informs the LLM as to how to complete the first in a way that makes sense for the follow-ups.

READMEs per module also help, but it really depends a lot on the model. Gemini will happily traipse all over your codebase at random, gpt-4.1 will do inline imports inside functions because it seems to lack any sort of situational awareness, Claude so far gets things mostly right.


The trouble occurs when the brownfield project is crap already.


While it's being touted for greenfield projects, I've noticed a lot of failures when it comes to bootstrapping a stack.

For example, it (Gemini 2.5) really struggles with newer ecosystems like FastAPI when wiring together libraries like SQLAlchemy, Pytest, Python-playwright, etc.

I find more value in bootstrapping myself, and then using it to help with boilerplate once an effective safety harness is in place.


I've vibe coded a small project as well using Claude Code. It's for visitor registration at the company. Simple project: one form, a couple of checkboxes, everything is stored in sqlite + it has an endpoint for getting .xlsx.

Initial cost was around $20 USD, which later grew to (mostly polishing) $40 with some manual work.

I've intentionally picked up simple stack: html+js+php.

A couple of things:

* I'd say I'm happy with the result from a product perspective

* The codebase could be better, but I could not care less in this case

* By default, AI does not care about security unless I specifically tell it to

* Claude insisted on using old libs. When I specifically told it to use the latest and greatest, it upgraded them but left code that only works with the old versions. It also mixed the latest DaisyUI with some old version of tailwindcss :)

On one hand it was super easy and fun to do, on the other hand if I was a junior engineer, I bet it would have cost more.


If you want to use Cline and are at all price sensitive (in these ranges) you have to do manual context management just for that reason. I find that too cumbersome and use Windsurf (currently with Gemini 2.5 pro) for that reason.


$15 in an evening sounds like a great deal when you consider the cost of highly-paid software engineers


> highly-paid software engineers

For now.


The money won't be flowing forever. This will cost you $6,000 a year.


A new grad at a FANG costs ~$200k-$250k a year after benefits.


If the market rate is new grads at $200-250k, then this product won't sell many copies.

If this product is going to be successful, they are going to need the bulk of their customers to be $40-100k employees.


So the fully loaded cost of Copilot is ~$206k-$256k a year?


> LLMs need a very heavy hand in guiding the architecture, otherwise they'll add architectural tech debt

I wonder if the next phase would be the rise of (AI-driven?) "linters" that check that the implementation matches the architecture definition.
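You can get part of the way there today with a dumb, non-AI linter. A sketch that enforces the kind of layering rule discussed upthread (the `routes`/`models` rule and file paths are made-up examples):

```python
import ast

# Hypothetical rule: files under routes/ may not import the models package.
FORBIDDEN = {"routes": {"models"}}

def import_violations(path: str, source: str) -> list[str]:
    """Return the banned imports found in one file, given its repo path."""
    layer = path.split("/")[0]
    banned = FORBIDDEN.get(layer, set())
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        hits += [n for n in names if n.split(".")[0] in banned]
    return hits

print(import_violations("routes/users.py", "import models.user"))  # ['models.user']
```

An AI-driven version would presumably catch the fuzzier violations (logic in the wrong layer, not just imports), but even this mechanical check blocks the most common abstraction breaks.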


And now we've come full circle back to UML-based code generation.

Everything old is new again!


I think it's just that it's not end-to-end trained on architecture because the horizon is too short. It doesn't have the context length to learn the lessons that we do about good design.


> I noticed that LLMs need a very heavy hand in guiding the architecture, otherwise they'll add architectural tech debt. One easy example is that I noticed them breaking abstractions

That doesn’t matter anymore when you’re vibe coding it. No human is going to look at it anyway.

It can all be if/else on one line in one file. If it works and if the LLMs can work with it, iterating and implementing new business requirements while keeping performance and security - code structure, quality and readability don't matter one bit.

Customers don’t care about code quality and the only reason businesses used to care is to make it less money consuming to build and ship new things, so they can make more money.


Wild take. Let's just hand over the keys to LLMs I suppose; the fancy next-token predictor is the captain now.


Not that wild TBH.

This is a common view, and I think it will be the norm in the near-to-mid term, especially for basic CRUD apps and websites. Context windows are still too small for anything even slightly complex (I think we need to be at about 20M tokens before we start to match human levels), but we'll be there before you know it.

Engineers will essentially become people who just guide the AIs and verify tests.


Have you ever tried to get those little bits of styrofoam completely off of a cardboard box? Have you ever seen something off in the distance and misjudged either what it was or how long it would take to get there?


LLMs need a very heavy hand in guiding the architecture because otherwise they'll code it in a way that even they can't maintain or expand.


Hook up something like Taskmaster or Shrimp, so that they can document as they go along and they can retrieve relevant context when they overflow their context to avoid this issue.

Then as the context window increases, it’s less and less of an issue


I don't get it. Isn't it just a fixed monthly subscription?


For now. Who is to say that in 5 years, when everyone has made this THE default workflow, prices won't go up?


Nope - I use a-la-carte pricing (through openrouter). I much prefer it over a subscription, as there are zero limits, I pay only for what I use, and there is much less of a walled garden (I can easily switch between Anthropic, Google, etc).


I’m running o3 dozens of times a day all for the subscription price of $20. Surely this is way more cost effective.


Same here, same reasons!


Average coders, terrible engineers



