Hacker News | hsaliak's comments

This is the way. This exact workflow is my sweet spot.

In my coding agent std::slop I've optimized for this workflow https://github.com/hsaliak/std_slop/blob/main/docs/mail_mode... Basically, the idea is that you are the 'maintainer' and you get bisect-safe git patches that you review (or ask a code reviewer skill or another agent to review). Any change re-rolls the whole stack. Git already supports such a flow and I added it to the agent. A simple markdown skill does not work because it 'forgets'. A 'github' based PR flow felt too externally dependent. This workflow is enforced by a 'patcher' skill, and once that's active, tools do not work unless they follow the enforced flow.

I think a lot of people are going to feel comfortable using agents this way rather than going full blast. I do all my development this way.


This is broadly how I worked when I was still using chat instead of cli agents for LLM support. The downside, I feel, is that unless this is a codebase / language / architecture I do not know, it feels faster to just code by hand with the AI as a reviewer rather than a writer.

Your patch queue approach is very clever. It solves a huge tech debt problem with LLM code gen. It should probably work with Jujutsu too.

Would be curious to see more about how you save tokens with lua too.

Do you blog?


Thanks for your interest in this work - I do not blog (maybe I should?) but I have posted a bit more on X about this work.

- A bit more on mail mode https://x.com/hsaliak/status/2020022329154420830

- on the Lua integration https://x.com/hsaliak/status/2022911468262350976 (I've since disabled the recursion, not every code file is long and it seems simpler to not do it), but the rest of it is still there

- hotwords for skill activation https://x.com/hsaliak/status/2024322170353037788

Also /review and /feedback. /feedback (the non code version) opens up the LLM's last response in an editor so you can give line by line comments. Inspired by "not top posting" from mailing lists.


I quit X so I can't read beyond top-level links. I subscribed to your tool on GitHub and would appreciate blog-posts-in-release-notes to keep up with future developments. Will try the tool. It's rare to find something new among the AI hype, thank you.

Fair enough. I'll find a way to publish some of this. I try to cover most of the information in the docs/ folder, and keep it up to date. Blog posts in release notes is a good idea!

Google's Pro service (no idea about Ultra and I have no intention to find out) is riddled with 429s. They have generous quotas for sure, but they really give you very low priority. For example, I still don't have access to Gemini 3.1 from that endpoint. It's completely uncharacteristic of Google.

I analyzed 6k HTTP requests on the Pro account, and 23% of those were hit with 429s. (Not from Gemini-CLI, though, but from my own agent using code assist.) The gemini-cli has a default retry backoff of 5s. That's verifiable in code, and it's a lot.
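For anyone wiring this into their own agent, here is a minimal Python sketch of retrying on 429 with capped exponential backoff. Only the 5s starting delay comes from the comment above; the doubling factor, the 60s cap, and the names `backoff_delays`, `call_with_retry`, and `send` are assumptions for illustration, not how gemini-cli actually does it.

```python
import time


def backoff_delays(base=5.0, factor=2.0, cap=60.0, attempts=5):
    """Return the wait (in seconds) before each retry.

    base=5.0 mirrors the 5s default mentioned above; factor and cap
    are assumed values, not taken from any real client.
    """
    return [min(cap, base * factor ** attempt) for attempt in range(attempts)]


def call_with_retry(send, delays, sleep=time.sleep):
    """Call send() until it returns a non-429 status or delays run out.

    send is a hypothetical zero-argument callable returning (status, body).
    sleep is injectable so the logic can be exercised without waiting.
    """
    for delay in delays:
        status, body = send()
        if status != 429:
            return status, body
        sleep(delay)
    # Final attempt after the last wait; its result is returned as-is.
    return send()
```

With roughly one request in four rejected, a fixed 5s wait on every 429 adds up quickly over a long session, so in practice the starting delay and the cap are the two knobs worth tuning.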

I don't touch the Antigravity endpoint. Unlike code-assist, it's clear that they are subsidizing that for user acquisition on that tool. So perhaps it's ok for them to ban users from it.

I like their models, but they also degrade. It's quite easy to see when the models are 'smart' and capacity is available, and when they are 'stupid'. They likely clamp thinking when they are capacity strapped.

Yes the models are smart, but you really can't "build things" despite the marketing if you actively beat back your users for trying. I spent a decade at Google, and it's sad to see how they are executing here, despite having solid models in gemini-3-flash and gemini-3.1.


> Yes the models are smart, but you really can't "build things" despite the marketing if you actively beat back your users for trying

I think this is the most important takeaway from this thread and at some point, this will end up biting Google and Anthropic back.

OpenAI seems to have realized this and is actively trying to do the opposite. They welcomed OpenCode the same day Anthropic banned them, and X is full of tweets from people saying the Codex $20 plan is more generous than Anthropic's $200 plan, etc.

If you told me this story a year ago without naming companies, I would tell you it's OpenAI banning people and Google burning cash to win the race.

And it's not like their models are winning any awards in the community either.


My impression is there's a definite shortage of GPUs, and if OpenAI is more reliable it's because they have fewer customers relative to the number of GPUs they have. I don't think Google is handing out 429s because they are worried about overspending; I think it's because they literally cannot serve the requests.

This sounds very plausible. OpenAI has hoarded 40% of world's RAM supply, which they likely have no use for other than to starve competition. They (or other competitors) could be utilizing the same strategy for other hardware.

Which is worrying, because if this continues, and if Google, which has GCP, is struggling to serve requests, there's no telling what's going to happen with services like Hetzner etc.


> OpenAI has hoarded 40% of world's RAM supply

I believe OpenAI's purchasing is somewhat overstated, and it definitely has no effect on Google's current ability to serve Gemini requests. But it is obvious that there's a shortage of most components, and it's also obvious that even internally Google is having to make hard choices about who to let use GPUs when.

I definitely think OpenAI likely has less use for GPUs than Google. Google has $300B in annual revenue vs. $20B for OpenAI. Even if you assume 100% of OpenAI's revenue is going to renting GPUs and they are taking a 50% loss, there's still a lot of room for Google to be profitable, spend more money on GPUs, and still not have enough GPUs. Google also just has a wider variety of models to train and run, from Waymo to Search to whatever advertising models.



OpenAI is dependent on the same hyperscalers (most specifically Microsoft/Azure) as everyone else, and even has access to preferential pricing due to their partnership.

A better explanation is to point out that ChatGPT is still far and away the most popular AI product on the planet right now. When ChatGPT has so many more users, multi-tenant economic effects are stronger, taking advantage of a larger number of GPUs. Think of S3: requests for a million files may load them from literally a million drives due to the sheer size of S3, and even a huge spike from a single customer fades into relatively stable overall load due to the sheer number of tenants. OpenAI likely has similar hardware efficiencies at play and thus can afford to be more generous with spikes from individual customers using OpenCode etc.
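The pooling argument can be made concrete with a little arithmetic. This is a rough Python sketch under a strong assumption that is mine, not the commenter's: tenants are independent and each has the same mean load and standard deviation. The function name `relative_spread` is made up for illustration.

```python
import math


def relative_spread(n_tenants, mean=1.0, std=1.0):
    """Std-dev of total load divided by mean total load for n tenants.

    Assumes independent tenants with identical mean and std (an
    idealization). Means add linearly, while standard deviations of
    independent loads add in quadrature, so the ratio shrinks as
    1 / sqrt(n_tenants).
    """
    total_mean = n_tenants * mean
    total_std = math.sqrt(n_tenants) * std
    return total_std / total_mean


if __name__ == "__main__":
    for n in (1, 100, 10_000, 1_000_000):
        print(n, relative_spread(n))
```

Under these assumptions, going from 100 to 10,000 tenants cuts the relative swing in total load by a factor of 10, which is the sense in which a single customer's spike "fades" into the aggregate. Real traffic is correlated (time of day, launches), so the actual benefit is smaller.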


I would guess the biggest AI product on the planet is Google's Search AI. Although even that might not be the case, unless your definition of AI is just "LLMs" and not any sort of ML that requires a GPU.

You can build plenty with the Google AI Pro plan and Antigravity. Yeah, there are some limits that should be even higher, but you can still build stuff.

It's unfortunate though that they lie and deceive by having a name called "Open"AI when they are in fact "Closed". And the whole non-profit to profit and Microsoft deals are just untrustworthy and unethical.

They also actively employ dark strategies in cooperation with the CIA, and who knows when they will pull the rug out from under you again.

Do you really trust a fundamentally rotten group of people who avoid accountability?


I don't know what it's called when something becomes an irony and then this irony becomes an irony itself, but that's what's up with OpenAI today. On one hand, they started this 'we're closing things down because safety' line, they normalized $200/mo subscriptions, but now they're becoming the most open AI company between the big 3. Their tooling is open source, they're lenient on their quotas on lower plans, and their allowance of third party integrations is also unique.

I would still consider the OpenAI name incorrect, but between the 3, they kind of are open.


Ok I heard that. If they're ironically the most open of American companies, and Chinese companies are more open than American companies, isn't that the most ironic of all!

I'm guessing at least 50% of the "users" of Antigravity are actually OpenCode users exploiting the oauth and endpoint. Must be infuriating to them if they're subsidizing it.

The OpenCode plugin (8.7k stars btw!) even advertises "Multi-account support — add multiple Google accounts, auto-rotates when rate-limited"[1]

[1] https://github.com/NoeFabris/opencode-antigravity-auth/blob/...


I've stopped using Gemini models altogether because of this. I've been using Claude Code with MiniMax M2.5 for a while now and I couldn't be happier. I haven't noticed any drop in output quality, and the biggest advantage is that even the $10 plan is pretty generous. I haven't been hit with a rate limit, not even one time. And I'm a pretty heavy user. I also tried GLM 5.0 but I hit the rate limit there pretty early on.

One thing with GLM 5 is they seem to do this weird thing where, when your account is just opened, it limits you really heavily, then this gets lifted later.

I had buyer's remorse when I kept getting rate limited on GLM5 for the first hour or two, but since then I've not had a single rate limit and I am using it very heavily.


Just adding for context that I use Gemini Ultra, and across all models from Gemini 3.1 Pro to Claude Opus 4.6 I have never hit 429s. Hitting model quota limits is also incredibly rare and only happens if I am trying to run 3 projects at once. While not the biggest agentic coding fan, I have been toying with them and have been running it for at least 7-8 hours a day if not longer.

I’ve often suspected these models of getting dumber when the service is under high load. But I’ve never seen actually measured results or proof. Anybody know of real published data here?

Here's a recent comment [1] by an OpenAI engineer confirming that they do in fact make such trade offs between intelligence and efficiency.

[1]: https://news.ycombinator.com/item?id=46909905


That comment only says that they have a lot of different options for smaller & faster models that people can opt into. It doesn't say that they dynamically scale things up or down depending on demand.

ChatGPT was brutal for it a couple years ago. You could tell when it would go into “lazy mode” during peak usage periods.

Suddenly instead of writing the code you asked for it would give some generic bullet points telling you to find a library to do what you asked for and read the documentation.


> ChatGPT was brutal for it a couple years ago. You could tell when it would go into “lazy mode” during peak usage periods.

ChatGPT web has been doing this for a week now, for me. Ask some technical question and get a reply absolutely filled with AI phrases (Not $X, Just $Y, the key insight, the deeper insight, etc) dominating about 50% of the text, with the remaining 50% some generic filler stuff partially related to the tech I asked.

Last night I read through a ChatGPT web response about solutions for a security bootstrapping problem without holding keys/password, and it spat out pages and pages of key insights, all nicely numbered sections with bullet points in each section, without actually answering the question.

Moved to Claude Web immediately, got a usable answer on the first try.


Not exactly what you're looking for but https://news.ycombinator.com/item?id=46810282

This is exactly what I want for data, but doesn’t seem to draw any conclusions yet. Thanks!

It is indeed somewhat sad/ridiculous to see that my GH Copilot Pro grants me access to Gemini 3.1, but my Google AI Pro does not.

The eventual nerfing gives me pause. Flash is awesome. What we really want is gemini-3.1-flash :)

https://github.com/hsaliak/filc-bazel-template I created this recently to make it super easy to get started with fil-c projects. If you find it daunting to get started with the setup in the core distribution and want a 3-4 step approach to building a fil-c enabled binary, then try this.

It's interesting that we do not have a wave of tier-2 companies with NNN million dollar caps releasing anything competitive. It's the big 4 labs vs the Chinese labs. No Tier-2.


There’s Mistral


I've not had good luck with Devstral at all. I am really rooting for them though!


It's been a long time since they were good. But Europe definitely needs a homegrown frontier model company, one way or the other. I consider them Tier 2 right now.


This aligns perfectly with my experience, but of course, the discourse on X and other forums is filled with people who are not hands on. Marketing is first out of the gate. These models are not yet good enough to be put through a long coding session. They are getting better though! GLM 4.7 and Kimi 2.5 are alright.


gemini-3-flash-preview will be GA soon I hope. /s


I thought this was about Allegro Common Lisp and the Lynx browser. It's about something else completely. That's not cool.


There is a balance to be struck. Not everyone is going to be comfortable with ralph loops. Some are going to be OK with running a single agent, some with advanced code completion or code generation for specific functionality and so on.

The tooling is going to change how we do development no doubt, but people are going to find their comfortable spot, and be productive.


A language targeting an LLM might be well served by a lot of keywords, similar to a CISC instruction set, where each keyword does one specific thing well. Giving the model building blocks and having it piece them together is likely to pay off.

