
You are describing traditional (deterministic?) automation before AI. AI systems as general as today's SOTA LLMs will happily take on the job regardless of whether the task falls into class I or class II.

Ask a robot arm "how should we improve our car design this year" and it'll certainly get stuck. Ask an AI and it'll give you a real opinion that's at least on par with a human's. If a company builds enough tooling to close the "AI comes up with an idea -> AI designs a prototype -> AI robot physically builds the car -> AI robot test drives the car -> AI evaluates all prototypes and confirms next year's design" feedback loop, then in principle this can work.

This is why AI is seen as such a big deal - it's fundamentally different from all previous technologies. To an AI, there is no line that would distinguish class I from II.


Or maybe LLMs are already pioneering scientific advances - people are using LLMs to read papers, choose what problems to work on, come up with experiments, analyze results, draft papers, and so on at this very moment. Except the humans eventually put their own names on the cover, so we almost never know.


Proper vibe coding should involve tons of vibe refactoring.

I'd say spending at least a quarter of my vibe coding time on refactoring and documentation refreshes, to keep the codebase looking impeccable, is the only way my projects can work at all long term. We don't want to confuse the coding agent.


From a couple of hours of usage in the CLI, 5.2-codex seems to burn through my plan's limit noticeably faster than 5.1-codex. So I guess the usage limit is really a set dollar amount of API credits under the hood.


The way you got structured output with Claude prior to this was via tool use.

IMO this was the more elegant design if you think about it: tool calling is really just structured output and structured output is tool calling. The "do not provide multiple ways of doing the same thing" philosophy.
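For illustration, here's roughly what the tool-use approach looks like with the Anthropic Python SDK. The tool name, schema, and model id below are placeholders I made up; the point is just that the "tool input" is the structured output:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A "tool" whose input schema is really just the output format we want back.
record_person = {
    "name": "record_person",
    "description": "Record structured facts about a person mentioned in the text.",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"},
            "occupation": {"type": "string"},
        },
        "required": ["name"],
    },
}

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model id
    max_tokens=1024,
    tools=[record_person],
    tool_choice={"type": "tool", "name": "record_person"},  # force the "tool" call
    messages=[{"role": "user", "content": "Ada Lovelace, 36, was a mathematician."}],
)

# The structured output is simply the arguments of the forced tool call.
tool_use = next(block for block in response.content if block.type == "tool_use")
print(tool_use.input)  # e.g. {'name': 'Ada Lovelace', 'age': 36, 'occupation': 'mathematician'}
```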


Anecdotally, a Max subscriber gets something like $100 worth of usage per day. The more people use Claude Code, the more Anthropic loses, so it sounds like a classic "selling a dollar for 85 cents" business to me.

As soon as users are confronted with their true API cost, the appearance of this being a good business falls apart. At the end of the day, there is no moat around large language models - OpenAI, Anthropic, Google, DeepSeek, Alibaba, Moonshot... any company can make a SOTA model if they wish, so in the long run it's guaranteed to be a race to the bottom where nobody can turn a profit.


> Anecdotally, a Max subscriber gets something like $100 worth of usage per day.

Where are you getting that number from?

Anthropic added quite strict limits on usage - visible from the /usage command inside Claude Code. I would be surprised if those limits still turn out to result in expensive losses for them.


This is just personal experience plus Reddit anecdotes. I've been using CC since day one (when API pricing was the only way to pay for it). Since then I've been on the $20 Pro plan and am getting a solid $5+ worth of usage in each 5-hour session, times 5-10 sessions per week - roughly $100-200 of API usage per month, so a 5-10x subsidy. I extrapolated that $200 subscribers must be getting roughly 10x Pro's usage. I do feel the actual limit fluctuates from week to week as Claude Code engages in this new subsidy war with OAI Codex, though.

My theory is this:

- we know from benchmarks that open-weight models like DeepSeek R1 and Kimi K2 are not far behind SOTA GPT/Claude in capability

- open-weight API pricing (e.g. on OpenRouter) is roughly 1/10 to 1/5 that of GPT/Claude

- users can more or less choose to hook their agent CLI/IDEs to either closed or open models

If these points are true, then the only reason people are primarily on CC & Codex plans is that they are subsidized by at least 5-10x. When confronted with true costs, users will quickly switch to the lowest-cost inference vendor, and we get perfect competition and zero margins for all vendors.


The benchmarks lie. Go try coding full-time with R1 vs Codex or GPT-5 (in Codex). The latter is firmly preferred even by those who have no issue with budgeting tokens for their productivity.


https://pure.md is exactly what you're looking for.

But stripping complex formats like HTML and PDF down to simple markdown is a hard problem. It's nearly impossible to infer what the rendered page looks like from the raw HTML/PDF source. https://github.com/mozilla/readability helps, but it often breaks down on unconventional div structures. I've heard the state-of-the-art solution is multimodal LLM OCR: actually look at the rendered page and rewrite it in markdown.
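A rough sketch of that OCR-style approach, assuming Playwright for rendering and Claude's vision input for the transcription (the model id and prompt are placeholders, and very tall pages would need tiling):

```python
import base64

import anthropic
from playwright.sync_api import sync_playwright


def page_to_markdown(url: str) -> str:
    # Render the page in a headless browser and take a full-page screenshot,
    # so we see what a human sees rather than the raw div soup.
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        png = page.screenshot(full_page=True)
        browser.close()

    # Ask a multimodal model to transcribe the rendered page as markdown.
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model id
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image", "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": base64.b64encode(png).decode(),
                }},
                {"type": "text", "text": "Transcribe this page as clean markdown. "
                                         "Keep headings, lists and tables; drop nav and ads."},
            ],
        }],
    )
    return response.content[0].text
```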

Which makes me wonder: how did OpenAI make their model read pdf, docx and images at all?


I think the OP's point is that all those requirements are to be implemented outside the LLM layer, i.e. we don't need to conceive of any new model architecture. Even if LLMs don't progress any further beyond GPT-5 & Claude 4, we'll still get there.

Take memory for example: give the LLM a persistent computer and ask it to jot down its long-term memory as hierarchical directories of markdown documents. Recalling a piece of memory means a bunch of `tree` and `grep` commands. It's very, very rudimentary, but it kinda works, today. We just have to think of incrementally smarter ways to query and maintain this type of memory repo, which is a pure engineering problem.
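As a toy sketch (the file layout and function names are made up, and it assumes the `tree` and `grep` CLIs are installed), the whole mechanism is basically two tools the agent can call:

```python
import subprocess
from pathlib import Path

MEMORY_ROOT = Path("memory")  # hypothetical repo of markdown notes, e.g. memory/projects/foo.md


def remember(relative_path: str, note: str) -> None:
    """Append a note to a markdown file, creating directories as needed."""
    path = MEMORY_ROOT / relative_path
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as f:
        f.write(note.rstrip() + "\n")


def recall(query: str) -> str:
    """Crude recall: show the directory tree, then grep for the query."""
    tree = subprocess.run(["tree", str(MEMORY_ROOT)],
                          capture_output=True, text=True).stdout
    hits = subprocess.run(["grep", "-rin", query, str(MEMORY_ROOT)],
                          capture_output=True, text=True).stdout
    return f"{tree}\n--- matches for {query!r} ---\n{hits}"


remember("projects/billing/decisions.md", "- 2025-06-01: switched invoicing to Stripe")
print(recall("stripe"))
```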


The answer can't be as simple as more sophisticated RAG. At the end of the day, stuffing the context full of crap can only take you so far, because context is an extremely limited resource. We also know that quality degrades as the context window fills up, because the model has a harder time tracking what the user wants it to pay attention to.


We have CONTRIBUTING.md for that. Seems to me the author just doesn't know about it?


Today’s AI systems probably won’t excel, but they won’t completely fail either.

Basically, give the LLM a computer to do all kinds of things against the real world, and kick it off with a high-level goal like "build a startup".

The key is to instruct it to manage its own memory on its computer, and when the context limit inevitably approaches, programmatically interrupt the LLM loop and have it jot down everything it knows for its future self, as in the sketch below.
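Something like this, where run_one_step (one LLM call plus tool execution) is a hypothetical helper rather than any real SDK, and the token budget is an assumed number:

```python
# Hypothetical agent loop: run_one_step is illustrative, not from any real framework.
CONTEXT_BUDGET = 150_000  # assumed headroom below the model's hard context limit

HANDOFF_PROMPT = (
    "You are close to the context limit. Write everything your future self needs "
    "(goal, current state, open tasks, lessons learned) to memory/handoff.md, then stop."
)


def rough_token_count(messages) -> int:
    # Crude heuristic: roughly 4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4


def run_agent(goal: str) -> None:
    messages = [{"role": "user", "content": goal}]
    while True:
        messages = run_one_step(messages)  # hypothetical: one LLM call + tool execution
        if rough_token_count(messages) > CONTEXT_BUDGET:
            # Interrupt the loop and have the model dump its state to disk...
            run_one_step(messages + [{"role": "user", "content": HANDOFF_PROMPT}])
            # ...then restart with a fresh context that begins by reading the handoff.
            messages = [{"role": "user", "content":
                         goal + "\n\nFirst read memory/handoff.md and continue from there."}]
```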

It already kinda works today, and I believe AI systems a year from now will excel at this:

https://dwyer.co.za/static/claude-code-is-all-you-need.html

https://www.anthropic.com/research/project-vend-1

