This appears to be part of a crackdown on third-party clients that use Claude Code's credentials/subscriptions without actually going through Claude Code.
Not surprising, as this type of credential reuse is always a gray area, but it's odd that Anthropic deployed it on a Thursday night without any warning, since the inevitable shitstorm was entirely predictable.
Are they really that strapped already? It took Netflix like 20 years before they began nickel-and-diming us... with Anthro it's starting after less than 20 months in the spotlight.
I suspect it's really about control and the culture of Anthropic, rather than only finances. The message is: no more funtime, use Claude CLI, pay a lot for API tokens, or get your account banned.
It isn't that simple; demand is growing and they're investing in that growth. With the exception of ElGoog, the providers are all private companies, so we don't really know.
They added this change at the same time they added random trick prompts to try and get you to hit enter on the training opt-in from late last year. I've gotten three popups inside claude code today at random times trying to trick me into letting it train on my data, with a different selection defaulted than the one I'd already chosen.
More evidence the EU solved the wrong problem. Instead of mandating cookie banners, mandate a single global “fuck off” switch: one-click, automatic opt-out from any feature/setting/telemetry/tracking/training that isn’t strictly required or clearly beneficial to the user as an individual. If it’s mainly there for data collection, ads, attribution, “product improvement”, or monetization, it should be off by default and stay that way as long as the “fuck off” option is toggled. Burden of proof on the provider. Fines big enough that growth teams and KPI hounds get coached by legal on what “fuck off” means and why they need to respect it.
DNT was useless because it didn't have a legal basis. It would have been amazing if they had mandated something like this instead of the cookie walls.
Advertisers ignored it because they could, and complained that it defaulted to on. But cookies are supposed to be opt-in, so that's how it was supposed to work anyway.
Remember how all of HN and tech people were saying DNT was a Micro$oft scam designed to break privacy because it was enabled by default without requiring user action?
To the point that the Apache web server developers added a custom rule to the default httpd.conf to strip away incoming DNT headers!!!
I suspect 99% of coding agents would be able to say "hey wait, there's no 'index_value' column; here's the correct code:"
df['new_column'] = df.index + 1
The original bug sounds like a GPT-2-level hallucination IMO. The index field has been accessible in pandas since the beginning, and even bad code wouldn't reach for an 'index_value' column.
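For what it's worth, here's a minimal sketch of what the described bug and fix presumably look like (the buggy line is a reconstruction, not quoted from the article, and the sample data is made up):

import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30]})  # made-up sample data

# Buggy version (reconstructed): raises KeyError, since no 'index_value' column exists
# df["new_column"] = df["index_value"] + 1

# Fix: use the DataFrame's built-in index instead
df["new_column"] = df.index + 1
print(df)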
My thought process, if someone handed me this code and asked me to fix it, would be that they probably didn’t expect
df['index_value']
to hold
df.index
Just because, well, how'd the code get into this state? 'index_value' must have been a column that held something; having it just be equal to df.index seems unlikely because, as you mention, that's always been available. I should probably check the change history to figure out when 'index_value' was removed. Or ask the person what that column meant, but we can't do that if we want to obey the prompt.
The model (and you) have inferred, completely without context, that index_value is meant to somehow map to the DataFrame index. What if this is raw .csv data from another system? I work with .csv files from financial indices - index_value (or sometimes index_level) carries a completely different meaning in that case.
This inference is not at all "without context". It's based on the meaning of "index", and the contextual assumption that reasonable people put things into CSV columns whose intended purpose aligns with the semantic content of the column's title.
That is a fair counterpoint, but if that were the case, there would always be more context accessible, e.g. the agent could run `df.head()` to get an overview of the data and columns (which would indicate financial indices), or there would be code after that point giving a strong signal that the intent is financial indices and not the DataFrame index.
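For illustration, that inspection step could look something like this (file name and columns are hypothetical):

import pandas as pd

df = pd.read_csv("index_levels.csv")  # hypothetical input file

# Look before leaping: if these columns hold financial index levels rather than
# row positions, 'index_value' should not simply be replaced with df.index.
print(df.head())
print(df.columns.tolist())
print(df.dtypes)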
This is why vague examples in blog posts aren't great.
The models have gotten very good, but I'd rather have an obviously broken pile of crap that I can spot immediately than something deep-fried with RL to always look like it succeeds, but with subtle problems that someone will lgtm :( I guess it's not much different with human-written code, but the models seem to have weirdly inhuman failures - like, you just skim some code because you can't believe anyone could get it wrong, and it turns out they did.
Well, for some reason it doesn't let me respond to the child comments :(
The problem (which should be obvious) is that with a/b over the reals you can't construct an exhaustive input/output set. A test case can only show the presence of a bug, not its absence.
Another category of problems you can't just test for, and instead have to prove correct, is concurrency.
Of course you can. You can write test cases for anything.
Even an add_numbers function can have bugs, e.g. you have to ensure the inputs are actually numbers. Most coding agents would catch this in loosely typed languages.
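A minimal sketch of that point in Python (the function and checks are illustrative, not from the thread):

def add_numbers(a, b):
    # In a loosely typed language, add_numbers("1", "2") would otherwise
    # "succeed" by concatenating strings instead of adding numbers.
    if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
        raise TypeError("add_numbers expects numeric inputs")
    return a + b

# Tests like these show the guard works for these inputs, but - per the comment
# above about a/b over the reals - they can't prove the absence of bugs.
assert add_numbers(1, 2) == 3
try:
    add_numbers("1", "2")
except TypeError:
    pass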
The article uses pandas as a demo example for LLM failures, but for some reason even the latest LLMs are bad at data science code, which is extremely counterintuitive. Opus 4.5 can write an EDA backbone, but it's often too verbose for code that's intended for a Jupyter Notebook.
The issues I've seen have been less egregious than hallucinating an "index_value" column, though, so I'm a bit suspicious. Opus 4.5 has still been useful for data preprocessing, especially in cases where the input data is poorly structured/JSON.
This is not my experience. Claude Code has been fine for data science for a while. It has many issues and someone at the wheel who knows what they're doing is very much required, but for many common cases I'm not writing code by hand anymore, especially when the code would have been throwaway anyway. I'd be extremely surprised if a frontier model doesn't immediately get the problem the author is pointing out.
As long as the liability precedents set by prior case law and current regulations hold, there should be no problem. OpenAI and the hordes of lawyers working for and with them will have ensured that every appropriate and legally required step has been taken, and at least for now, these are software tools used by individuals. AI is not an agent of itself or the platform hosting it; the user's relative level of awareness of this fact shouldn't be legally relevant as long as OpenAI doesn't make any claims to the contrary.
You also have to imagine that they've got their zero-guardrails, superpowered, internal-only next-generation bot available to them, which said lawyer horde can use to ensure their asses are thoroughly covered. (It'd be staggeringly stupid not to use their own AI for things like this.)
The institutions that have artificially capped the number of doctors, strangled and manipulated healthcare for personal gain, and allowed the insurance and health industries to become cancerous - they should be terrified of what's coming. Tools like this will be able to assist people with a deep, nuanced understanding of their healthcare and be a force multiplier for doctors and nurses, of whom there are far too few.
It'll also be WebMD on steroids, and every third person will likely be convinced they have stereochromatic belly button cancer after each chat, but I think we'll be better off, anyway.
Neither have I, personally, but I've seen reports that this can happen on very hard problems, where the goal just cannot be reached from a local optimum. Getting unstuck by trying something new is something a watchdog agent could prompt it to do.
You are attempting to move the goalposts. There are two different points in this debate:
1) Modern LLMs are an inflection point for coding.
2) The current LLM ecosystem is unsustainable.
This submission's discussion is only about #1, which #2 does not invalidate. Even if the ecosystem crashes, open-source LLMs that leverage the same tricks Opus 4.5 does will just be used instead.
But it's only an inflection point if it's sustainable. When this comes crashing down, how many people are going to be buying $70k GPUs to run an open source model?
I said open-source models, not locally-hosted models. Essentially, more power to inference-only providers such as Groq and Together AI, which host the large-scale OSS LLMs and will be less affected by a crash as long as the demand for coding agents is there.
Ok, and then? Taking a one time discount on a rapidly depreciating asset doesn’t magically make this whole industry profitable, and it’s not like you’re going to start running a GB200 in your basement.
Checked your history. From a fellow skeptic, I know how hard it is to reason with people around here. You and I need to learn to let it go. In the end, the people at the top have set this up so that either way, they win. And we're down here telling the people at our level to stop feeding the monster, but told to fuck off anyways.
So cool bro, you managed to ship a useless (except for your specific use-case) app to your iphone in an hour :O
What I think this is doing is it's pitting people against the fact that most jobs in the modern economy (mine included btw) are devoid of purpose. This is something that, as a person on the far left, I've understood for a long time. However, a lot (and I mean a loooooot) of people have never even considered this. So when they find that an AI agent is able to do THEIR job for them in a fraction of the time, they MUST understand it as the AI being some finality to human ingenuity and progress given the self-importance they've attributed to themselves and their occupation - all this instead of realizing that, you know, all of our jobs are useless, we all do the exact same useless shit which is extremely easy to replicate quickly (except for a select few occupations) and that's it.
I'm sorry to tell anyone who's reading this with a differing opinion, but if AI agents have proven revolutionary to your job, you produced nothing of actual value for the world before their advent, and still don't. I say this, again, as someone who beyond their PhD thesis (and even then) does not produce anything of value to the world, while being paid handsomely for it.
> if AI agents have proven revolutionary to your job, you produced nothing of actual value for the world before their advent, and still don't.
This doesn’t logically follow. AI agents produce loads of value. Cotton picking was and still is useful. The cotton gin didn’t replace useless work. It replaced useful work. Same with agents.
> I'm sorry to tell anyone who's reading this with a differing opinion, but if AI agents have proven revolutionary to your job, you produced nothing of actual value for the world before their advent, and still don't.
I agree with this, but I think my take on it is a lot less nihilistic than yours. I think people vastly undersell how much effort they put into doing something, even if that something is vibecoding a slop app that probably exists. But if people are literally prompting claude with a few sentences and getting revolutionary results, then yes, their job was meaningless and they should find something to do that they’re better at.
But what frustrates me the most about this whole hype wave isn’t just that the powers that be have bet the entire economy on a fake technology, it’s that it’s sucking all of the air out of the room. I think most people’s jobs can actually provide value, and there’s so much work to be done to make _real_ progress. But instead of actually improving the world, all the time, money, and energy is being thrown into such a wasteful technology that is actively making the world a worse place. I’m sure it’s always been like this and I was just too naive to see it, but I much preferred it when at least the tech companies pretended they cared about the impact their products had on society rather than simply trying to extract the most value out of the same 5 ideas.
Yeah, I do tend to have a rather nihilistic view on things, so apologies.
I really think we're just cooked at this point. The amount of people (some great friends whom I respect) that have told me in casual conversation that if their LLM were taken from them tomorrow, they wouldn't know how to do their work (or some flavour of that statement) has made me realize how deep the problem is.
We could go on and on about this, but let's both agree to try and look inward more and attempt to keep our own things in order, while most other people get hooked on the absolute slop machine that is AI. Eventually, the LLM providers will need to start ramping up the costs of their subscriptions, and maybe then people will start to click that the shitty code that was generated for their pointless/useless app is not worth the actual cost of inference (which some conservative estimates put at thousands of dollars per month on a subscription basis). For now, people are just putting their heads in the sand and assuming that physicists will somehow find a way to use quantum computers to speed up inference by a factor of 10^20 in the next few years, while simultaneously slashing its costs (lol).
But hey, Opus 4.5 can cook up a functional app that goes into your emails and retrieves all outstanding orders - revolutionary. Definitely worth the many kWh and thousands of liters of water required, eh?
The studies focus on a single representative task, but in a thread about coding entire apps in hours as opposed to weeks, you can imagine the multiples involved in terms of resource conservation.
The upshot is, generating and deploying a working app that automates a bespoke, boring email workflow will be way, way, wayyyyy more efficient than a human manually doing that workflow every time.
I want to push back on this argument, as it seems suspect given that none of these tools are creating profit, and so require funds / resources that are essentially coming from the combined efforts of much of the economy. I.e. the energy externalities here are monstrous and never factored into these things, even though these models could never have gotten off the ground if not for the massive energy expenditures that were (and continue to be) needed to sustain the funding for these things.
To simplify, LLMs haven't clearly created the value they have promised, but have eaten up massive amounts of capital / value produced by everyone else. But producing that capital had energy costs too. Whether or not all this AI stuff ends up being more energy efficient than people needs to be measured on whether AI actually delivers on its promises and recoups the investments.
EDIT: I.e. it is wildly unclear at this point that if we all pivot to AI that, economy-wide, we will produce value at a lower energy cost, and, even if we grant that this will eventually happen, it is not clear how long that will take. And sure, humans have these costs too, but humans have a sort of guaranteed potential future value, whereas the value of AI is speculative. So comparing energy costs of the two at this frozen moment in time just doesn't quite feel right to me.
These tools may not be turning a profit yet, but as many point out, this is simply due to deeply subsidized free usage to capture market share and discover new use cases.
However, their economic potential is undeniable. Just taking the examples in TFA and this sub-thread, the author was able to create economic value by automating rote aspects of his wife's business and stop paying for existing subscriptions to other apps. TFA doesn't mention what he paid for these tokens, but over the lifetime of his apps I'd bet he captures way more value than the tokens would have cost him.
As for the energy externalities, the ACM article puts some numbers on them. While acknowledging that this is an apples/oranges comparison, it points out that the training cost for GPT-3 (article is from mid-2024) is about 5x the cost of raising a human to adulthood.
Even if you 10x that for GPT-5, that is still only the cost of raising 50 humans to adulthood in exchange for a model that encapsulates a huge chunk of the world's knowledge, which can then be scaled out to an infinite number of tasks, each consuming a tiny fraction of the resources of a human equivalent.
As such, even accounting for training costs, these models are far more efficient than humans for the tasks they do.
I appreciate your responses to my comments, including the addition of reading material. However, I'm going to have to push back on both points.
Firstly, saying that because AI water use is on par with other industries we shouldn't scrutinize it is a bit short-sighted. If the future Altman et al want comes to be, the sheer scale of deployment of AI-focused data centers will lead to nominal water use orders of magnitude larger than other industries'. Of course, on a relative scale they can be seen as 'efficient', but even something efficient, when built out to massive scale, can suck up all of our resources. It's not AI's fault that water is a limited resource on Earth, and AI is not the first industry to use a ton of water; however, eventually, with all other industries plus AI combined (again, imagining the future the AI Kings want), we are definitely going 300 km/h on the road to worldwide water scarcity. We are currently at a time where we need to seriously rethink our relationship with water as a society - not at a time where we can spawn whole new, extremely consumptive industries (even if, in relative terms, they're on par with what we've been doing, which isn't saying much given the state of the climate) whose upsides are still fairly debatable and not at all proven beyond a doubt.
As for the second link, there's a pretty easy rebuttal, which aligns with the other reply to your link. Sure, LLMs are more energy-efficient at generating text than human beings, but do LLMs actually create new ideas? Write new things? Any text written by an LLM is based off someone else's work. There is a cost to creativity - to giving birth to actual ideas - that LLMs will never incur, which makes them seem more efficient. In the end they're more efficient at (once again) tasks which we humans have provided them with plenty of examples of (like writing corporate emails! Or fairly cookie-cutter code!), but at some point the value creation is limited.
I know you disagree with me, it's ok - you are in the majority and you can feel good about that.
I honestly hope the future you foresee where LLMs solve our problems and become important building blocks to our society comes to fruition (rather than the financialized speculation tools they currently are, let's be real). If that happens, I'll be glad I was wrong.
These are important conversations to have because there is so much hyperbole in both directions that a lot of people end up having strong but misguided opinions. I think it's very helpful to consider the impact of LLMs in context (heheh) of the bigger picture rather than in isolation, because suddenly a lot of things fall into perspective.
For instance, all water use by data centers is a fraction of the water used by golf courses! If it really does come down to the wire for conserving water, I think humanity has the option of foregoing a leisure activity for the relatively wealthy in exchange for accelerated productivity for the rest of the world.
And totally, LLMs might not be able to come up with new ideas, but they can super-charge the humans who do have ideas and want to develop them! An idea that would have taken months to be explored and developed can now be done in days. And given that like the majority of ideas fail, we would be failing that much faster too!
In either case, just eyeballing the numbers we currently have, on average the resources a human without AI assistance would consume to conclude an endeavor far outweigh the resources consumed by both that human and an assisting LLM.
I would agree that there will likely be significant problems caused by widespread adoption of AI, but at this point I think they would be social (e.g. significant job displacement, even more wealth inequality) rather than environmental.
> I want to push back on this argument, as it seems suspect given that none of these tools are creating profit, and so require funds / resources that are essentially coming from the combined efforts of much of the economy. I.e. the energy externalities here are monstrous and never factored into these things, even though these models could never have gotten off the ground if not for the massive energy expenditures that were (and continue to be) needed to sustain the funding for these things.
While it is absolutely possible, even plausible, that the economics of these models and providers is the next economic crash in waiting, somewhere between Enron (at worst, if they're knowingly cooking the books) and the Global Financial Crisis (if they're self-delusional rather than actively dishonest), we do have open-weights models that get hosted for money, that people play with locally if they're rich enough for the beefy machines, and that are not so far behind the SOTA as to suggest a difference in kind.
This all strongly suggests that the resource consumption per token by e.g. Claude Code would be reasonably close to the list price if they weren't all doing a Red Queen race[0], running as hard as they can just to remain relevant against each other's progress, in an all-pay auction[1] where only the best can ever hope to cash anything out, and even that may never be enough to cover the spend.
Thing is, automation has basically always done this. It's more of a question of "what tasks can automation actually do well enough to bother with?" rather than "when it can, is it more energy efficient than a human?"
A Raspberry Pi Zero can do basic arithmetic faster than the sum total performance of all 8 billion living humans, even if all the humans had trained hard and reached the level of the current world record holder, for a tenth of the power consumption of just one of those humans' brains, or 2% of their whole body. But that's just arithmetic. Stable Diffusion 1.5 had a similar thing: when it came out, the energy cost to make a picture on my laptop was comparable with the calories consumed while typing in a prompt for it… but who cares, SD 1.5 had all that Cronenberg anatomy. What matters is when the AI is "good enough" for the tasks against which it is set.
To the extent that Claude Code can replace a human, and the speed at which it operates…
Well, my experiments just before Christmas (which are limited, and IMO flawed in a way likely to overstate the current quality of the AI) say the speed of the $20 plan is about 10 sprints per calendar month, while the quality is now at the level of a junior with 1-3 years experience who is just about to stop being a junior. This means the energy cost per unit of work done is comparable with the energy cost needed to have that developer keep a computer and monitor switched on long enough to do the same unit of work. The developer's own body adds another 100-120 watts to that from biology, even if they're a free-range hippie communist who doesn't believe in money, cooked food, lightbulbs, nor having a computer or refrigerator at home, and who commutes by foot from a yurt with neither AC nor heating, ditto the office.
Where the AI isn't good enough to replace a human (playing Pokemon and managing businesses?), it's essentially infinitely more expensive (in kWh or $) to use the AI.
Still, this does leave a similar argument as with aircraft: really efficient per passenger-kilometre, but they enable so many more passenger-kilometres than before as to still sum to a relevant problem.
> For now, people are just putting their heads in the sand and assuming that physicists will somehow find a way to use quantum computers to speed up inference by a factor of 10^20 in the next years, while simultaneously slashing its costs (lol).
GPT-3 Da Vinci cost $20/million tokens for both input and output.
GPT-5.2 is $1.75/million for input and $14/million for output.
I'd call that pretty strong evidence that they've been able to dramatically increase quality while slashing costs, over just the past ~4 years.
Isn't that kind of related to the amount of money thrown at the field? If the economy gets worse for any reason, do you think we can still expect this level of cost-cutting in the future?
> But hey, Opus 4.5 can cook up a functional app that goes into your emails and retrieves all outstanding orders - revolutionary. Definitely worth the many kWh and thousands of liters of water required, eh?
The thing is, in a vacuum this stuff is actually kinda cool. But hundreds of billions in debt-financed capex that will never see a return, and this is the best we’ve got? Absolutely cooked indeed.