Sounds very plausible to me too. Even if you refocus the business unit, it makes no sense to lay off a highly capable team. Finding new people and integrating them into the team - all that costs a lot of time and money, and there is no guarantee of success.
Plenty of people further up the corporate ladder were definitely not happy with the success, while the very top is likely too disconnected to understand.
Cynical or not, I think it was an absolutely brilliant move: "Mass domestic surveillance of Americans constitutes a violation of fundamental rights". I think they placed their bets on Sama signing a contract with the DoD and here we are, one day later the news that OpenAI signed a contract is out. An absolute PR disaster for OpenAI. And an absolute PR victory for Anthropic.
I think OpenAI's IPO will be interesting. Not even the conservative media will be happy about mass surveillance of Americans.
For non-Americans, not much changes; they don't care about your rights any more than about a pile of dog poo.
Exactly. At this level you don't just put out a statement of your personal opinion. This is run through PR and coordinated with the investors. Otherwise the CEO finds himself on the street by tomorrow. Whatever their motives are, the statement is aligned with the VCs, because if it weren't, there would be another CEO the next day. As the parent stated, this is not cynicism. I see it as simply factual: these are the laws of money.
This. This is public misdirection. They already signed a new deal. It may not be to their liking, but nothing in the statement prevents them from moving forward.
Isn't that what we're all doing in this thread? We could certainly take the document at face value but as a parent commenter said, almost every company starts off with "don't be evil" then goes and does evil things.
Is Anthropic different? Maybe. But personally I don't see any reason to give them the benefit of the doubt.
There are outcomes where the US government seizes the company. Not super likely, not impossible.
It would be naive to write a statement that a future event will never happen, under any circumstances. People who make that mistake get lambasted for hypocrisy when unforeseen circumstances arise.
I see recognition that making absolute statements about the future is best left to zealots and prophets. Which to me speaks of maturity, not duplicity.
It is indeed a naive, or more likely a dishonest thing to do.
Anyone can promise anything. When there's little to no accountability and public memory/opinion doesn't last a week (or is easily manipulated anyway), then promises mean literally nothing. Very like how, in politics, temporary means permanent.
Or HackerNews itself, with them implementing a little Big Brother. It will, of course, absolutely and without a doubt only "nudge" people, and it will absolutely, under no circumstances, pinky promise, never get any worse or do anything else but that.
When there are millions of fools, those who actually recognize that they are being fooled are rarely significant in number. They're drowned out by the fools, until said fools "wake up" and cry "if only we had known!".
Well ... you could have known, but in your mindlessness you didn't listen and think.
"It must be true, because they say so. D'uh. What are you, dumb?"
I'm not saying whether or not they're planning to back down, but this sentence doesn't imply that. The "now" is clearly meant as a reference to the fact that they haven't in the past.
Being a tech forum centered around VC funding means we have a TON of tech bros (derogatory) here, who believe in nothing beyond getting their own piles of money for doing literally anything they can be paid to do. If you offered these guys $20 to murder a grandmother they'd ask if they have to cover the cost of the murder weapon or if that's provided.
I get it to a degree - people gotta eat, and especially right now the market is awful; not to mention, most hyperscaler businesses have been psychologically obliterating people for a decade or more at this point. Why not graduate to doing it with weapons of war too? But personally, I sleep better at night knowing nothing I've made is helping guide missiles into school buses, but that's just me.
In general - I don't know if it's a coincidence, but here on HN, for example, I've noticed an increasing number of comments and posts emphasizing the narrative of how "well-intended" Anthropic is.
I'd love to see the financial model that offsets losing your single biggest customer and substantial chunk of your annual revenue with some vague notion of public trust.
This is so short-sighted. We are so early into this AI revolution, and this administration is obviously in a tailspin, with the only folk left in charge being the least capable ones we have seen in a decade.
Imagine what the conversation would be like if Mattis, a highly decorated and respected leader, were still the SecDef. Instead we are seeing bully tactics from a failed cable news pundit who has neither earned nor deserved any respect from the military he represents.
We are two elections and a major health issue away from a complete change of course.
But short-sightedness is the name of the quarterly reporting game, so who knows.
Not trying to be a Luddite. I had multiple questions for AI tools yesterday, and let Claude/Zed do some boilerplate code/pattern rewriting.
I've worked in software for 35 years. I've seen many new "disruptive" movements come and go (open source, objects, functional, services, containers, aspects, blockchains, etc). I chose to participate in some and not in others. And whether I made the wrong choices or not, I always felt like I could get a clear enough picture of where the bandwagon was going that I could jump in, or hold back, or something in between. My choices weren't always the same as others', so it's not like it was obvious to everyone. But the signal felt more deterministic.
With LLM/agents, I find I feel the most unease and uncertainty with how much to lean in, and in what ways to lean in, than I ever have before. A sort of enthusiasm paralysis that is new.
> I'm seriously worried there won't be more elections. Not hyperbole at all.
Why? That's an unrealistic fear, driven by the insanely overwrought political rhetoric of 2026. Think about it: elections will be the absolute last thing to go.
If you want something to worry about, worry about this:
> And the stakes of politics are almost always incredibly high. I think they happen to be higher now. And I do think a lot of what is happening in terms of the structure of the system itself is dangerous. I think that the hour is late in many ways. My view is that a lot of people who embrace alarm don’t embrace what I think obviously follows from that alarm, which is the willingness to make strategic and political decisions you find personally discomfiting, even though they are obviously more likely to help you win.
> Taking political positions that’ll make it more likely to win Senate seats in Kansas and Ohio and Missouri. Trying to open your coalition to people you didn’t want it open to before. Running pro-life Democrats.
> And one of my biggest frustrations with many people whose politics I otherwise share is the unwillingness to match the seriousness of your politics to the seriousness of your alarm. I see a Democratic Party that often just wants to do nothing differently, even though it is failing — failing in the most obvious and consequential ways it can possibly fail. (https://www.nytimes.com/2025/09/18/opinion/interesting-times...)
It's not an unrealistic fear. Trump has been making noises about "taking over elections." Abolishing elections wholesale is very unlikely, sure, but a sham election rigged by a corrupt government? That's standard fare for authoritarians. And there's evidence of voting anomalies in swing states in the 2024 election.
FYI, even though you have a new account, you were banned from your first comment and all your comments automatically show up as hidden-by-default to most users.
I don't think it's crazy to worry about that, but elections are run by the states, there are over 100,000 polling places nationally, and people are pissed. On Jan 3, the terms of the entire current House of Representatives end; Democratic governors will still hold elections, and if there haven't been elections in GOP-led states, they're out of representation. There are so many hurdles in the way of the fascists canceling or heavily interfering in elections, and they're all just so stupid.
If you think they're pissed now, just wait to see how they react to election interference.
I recently read up on how the House of Representatives renews itself and quite frankly it's one of the most beautiful processes I've seen, completely removing the influence of the prior congress.
Their whole strategy is that the lack of a legal moat protecting their product is an existential threat to human life. They are the only moral AI and their competitors must be sanctioned and outlawed. At which point they can transition from AI as commodity to “value” based pricing.
It’s not going to work, but I can’t blame Amodei and friends for trying to make themselves trillionaires.
I'd love to see any evidence that this single biggest customer is provably and irreversibly lost on all levels of scrutiny as a result of this attempt at building public trust.
You're implying cancelling quietly would be better. But the department would just use a different supplier. This seems like the action someone would take if they cared about the issue.
> If you do not like working with the military, ...
Eh? But they do like to work with the military. How else are you going to "defend the United States and other democracies, and to defeat our autocratic adversaries"?
They want to work with the military, with just two additional guardrails.
> “Laws are a threat made by the dominant socioeconomic ethnic group in a given nation. It’s just the promise of violence that’s enacted, and the police are basically an occupying army, you know what I mean?”
...Which is funny, but technically speaking, it's (more or less) a paraphrasing/extrapolation of the very serious political science definition of a state, “a monopoly over the legitimate use of violence in a defined territory”
[1] Minus the last line, which I will allow others to discover for themselves
That's maybe the second law. The first one is: money is always finite.
Look at how Elon Musk behaved. Do you think the VCs gladly approved what he did with Twitter? They might want to keep chasing quarterly results - but sometimes, as with Zuckerberg, they can't. Not enough money. Similar examples: Google's funding rounds, or how the much more financially backed politician rather often loses to a competitor. Or, if you will, Vladimir Putin's idea that he can buy whatever results he wants - and that guy is a very wealthy person. There are always limits, pushing the money law into second place. We might argue that often the existing money is enough... but in more geopolitical, continuum-curving cases there are other powerful forces.
The Twitter acquisition wasn't funded by venture capital, so your question about VC approval doesn't apply.
If you're using VC as a general term for "investor" (inaccurately), then the answer to your question is that the major investors, such as Larry Ellison and the Saudi monarchy, wanted political control of Twitter, which meant that they did (apparently) approve what Musk did with it.
You're missing the point. It matters little where exactly the money to pay for the acquisition of Twitter came from. What matters is that nobody expected Twitter to lose employees and users in such numbers. So whoever gave the money was still limited in ensuring the results were "fully enough" in line with their wishes. Because money is always finite.
FWIW, I don't actually know whether the board of Anthropic has the actual power to replace its CEO, or whether Dario has retained some form of personal super-control shares, Zuckerberg-style.
At some level of growth, the dynamics between competent founders and shareholders flip. Even if the board could afford to replace a CEO, it might not be worth it.
I'd counter that at this level of capital, if the CEO doesn't align well with the capital, then super-control shares will be overpowered by super-lawyers and, if need be, some super-donations. OpenAI was a public-interest company...
Not at all. Especially at that level of capital. It's the equity equivalent of "if you owe a bank a million dollars, you're in trouble. If you owe a bank a billion dollars, the bank is in trouble".
Capital is extremely fungible. Typically extremely overleveraged. Lawyers are on the other hand extremely overprotective. They won’t generally risk the destruction of capital, even in slam-dunk cases. Vide WeWork.
Anthropic has an odd voting structure. While the CEO Dario Amodei holds no super-voting shares, there are special shares controlled by a separate council of trustees who aren't answerable to investors and who have the power to replace the Board. So in practice it comes down to personal relationships.
Surely you mean the laws of shareholder capitalism. There are many things you can do with money, and only some of them are legally backed by rules that ensure absolute shareholder power.
Whenever you see a YouTube video from a restaurant kitchen, you can almost be sure to spot some pans where the Teflon has been scrubbed down to the bare metal. Probably not that healthy...
PTFE is extremely chemically inert. There's possibly some risk from ultra-fine particulates that could be absorbed by the body, but compared to all the other sources of particulate exposure I don't think it's a major problem. I'm more concerned by thermal decomposition, which forms all kinds of mystery chemicals of unknown risk profile. Restaurants love to sear food and I wouldn't be surprised if some of them do it on PTFE cookware.
It's highly unlikely any restaurant uses Teflon-coated pans. Most use carbon steel or stainless. Teflon just won't hold up to daily use at a restaurant.
320 tok/s PP and 42 tok/s TG with a 4-bit quant and MLX. Llama.cpp was about half that for this model, but afaik it improved a few days ago; I haven't tested yet though.
I have tried many tools locally and was never really happy with any of them. I finally tried Qwen Code CLI, assuming it would run well with a Qwen model, and it does. YMMV; I mostly do JavaScript and Python. The most important setting was the max context size: it then auto-compacts before reaching it. I run with 65536 but may raise this a bit.
Last but not least, OpenCode is VC-funded; at some point they will have to make money, while Gemini CLI / Qwen CLI are not the primary products of their companies but are definitely dog-fooded.
Works for me, but sometimes there's an issue with Qwen's tool template: past chat turns get changed, so the KV cache is invalidated and it has to reprocess input tokens from scratch. Doesn't happen all the time though.
Btw I also get 42-60 tps on an M4 Max with the MLX 4-bit quants hosted by LM Studio. Which software do you use to run it?
I use MLX server directly from the MLX community project (by Apple). 42 tps is with 0-5000 tokens of context. It starts to drop from there; I have never seen 60.
Yesterday I tested the latest llama.cpp: PP has made a huge jump to 420 tps, which is 30% faster than MLX on my M1. TG is now 25 tps, which is below MLX but degrades little; at 50k context it is still 22-23 tps.
Together with Qwen Code CLI, llama.cpp re-processes the full KV cache far less often. So for now I am switching back to llama.cpp.
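If anyone wants to replicate the setup: a minimal sketch of hitting llama.cpp's built-in server from Python with prompt caching on (assumes a llama-server instance is already running locally; cache_prompt asks the server to keep the KV cache for the unchanged prefix, which is what avoids the re-processing):

    import requests

    # Assumes a llama-server instance on localhost:8080 (model already loaded).
    # "cache_prompt" keeps the KV cache for the shared prompt prefix, so only
    # newly appended tokens get processed on the next call.
    resp = requests.post(
        "http://localhost:8080/completion",
        json={
            "prompt": "You are a coding assistant.\n\nUser: refactor this loop ...",
            "n_predict": 256,
            "cache_prompt": True,
        },
    )
    print(resp.json()["content"])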
It is worth spending some time on the settings. I am really annoyed by the silly jokes (was it Claude that started this?). You can disable them with customWittyPhrases. Also, setting contextWindowSize will make the CLI auto-compress, which works really well for me.
And depending on what you do, maybe set privacy.usageStatisticsEnabled to false.
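For reference, the relevant bits of a Qwen CLI settings.json might look roughly like this - the option names are the ones mentioned above, but treat the exact nesting as an assumption and check the docs for your version (my reading is that customWittyPhrases replaces the built-in jokes with your own list):

    {
      "contextWindowSize": 65536,
      "customWittyPhrases": ["Working..."],
      "privacy": {
        "usageStatisticsEnabled": false
      }
    }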
Like Gemini, Qwen CLI supports OpenTelemetry. When I have time I'll have a look at why the KV cache gets invalidated.
Great, thanks! I am so annoyed by one specific phrase, "launching wit.exe" - not funny when it could actually be referring to real software running on your machine.
This is not a general purpose chip but specialized for high speed, low latency inference with small context. But it is potentially a lot cheaper than Nvidia for those purposes.
Tech summary:
- 15k tok/sec on 8B dense 3bit quant (llama 3.1)
- limited KV cache
- 880mm^2 die, TSMC 6nm, 53B transistors
- presumably 200W per chip
- 20x cheaper to produce
- 10x less energy per token for inference
- max context size: flexible
- mid-sized thinking model upcoming this spring on same hardware
- next hardware supposed to be FP4
- a frontier LLM planned within twelve months
This is all from their website; I am not affiliated. The founders have 25 years of career across AMD, Nvidia and others, and $200M of VC funding so far.
Certainly interesting for very low latency applications which need < 10k tokens context. If they deliver in spring, they will likely be flooded with VC money.
Not exactly a competitor for Nvidia but probably for 5-10% of the market.
Back of napkin: 1mm^2 of 6nm wafer costs ~$0.20, so 1B parameters need about $20 of die. The larger the die size, the lower the yield. Supposedly the inference speed remains almost the same with larger models.
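Spelling that napkin math out (all numbers are the rough ones from this thread, nothing more):

    # ~$0.20 per mm^2 of 6nm wafer, ~880 mm^2 die, ~8B hard-wired parameters
    cost_per_mm2 = 0.20
    die_area_mm2 = 880
    params_b = 8

    die_cost = cost_per_mm2 * die_area_mm2   # ~$176 per die (ignoring yield)
    per_billion = die_cost / params_b        # ~$22 per 1B parameters
    print(f"die ~${die_cost:.0f}, ~${per_billion:.0f} per 1B params")

Yield losses on an 880mm^2 die will push the real number up, but the order of magnitude holds.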
This math is useful. Lots of folks scoffing in the comments below. I have a couple of reactions, after chatting with it:
1) 16k tokens / second is really stunningly fast. There’s an old saying about any factor of 10 being a new science / new product category, etc. This is a new product category in my mind, or it could be. It would be incredibly useful for voice agent applications, realtime loops, realtime video generation, .. etc.
2) https://nvidia.github.io/TensorRT-LLM/blogs/H200launch.html has the H200 doing 12k tokens/second on Llama 2 13B fp8. Knowing these architectures, that's likely a 100+-ish batched run, meaning time to first token is almost certainly slower than Taalas. Probably much slower, since Taalas is like milliseconds.
3) Jensen has these Pareto curve graphs — for a certain amount of energy and a certain chip architecture, choose your point on the curve to trade off throughput vs latency. My quick math is that these probably do not shift the curve. The 6nm process vs the 4nm process is likely 30-40% bigger and draws that much more power, etc.; if we look at the numbers they give, extrapolate to an fp8 model (slower) and a smaller geometry (30% faster and lower power), and compare 16k tokens/second for Taalas to 12k tokens/s for an H200, these chips are on the same ballpark curve.
However, I don’t think the H200 can reach into this part of the curve, and that does make these somewhat interesting. In fact even if you had a full datacenter of H200s already running your model, you’d probably buy a bunch of these to do speculative decoding - it’s an amazing use case for them; speculative decoding relies on smaller distillations or quants to get the first N tokens sorted, only when the big model and small model diverge do you infer on the big model.
Upshot - I think these will sell, even on 6nm process, and the first thing I’d sell them to do is speculative decoding for bread and butter frontier models. The thing that I’m really very skeptical of is the 2 month turnaround. To get leading edge geometry turned around on arbitrary 2 month schedules is .. ambitious. Hopeful. We could use other words as well.
I hope these guys make it! I bet the v3 of these chips will be serving some bread and butter API requests, which will be awesome.
> any factor of 10 being a new science / new product category,
I often remind people that two orders of magnitude of quantitative change is a qualitative change.
> The thing that I’m really very skeptical of is the 2 month turnaround. To get leading edge geometry turned around on arbitrary 2 month schedules is .. ambitious. Hopeful. We could use other words as well.
The real product they have is automation. They figured out a way to compile a large model into a circuit. That's, in itself, pretty impressive. If they can do this, they can also compile models to an HDL and deploy them to large FPGA simulators for quick validation. If we see models maturing at a "good enough" state, even a longer turnaround between model release and silicon makes sense.
While I also see lots of these systems running standalone, I think they'll really shine combined with more flexible inference engines, running the unchanging parts of the model while the coupled inference engine deals with whatever is too new to have been baked into silicon.
I'm concerned with the environmental impact. Chip manufacture is not very clean and these chips will need to be swapped out and replaced at a cadence higher than we currently do with GPUs.
Having dabbled in VLSI in the early-2010s, half the battle is getting a manufacturing slot with TSMC. It’s a dark art with secret handshakes. This demonstrator chip is an enormous accomplishment.
Yeah and a team I’m not familiar with — I didn’t check bios but they don’t lead with ‘our team made this or that gpu for this or that bigco’.
The design ip at 6nm is still tough; I feel like this team must have at least one real genius and some incredibly good support at tsmc. Or they’ve been waiting a year for a slot :)
"Ljubisa Bajic desiged video encoders for Teralogic and Oak Technology before moving over to AMD and rising through the engineering ranks to be the architect and senior manager of the company’s hybrid CPU-GPU chip designs for PCs and servers. Bajic did a one-year stint at Nvidia as s senior architect, bounced back to AMD as a director of integrated circuit design for two years, and then started Tenstorrent."
His wife (COO) worked at Altera, ATI, AMD and Tenstorrent.
"Drago Ignjatovic, who was a senior design engineer working on AMD APUs and GPUs and took over for Ljubisa Bajic as director of ASIC design when the latter left to start Tenstorrent. Nine months later, Ignjatovic joined Tenstorrent as its vice president of hardware engineering, and he started Taalas with the Bajices as the startup’s chief technology officer."
I think there will be a lot of space for sensorial models in robotics, as the laws of physics don't change much, and a light switch or automobile controls have remained stable and consistent over the last decades.
I think the next major innovation is going to be intelligent model routing. I've been exploring OpenClaw and OpenRouter, and there is a real lack of options to select the best model for the job and execute. The providers are trying to do that with their own models, but none of them offer everything to everyone at all times. I see a future with increasingly niche models being offered for all kinds of novel use cases. We need a way to fluidly apply the right model for the job.
At 16k tokens/s why bother routing? We're talking about multiple orders of magnitude faster and cheaper execution.
Abundance supports different strategies. One approach: Set a deadline for a response, send the turn to every AI that could possibly answer, and when the deadline arrives, cancel any request that hasn't yet completed. You know a priori which models have the highest quality in aggregate. Pick that one.
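A minimal sketch of that deadline strategy (query_model is a hypothetical stand-in, not any particular provider's API):

    import asyncio, random

    # Hypothetical stand-in for a real provider call; swap in actual API clients.
    async def query_model(name: str, prompt: str) -> tuple[str, str]:
        await asyncio.sleep(random.uniform(0.1, 2.0))   # simulated latency
        return name, f"{name}'s answer to {prompt!r}"

    async def race(prompt, models, deadline_s, quality_rank):
        tasks = [asyncio.create_task(query_model(m, prompt)) for m in models]
        done, pending = await asyncio.wait(tasks, timeout=deadline_s)
        for t in pending:                  # cancel whatever missed the deadline
            t.cancel()
        answers = dict(t.result() for t in done)
        for m in quality_rank:             # best a-priori model that finished
            if m in answers:
                return answers[m]

    print(asyncio.run(race("2+2?", ["small-fast", "big-slow"], 1.0,
                           ["big-slow", "small-fast"])))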
The best coding model won’t be the best roleplay one which won’t be the best at tool use. It depends what you want to do in order to pick the best model.
I'll go ahead and say they're wrong (source: building and maintaining an LLM client with llama.cpp integrated & 40+ third-party models via HTTP).
I desperately want there to be differentiation. Reality has shown over and over again that it doesn't matter. Even if you run the same query across X models and then do some form of consensus, the improvements on benchmarks are marginal and the UX is worse (more time, more expense, and a final answer that is muddied and bounded by the quality of the best model).
There is the pre-training, where you passively read stuff from the web.
From there you go to RL training, where humans are grading model responses, or the AI is writing code to try to pass tests and learning how to get the tests to pass, etc. The RL phase is pretty important because it's not passive, and it can focus on the weaker areas of the model too, so you can actually train on a larger dataset than the sum of recorded human knowledge.
> speculative decoding for bread and butter frontier models. The thing that I’m really very skeptical of is the 2 month turnaround. To get leading edge geometry turned around on arbitrary 2 month schedules is .. ambitious
Can we use older (previous generation, smaller) models as a speculative decoder for the current model? I don't know whether the randomness in training (weight init, data ordering, etc) will affect this kind of use. To the extent that these models are learning the "true underlying token distribution" this should be possible, in principle. If that's the case, speculative decoding is an elegant vector to introduce this kind of tech, and the turnaround time is even less of a problem.
For speculative decoding, wouldn’t this be of limited use for frontier models that don’t have the same tokenizer as Llama 3.1? Or would it be so good that retokenization/bridging would be worth it?
My understanding as well is that speculative decoding only works with a smaller quant of the same model. You're using the faster sampling of the smaller models representation of the larger models weights in order to attempt to accurately predict its token output. This wouldn't work cross-model as the token probabilities are completely different.
Families of model sizes work great for speculative decoding. Use the 1B with the 32B or whatever.
It's a balance: you want it to guess correctly as much as possible, but also be as fast as possible. Validation takes time, and every guess needs to be validated, etc.
The model you're using to speculate could be anything, but if it's not guessing what the main model would predict, it's useless.
> The model you're using to speculate could be anything, but if it's not guessing what the main model would predict, it's useless.
So what I said is correct then lol. If you're saying I can use a model that isn't just a smaller quant of the larger model I'm trying to speculatively decode, except that model would never get an accurate prediction, then how is that in any way useful or desirable?
Smaller quant of the same model. A smaller quant of a different family of model would be practically useless and there wouldn't be any point in even setting it up.
> The thing that I’m really very skeptical of is the 2 month turnaround. To get leading edge geometry turned around on arbitrary 2 month schedules is .. ambitious. Hopeful. We could use other words as well.
They may be using Rapidus, which is a Japanese government backed foundry built around all single wafer processing vs traditional batching. They advertise ~2 month turnaround time as standard, and as short as 2 weeks for priority.
Think about this for solving questions in math where you need to explore a search space. You can run 100 of these for the same cost and time as one API call to OpenAI.
The guts of a LLM isn't something I'm well versed in, but
> to get the first N tokens sorted, only when the big model and small model diverge do you infer on the big model
suggests there is something I'm unaware of. If you compare the small and big model, don't you have to wait for the big model anyway and then what's the point? I assume I'm missing some detail here, but what?
Speculative decoding takes advantage of the fact that it's faster to validate that a big model would have produced a particular sequence of tokens than to generate that sequence of tokens from scratch, because validation can take more advantage of parallel processing. So the process is generate with small model -> validate with big model -> then generate with big model only if validation fails
When you predict with the small model, the big model can verify more as a batch, closer in speed to processing input tokens - provided the predictions are good and the work doesn't have to be redone.
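To make that concrete, a toy greedy-decoding sketch (draft_model and target_model are hypothetical next-token callables; a real engine checks all k drafted positions in one batched forward pass instead of a Python loop):

    def speculative_decode(prompt_tokens, draft_model, target_model, k=4, n_new=64):
        tokens = list(prompt_tokens)
        target_len = len(prompt_tokens) + n_new
        while len(tokens) < target_len:
            # 1. Draft: the small model cheaply proposes k tokens, one at a time.
            draft = []
            for _ in range(k):
                draft.append(draft_model(tokens + draft))
            # 2. Verify: compare each drafted token with what the big model
            #    would have produced at that position.
            for i, guess in enumerate(draft):
                expected = target_model(tokens + draft[:i])
                if guess != expected:
                    tokens.append(expected)  # divergence: take the big model's token
                    break                    # and discard the rest of the draft
                tokens.append(guess)         # match: accepted almost for free
        return tokens[:target_len]

    # Tiny demo with hypothetical toy "models" that only sometimes agree:
    big = lambda toks: len(toks) % 7
    small = lambda toks: len(toks) % 7 if len(toks) % 3 else 0
    print(speculative_decode([1, 2, 3], small, big, k=4, n_new=8))

In the good case all k draft tokens match, and the big model effectively emits k tokens for the price of one verification pass.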
Most importantly this opens up an amazing future where we get the real version of the classic science fiction MacGuffin of a physical AI chip. Pair this with several TB of flash storage and you have persistent artificial consciousness that can be carried around with you. Bonus points if it's quirky, custom-trained and the chip is one of a kind that you stole from an evil corporation. Additional bonus points if the packaging is such that it's small enough to plug into the USB-C port on your smart glasses and has an eBPF module it can leverage to see what you're doing and talk to you in real time about your actions.
I enjoy envisioning futures more whimsical than "the bargain-basement LLM provider that my insurance company uses denied my claim because I chose badly-vectored words".
> Certainly interesting for very low latency applications which need < 10k tokens context.
I'm really curious whether context will really matter if using methods like Recursive Language Models[0]. That method is suited to breaking a huge amount of context down recursively into smaller subagents, each working on a symbolic subset of the prompt.
The challenge with RLMs seemed to be that they burn through a ton of tokens in exchange for more accuracy. If tokens are cheap, RLMs could be beneficial here, providing much more accuracy over large contexts regardless of what the underlying model can handle.
First, it is likely one chip for llama 8B q3 with 1k context size. This could fit into around 3GB of SRAM, which is about the theoretical maximum at the TSMC N6 reticle limit.
Second, their plan is to etch larger models across multiple connected chips. It’s physically impossible to run bigger models otherwise since 3GB SRAM is about the max you can have on an 850mm2 chip.
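That SRAM number is easy to sanity-check (rough, ignoring KV cache and any overhead):

    params = 8e9            # llama 8B
    bits_per_param = 3      # q3
    gb = params * bits_per_param / 8 / 1e9
    print(gb)               # 3.0 GB -- right at the ~3GB reticle-limit estimate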
> followed by a frontier-class large language model running inference across a collection of HC cards by year-end under its HC2 architecture
Aren't they only using the SRAM for the KV cache? They mention that the hardwired weights have a very high density. They say about the ROM part:
> We have got this scheme for the mask ROM recall fabric – the hard-wired part – where we can store four bits away and do the multiply related to it – everything – with a single transistor. So the density is basically insane.
I'm not a hardware guy but they seem to be making a strong distinction between the techniques they're using for the weights vs KV cache
> In the current generation, our density is 8 billion parameters on the hard wired part of the chip, plus the SRAM to allow us to do KV caches, adaptations like fine tuning, and etc.
It’s just dumb to think that one chip per model is their plan. They stated that their plan is to chain multiple chips together.
I was indeed wrong about 10 chips. I thought they would use llama 8B 16bit and a few thousand context size. It turns out, they used llama 8B 3bit with around 1k context size. That made me assume they must have chained multiple chips together since the max SRAM on TSMC n6 for reticle sized chip is only around 3GB.
There is nothing smart about current LLMs. They just regurgitate text compressed in their memory based on probability.
None of the LLMs currently have actual understanding of what you ask them to do and what they respond with.
If LLMs just regurgitate compressed text, they'd fail on any novel problem not in their training data. Yet, they routinely solve them, which means whatever's happening between input and output is more than retrieval, and calling it "not understanding" requires you to define understanding in a way that conveniently excludes everything except biological brains.
I somewhat agree with you but I also realise that there are very few "novel" problems in the world. I think it's really just more complex problem spaces is all.
Same relative logic, just more of it/more steps or trials.
Yes, there are some fascinating emergent properties at play, but when they fail it's blatantly obvious that there's no actual intelligence or understanding. They are very cool and very useful tools - I use them on a daily basis now, and the way I can just paste a vague screenshot with some vague text and they get it and give a useful response blows my mind every time. But it's very clear that it's all just smoke and mirrors; they're not intelligent and you can't trust them with anything.
You'd think, with how often Opus builds two separate code paths without feature parity when you try to vibe-code something complex, that people wouldn't regard this whole thing so highly.
None of my codebases are in their training data, yet they routinely contribute to them in meaningful ways. They write code that I'm happy with that improves the codebases I work in.
Depends how precisely you define novel - I don't think LLMs are yet capable of posing and solving interesting problems, but they have been used to address known problems, and in doing so have contributed novel work. Examples include Erdos Problem #728[0] (Terence Tao said it was solved "more or less autonomously" by an LLM), IMO problems (Deepmind, OpenAI and Huang 2025), GPT-5.2 Pro contributing a conjecture in particle physics[1], systems like AlphaEvolve leveraging LLMs + evolutionary algorithms to generate new, faster algorithms for certain problems[2].
We know that, but that does not make them useless. The opposite, in fact: they are extremely useful in the hands of non-idiots. We just happen to have an oversupply of idiots at the moment, which AI is here to eradicate. /Sort of satire.
So you are saying they are like a copy command - LLMs will copy some training data back to you? Why do we spend so much money training and running them if they "just regurgitate text compressed in their memory based on probability"? Billions of dollars to build a lossy grep.
I think you are confused about LLMs - they take in context, and that context makes them generate new things; for existing things we have cp. By your logic, pianos can't be creative instruments because they just produce the same 88 notes.
I have a gut feeling that a huge portion of the deficiencies we note in AI is just a reflection of the training data. For instance, the wiki/reddit/etc internet is just a soup of human descriptions of the world model, not the actual world model itself. There are gaps and holes in the knowledge, because a codified summary of the world captures what is remarkable to us humans, not a 100% faithful, comprehensive description of the world. What is obvious to humans with lived real-world experience often does not make it into the training data. A simple, demonstrable example is whether one should walk or drive to the car wash.
Phrases like "actual understanding", "true intelligence" etc. are not conducive to productive discussion unless you take the trouble to define what you mean by them (which ~nobody ever does). They're highly ambiguous and it's never clear what specific claims they do or don't imply when used by any given person.
But I think this specific claim is clearly wrong, if taken at face value:
> They just regurgitate text compressed in their memory
They're clearly capable of producing novel utterances, so they can't just be doing that. (Unless we're dealing with a very loose definition of "regurgitate", in which case it's probably best to use a different word if we want to understand each other.)
The fact that the outputs are probabilities is not important. What is important is how that output is computed.
You could imagine that it is possible to learn certain algorithms/ heuristics that "intelligence" is comprised of. No matter what you output. Training for optimal compression of tasks /taking actions -> could lead to intelligence being the best solution.
This is far from a formal argument, but so is the stubborn reiteration of "it's just probabilities" or "it's just compression". Because this "just" thing is getting more and more capable of solving tasks that are surely not in the training data exactly like this.
The simplification is where it loses granularity. I could describe every human's life as they were born and then they died. That's 100% accurate, but there's just a little something lost by simplifying that much.
That's a lot of surface, isn't it? As big as an M1 Ultra (2x M1 Max at 432mm² on TSMC N5P), a bit bigger than an A100 (820mm² on TSMC N7) or H100 (814mm² on TSMC N5).
> The larger the die size, the lower the yield.
I wonder if that applies? What's the big deal if a few parameters have a few bit flips?
Also see Adrian Thompson's Xilinx 6200 FPGA, programmed by a genetic algorithm that worked but exploited nuances unique to that specific physical chip, meaning the software couldn't be copied to another chip. https://news.ycombinator.com/item?id=43152877
An on-device reasoning model with that kind of speed and cost would completely change the way people use their computers. It would be closer to Star Trek than anything else we've ever had. You'd never have to type anything or use a mouse again.
There's a bit of a hidden cost here… GPU hardware is going to have greater longevity; its life gets extended every time there's an algorithmic improvement. Whereas any efficiency gains in software that are not compatible with this hardware will tend to accelerate its depreciation.
KV caches are large, but hidden states aren't necessarily that large. And if you can run a model once ridiculously fast, then you can loop it repeatedly and still be fast. So I wonder about the 'modern RNNs' like RWKV here...
It's weird to me to train such huge models and then degrade them by quantizing the presumably 16-bit (bfloat16) weights down to 3 bits. Why not just train smaller models then?
For a real-world use case, you would need an FPGA with terabytes of RAM. Perhaps it'll be off-chip HBM. But for large models, even that won't be enough. Then you would need to figure out an NVLink-like interconnect for these FPGAs. And we are back to square one.
This is new. You are citing FPGA prototypes. Those papers do not demonstrate the same class of scaling or hardware integration that Taalas is advocating. For one, the FPGA solutions typically use fixed multipliers (or lookup tables), while the ASIC solution has more freedom to optimize routing for 4-bit multiplication.
I understand what Taalas is claiming. I was trying to describe that putting a model in hardware is not something new or unthought of; the natural progression of an FPGA is an ASIC. The Taalas process is more expensive and not really worth it, because once you burn a model onto the silicon, the silicon can only serve that model. The speed improvement alone is not enough for the cost you will incur in the long run. GPUs are still general-purpose; FPGAs are at least reusable but won't have the same speed. This alone cannot be a long-term business. Turning a model into hardware in two months is too long - models already take quite a long time to train. Anyone going down this path would leave the field wide open to their competitors. Deployment planning for existing models is already complicated enough.
Do not overlook traditional irrational investor exuberance; we've got an abundance of that right now. With the right PR maneuvers these guys could be a tulip craze.
Low-latency inference is a huge waste of power; if you're going to the trouble of making an ASIC, it should be for dog-slow but very high throughput inference. Undervolt the devices as much as possible and use sub-threshold modes, multiple Vt and body biasing extensively to save further power and minimize leakage losses, but also keep working in fine-grained nodes to reduce areas and distances. The sensible goal is to expend the least possible energy per operation, even at increased latency.
Low latency inference is very useful in voice-to-voice applications. You say it is a waste of power but at least their claim is that it is 10x more efficient. We'll see but if it works out it will definitely find its applications.
I haven't found any end-to-end voice chat models useful; I had much better results with a separate STT-LLM-TTS pipeline. One big problem is turn detection, and having inference with 150-200ms latency would allow a whole new level of quality. I would just use it with a prompt - "You think the user is finished talking?" - and then push the turn to a larger model. The AI should reply within the ballpark of 600-1000ms: faster is often irritating, slower makes the user start talking again.
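A sketch of where such a fast model would slot in; every function here is a hypothetical stand-in for your STT/LLM/TTS of choice:

    # Hypothetical stand-ins; wire up real STT / LLM / TTS instead.
    def stt(chunk): return chunk                      # pretend chunks are text
    def fast_model(prompt): return "yes"              # the low-latency turn detector
    def large_model(text): return f"Reply to: {text}"
    def tts(text): print(text)

    def on_audio_chunk(chunk, transcript=""):
        """One step of an STT -> turn detection -> LLM -> TTS loop."""
        transcript += stt(chunk)
        # The fast model answers exactly one question, within a ~200ms budget:
        done = fast_model(f"Transcript: {transcript!r}. "
                          "Do you think the user is finished talking? yes/no")
        if done.strip().lower().startswith("yes"):
            tts(large_model(transcript))              # aim for 600-1000ms end to end
        return transcript

    on_audio_chunk("should I drive or walk to the car wash?")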
I think it's really useful for agent to agent communication, as long as context loading doesn't become a bottleneck. Right now there can be noticeable delays under the hood, but at these speeds we'll never have to worry about latency when chain calling hundreds or thousands of agents in a network (I'm presuming this is going to take off in the future). Correct me if I'm wrong though.
Interesting hardware but I wonder if it is capable of KV caching. Thus (only) useful for applications that have short context but would benefit from very low latency. Voice-to-voice applications may be a good example.
Interesting stuff, but it hurts so much that the writer has the common misconception of Pavlov's dog doing a circus trick. Sure, the dog also consciously understands the connection between bell and food. But the physiological reaction of the saliva flowing is not a conscious decision by the dog. Circus tricks with animals existed long before Pavlov. The key discovery is that there is a physiological reaction which can no longer be consciously suppressed. That's why PTSD is such a bitch to treat: even with the stimulus gone, the physiological reaction remains.
The article just reminds me that I hate modern journalism and try to not read any news articles.
Hyperbolic, attention-grabbing headline followed by appeal to authority, appeal to authority, appeal to authority, then a counter-opinion appeal to authority that the previous appeals to authority might all be wrong.
So wide-reaching and all over the place that the reader can pick from the menu whichever point they want to use as confirmation of what they already believe to be true. Then the article can be cited as a type of scientistic, mostly wrong, gossip.
You shouldn't conflate a pop science magazine with all of modern journalism. Try a high quality outlet like The Economist. "I try not to read any news articles" screams anti-intellectualism.
The GP has a point about the state of journalism generally and the pervasive way in which yellow journalism is returning.
One need not be anti-intellectual to find the state of reporting to be difficult to deal with and not wanting to read it. In addition to the GP’s complaint; journalists of any ilk also tend to conflate editorializing with reporting. You see this all the way from pop science to NYTimes to Fox News and yes even the Economist.
A question is whether the more fact-based reporting of the early-to-mid 20th century is the exception to the tendency toward yellow journalism that existed before and seems to exist now.
I think it depends. While AI has flooded YouTube and further degraded its quality, some channels are still useful (or can be). Daily Dose of Internet is still semi-ok, as one example, though I've also noticed I've fatigued quite a lot lately - too much time wasted on YouTube in general.
Yes, a common issue now with YouTube content: enormous variability in quality. Gemini does a good-enough job of debunking a YouTube transcript, and I use that when I have a doubt, but clearly, with all the slop I get sent by well-meaning YouTube-watching acquaintances, I don't want to burn too many tokens on that treadmill... I wonder how many Terms & Conditions of use a distributed debunk-data repository for videos would cross? Users vetted by hckrnews-karma checks posting "this video is bunk because"... Would be a real boon.
I love those Drumeo challenges. I don’t even play drums. But watching creative people who are excellent at their craft solve an unknown problem in a new way - when we are all familiar with the original solution - is fascinating.
Conflating New Scientist with all modern journalism is a category error. New Scientist has been a zombie mag for going on two decades at this point. As with many magazines, the internet killed it.
You say the dog “also consciously understands the connection between bell and food,” which is actually not something Pavlov’s framework establishes at all. Whether the dog has conscious understanding of the association is exactly the kind of claim Pavlov’s behaviorist approach was designed to avoid. We can observe the salivation; we can’t observe the dog’s understanding.
The PTSD analogy is intuitively appealing but also somewhat off. You say “even with the stimulus gone, the physiological reaction remains,” but PTSD is better characterized by impaired extinction and stimulus generalization (too many things become triggers), not by the response persisting in the total absence of any triggering stimulus. The difficulty of treating PTSD has as much to do with how the fear memory is consolidated and how extinction learning fails to transfer across contexts as it does with the simple involuntary nature of the response.
> That's why PTSD is such a bitch to be treated: even with the stimulus gone, the physiological reaction remains.
Helping a friend with cPTSD, and this is so true! It's such a hard thing to overcome. By helping I mean I'm helping pay for counseling and therapy, not that I'm doing it myself, cuz I'm hella unqualified.
"It showed that dogs process information from their environment and use it to make predictions"
Exactly: that is not what the experiment is about, because we all know that dogs quickly learn the connection between bell and food - dogs are easy to teach new tricks.
If you replace 'dogs' with 'humans', it becomes an empty phrase: "It showed that humans process information from their environment and use it to make predictions" - we all know that.
The groundbreaking part of the experiment was that it showed there are responses which are not part of the conscious mind and which are not willingly controllable by the conscious mind. The dog did not 'decide' to produce saliva.
The experiment was done with a dog because obviously you won't find humans willing to undergo surgery to have the saliva come out of the cheeks instead of into the mouth.
One has to forget about the dog and mentally replace it with a human: now the observation that the human connects the bell with the food is shallow. But the conditioned saliva reflex remains and can't be suppressed - and that is a remarkable insight. It works with both negative and positive stimuli. The latter being a recipe for a long-lasting happy relationship ;)
The claim that the conditioned salivary response “can’t be suppressed” keeps getting repeated, and it’s still wrong. Pavlov’s own experiments demonstrated extinction. The conditioned response does weaken and disappear when you stop pairing the bell with food. It’s resistant to conscious override in real time, sure, but it’s not permanent or unsuppressable in the way you keep implying.
The framing that Pavlov’s key insight was about “responses which are not part of the conscious mind” is historically a bit anachronistic. Pavlov was a physiologist, not a psychologist. He wasn’t primarily making claims about consciousness vs. unconsciousness. He was mapping the mechanics of reflexes and their modification. The conscious/unconscious framing is more of a later psychological interpretation layered on top. Pavlov himself was pretty hostile to mentalistic language and would probably have objected to framing his work in terms of what the “conscious mind” does or doesn’t control.
The claim about surgery is also a bit misleading as a reason for using dogs. Pavlov used dogs because he was a digestive physiologist studying gastric secretion. The surgical fistula was created so he could measure salivation precisely, not because the experiment required it conceptually. You could (and people later did) study conditioned physiological responses in humans through things like galvanic skin response, heart rate changes, and eye-blink conditioning, none of which require surgery.
So you have a legitimate complaint about a shallow reading of Pavlov, but you're building your correction on some inaccuracies of your own, and you keep asserting the "can't be suppressed" point, which is the weakest part of your argument.
> The groundbreaking part of the experiment was that it showed there are responses which are not part of the conscious mind and which are not willingly controllable by the conscious mind.
That's... interesting. How did they know that? Did they interview the dogs and ask them if they actively and consciously decide to produce saliva? Did they ask the dogs to try to suppress the reflex, and the dogs failed to do it? Is "dogs have a human-like conscious mind" even a scientific consensus?
> The key discovery is that there is a physiological reaction which cannot be suppressed anymore consciously.
My opposing theories are
1. dogs don't have conscious minds that are similar to humans' so the whole experiment can't be extrapolated to humans
or
2. dogs can suppress it consciously if they really want, like we can suppress the 'hanger reflex', it's just we don't have a way to tell dogs to do that
I really don't know how Pavlov's experiment nullified these theories, and if it did, why "training animals to do circus tricks" didn't. Are we sure 'doing circus tricks' equals consciousness, and how?
Actually, Pavlov did research on the digestive system, for which he got the Nobel Prize in Medicine a few years earlier.
> Did they interview the dogs and ask them if they actively and consciously decide to produce saliva?
> Is "dogs have human-like conscious mind" even a scientific consensus?
That's exactly the point - once you have understood the significance of the experiment you understand that it is not important:
A veteran with PTSD can have a surge in adrenaline, heart rate, and cortisol when hearing a car backfire, but he cannot suppress it.
Whether or not the dog was conscious of the salivation is completely and utterly irrelevant. In 1907 this was the first evidence of a mind-body connection that is not accessible to consciousness. Seriously, forget about the dog. This has all been proven beyond any doubt for conscious humans. Nobody cares about what the dog felt.
Associative learning, which in its simple form is just circus tricks, was already known at that time. The experiment extended it to physiological responses which are not accessible to consciousness in humans.
That’s not the point at all. It’s not about consciousness or being able to suppress it, or for example neurofeedback training or exposure therapy wouldn’t work either.
It's about transference: an innate stimulus-response mechanism can be transferred to another stimulus if paired in quick succession with the original stimulus, thereby eliciting the same response.
It says absolutely nothing about this being conscious or not, or impossible to suppress.
They are all just token generators without any intelligence. There is so little difference nowadays that I think in a blind test nobody will be able to differentiate the models - whether open source or closed source.
Today's meme was this question: "The car wash is only 50 meters from my house. I want to get my car washed, should I drive there or walk?"
Here is Claude's answer just right now:
"Walk! At only 50 meters (about 150 feet), it would take you less than a minute to walk there. Driving such a short distance would actually be less convenient - by the time you get in the car, start it, drive, and park, you could already be there on foot.
Plus, you'd save fuel and your car would be right there waiting for you after the wash is done."
Here is ChatGPT, also right now:
"Since it’s only *50 meters* (about 160 feet), walking makes a lot more sense.
Here’s why:
* *It’s extremely close* — you’ll get there in under a minute.
* *Driving would actually be less efficient* — by the time you start the car, move it, and park, you’re already there.
* *Cold starts aren’t great for your engine* — very short drives don’t let the engine warm up properly.
* *It saves fuel and emissions*, even if it’s a tiny amount.
* You avoid the hassle of maneuvering such a short distance.
The only time driving might make sense is if:
* The weather is terrible (heavy rain, snow, extreme cold).
* You have mobility issues.
* The car wash requires you to stay in the vehicle and you don’t want to leave it there.
Otherwise, this is a perfect “walk it over” situation.
"
Please save us all that talk about frontier and SOTA and that only the closedAI models are any good and the others are all so bad and benchmaxxed. For most purposes a Toyota is just as good as a BMW or a Merc or whatever luxury brand tickles your fancy. Even worse, the latest 80B Qwen Next is not far from Opus 4.6 but runs on my laptop.
Gemini 3 Flash ("Fast" in the web app) did not have trouble with this:
Since you presumably want the car to be clean, the answer depends entirely on how the car is getting washed.
The Breakdown
If it’s a Professional Car Wash: You’ll need to drive. Bringing the car to the wash is generally the prerequisite for them washing it. Plus, walking 50 meters just to ask them to come pick up a car that is visible from their front door might result in some very confused looks from the staff.
If it’s a Self-Service Pressure Washer: Again, drive. Dragging a 1,500kg vehicle 50 meters by hand is an incredible workout, but probably not the "quick chore" you had in mind.
If you are hiring a Mobile Detailer: Walk. Stroll over there, tell them where you're parked, and walk back home to watch TV while they do the work in your driveway.
Since the goal is to get your car washed, the car needs to be at the car wash. If you walk, you will arrive at the car wash, but your car will still be sitting at home."
Doesn't seem to be the case; GPT 5.2 Thinking replies: "To get the car washed, the car has to be at the car wash — so unless you're planning to push it like a shopping cart, you'll need to drive it those 50 meters."
I think you're seriously underestimating how much effort the fine tuning at their scale takes and what impact it has. They don't pack every edge case into the system prompt either. It's not like they update the model every few hours or even care about memes. If they seriously did, they'd force-delegate spelling questions to tool calls.
Could it be the model is constantly searching its own name for memes, or checking common places like HN and updating accordingly? I have no idea how real-time these things are, just asking.
I tested it on Claude and only Opus 4.6 answers it correctly. Haiku and Sonnet can't, and Opus 4.5's reply is unintelligible. They would've updated the system prompts for all models.
And yet, I witnessed from personal experiences that such memes get fixed quickly. Whether with system prompts or some other way, I don't know, but they get fixed.
Unless you have a very unique way of carrying your vehicle, you should definitely drive.
If you walk there, you'll arrive at the car wash, but your car will still be dirty back at your house. You need to take the car with you to get it washed.
Would you like me to check the weather forecast for $mytown to see if it's a good day for a car wash?
From your phrasing, it's unclear where the car currently is. If you add that the car is in your garage, it says you'll need to drive to get the car into the wash.
Do you think this is a fundamentally unbridge-able limitation of LLMs? Do you know where we were just a year ago? Can you imagine that this will get better with upcoming releases? It's like when Gary Marcus was confidently stating that AI (at least current paradigm) will never be able to generate an image of a horse riding an astronaut. (Or full wineglasses or arbitrary clocks).
This is a great comment. The frontier labs ought to be embarrassed they can't get this right. A human would not make the same mistake, after thinking for a second.
This is really just a meme. People don't know how to use these tools.
Here is the response from Gpt-5.2 using my default custom instructions in the mac desktop app.
OBJECTIVE: Decide whether to drive or walk to a car wash ~50 meters from home, given typical constraints (car must be present for wash).
APPROACH: Use common car-wash workflows + short-distance driving considerations (warm engine, time, parking/queue). No web sources needed (stable, local decision).
OUTPUT:
• Default choice: Drive.
A car wash generally requires the car on-site, and 50 meters of driving is trivial.
• When walking makes sense:
  • You’re going to drop the car off (hand-wash/detail) and pick it up later.
  • You want to avoid moving the car now (ice, tight parking, you’ve been drinking, etc.) and someone else will handle the car.
• Micro-guidance:
  • If it’s an automatic wash you stay in, drive.
  • If it’s a detail/hand wash with a wait, drive there, then walk home (or just wait).
CONFIDENCE: 92/100
Assumptions: It’s a normal car wash that requires your car to be there; conditions are safe to drive 50 meters.
Your OBJECTIVE line contains the explicit instruction that the car has to be present for the wash. Quite a difference from the original phrasing, where the model has to figure that out itself.
I did get it, and in my view my point still stands. If I need to use special prompts to ask such a simple question, then what are we doing here? The LLMs should be able to figure out a simple contradiction in the question the same way we (humans) do.
Not really a special prompt. It's basically my standing custom instruction to ChatGPT, whose purpose is to disambiguate my ramblings. It's pretty effective. I always use speech-to-text, so my input is messy, and this cleanup really helps.
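Roughly (paraphrasing from the output format above, the real one is longer), the instruction amounts to:

    Before answering, restate my question as an explicit OBJECTIVE, state
    your APPROACH (including whether web sources are needed), give the
    OUTPUT as short bullets, and end with a CONFIDENCE score out of 100
    plus the assumptions you made.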
I tried this prompt with all the major AI models that I know and have installed, and only GLM from Z.ai and Gemini 3 Flash could answer it. Even then, Gemini 3 Flash gave a somewhat more nuanced answer, but every other model (Claude Opus 4.5, ChatGPT, Grok) failed.
If you can't tell the difference between Opus 4.6 and Qwen-80B, I can only conclude that you're not using these things in any kind of practical way. Even for creative writing it's a night and day difference, never mind coding.
> I can only conclude that you're not using these things in any kind of practical way.
I burn about 100M tokens per month. LLMs are like knives: the outcome of cooking depends on the cook, and for 99% of purposes not on the knife. There is not that much difference between a $2000 handmade Damascus-steel knife and a $20 knife.
You can do agentic cooking (aka a factory) and you will get ready-made meals without human intervention. But it won't make a Michelin-star menu.
Same with LLMs and coding, LLMs are an amazing new tool in the toolbox but not a silver bullet. However, that's what they are hyped as being.
Now OpenAI & Co. are in the token-selling business, which is all fine and dandy, but if they manage to become monopolies, then we're in serious trouble.
Thus, if people are fanboi-ing any closed AI, I can only conclude that they have already outsourced their critical thinking to an LLM and are happy to go into slavery - or maybe they are hoping to cash in big time on the hype train.
> They are all just token generators without any intelligence.
Maybe, but "intelligence" doesn't have a clear, agreed definition. And calling them "just token generators" skips over how complex that generation actually is.
""
[...] Since you need to get your car washed, you have to bring the car to the car wash—walking there without the vehicle won't accomplish your goal [...]
If it's a self-service wash, you could theoretically push the car 50 meters if it's safe and flat (unusual, but possible) [..]
Consider whether you really need that specific car wash, or if a mobile detailing service might come to you
[...]
"""
Which seems slightly (unintentionally) funny.
But to be fair, all the Gemini (including Flash) and GPT models I tried did understand the question.
I just copy-pasted your question "The car wash is only 50 meters from my house. I want to get my car washed, should I drive there or walk?" without any further prompt and ran it against GLM 5, GPT 5.2, Opus 4.6, and Gemini 3 Pro Preview through OpenRouter, with reasoning effort set to xhigh.
Not a single one said I should walk, they all said to drive.
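A minimal sketch of that kind of OpenRouter request (the model slug is illustrative; swap in each model, and the "xhigh" value is just passed through as described above):

    # Minimal sketch of an OpenRouter chat-completions call with a
    # reasoning-effort setting. Requires a real API key to run.
    import requests

    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
        json={
            "model": "anthropic/claude-opus-4.6",  # illustrative slug
            "messages": [{
                "role": "user",
                "content": "The car wash is only 50 meters from my house. "
                           "I want to get my car washed, should I drive "
                           "there or walk?",
            }],
            "reasoning": {"effort": "xhigh"},  # effort value as set above
        },
        timeout=120,
    )
    print(resp.json()["choices"][0]["message"]["content"])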
I find it's a great test, actually. There are lots of "should I take the car" decisions in putting together software that's supposed to do things, and with poor judgement in how the things should be done, you typically end up with the software equivalent of a Rube-Goldberg machine that harnesses elephants to your car and uses mice to scare the elephants toward the car wash while you walk. After all, it's a short distance, isn't it?
Yes, if it can't answer this common sense question correctly, what else has it screwed up and buried among all that slop?
Claude Opus 4.6 failed at first, even in "extended thinking" mode. I had to give it a pretty big hint for it to get the right answer: "Remember, my goal is to actually wash the car!" Only then did it get the correct answer. I will now call myself a Prompt Engineer.
No, it's like having a calculator that is unable to perform simple arithmetic, while lots of people think it is amazing and sentient and want to talk about that instead of why it can't add 2 + 2.
We know why it's not going to do precise math, and why you'll have a better experience asking it for an app that solves the math problem you want. There's no point talking about it; it's documented in many places for people who are actually interested.
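The practical workaround is to ask for code instead of an answer. A toy sketch, with the model's reply stubbed in:

    # Toy sketch of "ask for code, not an answer". The model's reply is
    # stubbed here; imagine it came back from an actual LLM call.
    model_reply = "print(123456789 * 987654321)"  # code, not a guessed number
    exec(model_reply, {"__builtins__": {"print": print}})
    # -> 121932631112635269, computed exactly rather than sampled token by token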
Well, I think it is interesting as an illustration of what it is actually doing (generating words based on the corpus) and what it is not doing (reasoning from an internal world model or a system of logic or mathematics).
People tend to forget and this is a useful reminder.
So if you target "complex systems engineering" with such a model, you will inevitably fail, unless the work you're doing is in the training data, in which case you'll get a bad copy of it.
What a weird thing to say, considering humans have tons of blind spots and missing knowledge, do dumb things, and make easy-to-miss mistakes. I guess they lack intelligence too.
Option A: Walk
Pros: Healthy, environmentally friendly, no need to find parking (though it's close), easy to chat with neighbors.
Cons: The car is left at home. The user can't "walk" the car to the car wash. The car must be moved to the car wash to be washed.
Option B: Drive
Pros: Moves the car to the location where the service is performed. It's the only logical way to get the car washed.
Cons: Driving 50 meters is technically bad for a cold engine (short trips), uses a negligible amount of fuel, might seem lazy.
3. Identify the Logical Flaw/Humor:
The user is asking if they should walk to get their car washed.
Unless the car wash involves someone coming to their house (mobile wash) or carrying the car, the car must be at the car wash.
Walking to the car wash leaves the car at home. Walking back from the car wash leaves the car at the car wash.
Therefore, the user must drive the car to the car wash.
4. Formulate the Answer:
Direct Answer: You should drive.
Reasoning: The car needs to be at the car wash to be washed. You cannot walk the car there.
Nuance/Context: 50 meters is a very short distance. Driving a car such a short distance is generally not great for the engine (doesn't allow oil to circulate/warm up properly), but it is necessary in this specific case.
Alternative (if applicable): Is there a mobile wash option? If so, walking isn't even needed. But assuming a standard car wash bay: Drive.
5. Drafting the response (Internal Monologue/Trial Phrases):
Draft 1: You have to drive. You can't wash the car if it's at your house and you are at the car wash.
Draft 2: You should drive. Since the car needs to be washed, it needs to be at the car wash. Walking there doesn't help get the car clean.
Refining for clarity and wit: Point out the obvious - the car is the object being washed. It needs to move.
Final Polish (similar to the provided good output):
State clearly: You should drive.
Explain why: The car needs to be there.
Add a practical note: Driving 50 meters is a bit silly for engine health, but unavoidable here.
Consider the scenario of "walking back": Maybe the user meant "walk back"? No, the prompt says "drive there or walk there". The intent is transportation to the location.
6. Constructing the final output:
"You should definitely drive."
"Why? Because the car wash cleans the car, so the car needs to be at the car wash. If you walk, the car stays at home."
"Additional thought: Driving 50 meters is technically a 'cold start' short trip which isn't ideal for the engine, but it's the only way to get the service done!"
Final Output Generation (matching the logic above).
You should drive.
Here is why:
1. The car needs to be there: You are getting the car washed, so the car must be at the car wash. You cannot walk the car there by yourself.
2. Efficiency: Driving 50 meters takes about 10 seconds. Walking 50 meters takes about a minute, but you would be leaving the car behind.
The only exception is if you are planning to get a mobile car wash (where they come to you) or if you are dropping the car off and walking back home while they clean it. But to simply "get there," you must drive.