Sounds very plausible to me too. Even if you refocus the business unit, it makes no sense to lay off a highly capable team. Finding new people and integrating them into the team - all that costs a lot of time and money, and there is no guarantee of success.
Plenty of people further up the corporate ladder were definitely not happy with the success, while the very top is likely too disconnected to understand.
Cynical or not, I think it was an absolutely brilliant move: "Mass domestic surveillance of Americans constitutes a violation of fundamental rights". I think they placed their bets on Sama signing a contract with the DoD and here we are, one day later the news that OpenAI signed a contract is out. An absolute PR disaster for OpenAI. And an absolute PR victory for Anthropic.
I think OpenAI's IPO will be interesting. Not even the conservative media will be happy about mass surveillance of Americans.
For non-Americans, not much changes; they don't care about your rights any more than about a pile of dog poo.
Exactly. At this level you don't just put out a statement of your personal opinion. This is run through PR and coordinated with the investors. Otherwise the CEO finds himself on the street by tomorrow. Whatever their motives are, the statement is aligned with the VCs, because if it weren't, there would be another CEO the next day. As the parent stated, this is not cynicism. I see it as simply factual: these are the laws of money.
This. This is public misdirection. They already signed a new deal. It may not be to their liking, but nothing in the statement prevents them from moving forward.
Isn't that what we're all doing in this thread? We could certainly take the document at face value but as a parent commenter said, almost every company starts off with "don't be evil" then goes and does evil things.
Is Anthropic different? Maybe. But personally I don't see any reason to give them the benefit of the doubt.
There are outcomes where the US government seizes the company. Not super likely, not impossible.
It would be naive to write a statement that a future event will never happen, under any circumstances. People who make that mistake get lambasted for hypocrisy when unforeseen circumstances arise.
I see recognition that making absolute statements about the future is best left to zealots and prophets. Which to me speaks of maturity, not duplicity.
It is indeed a naive, or more likely a dishonest thing to do.
Anyone can promise anything. When there's little to no accountability and public memory/opinion doesn't last a week (or is easily manipulated anyway), then promises mean literally nothing. Very like how, in politics, temporary means permanent.
Or HackerNews itself, with them implementing a little Big Brother. It will, of course, absolutely and without a doubt only "nudge" people, and it will absolutely, under no circumstances, pinky promise, never get any worse or do anything else but that.
When there are millions of fools, those who actually recognize that they are being fooled are rarely significant in number. They're drowned out by the fools, until said fools "wake up" and cry "if only we had known!".
Well ... you could have known, but in your mindlessness you didn't listen and think.
"It must be true, because they say so. D'uh. What are you, dumb?"
I'm not saying whether or not they're planning to back down, but this sentence doesn't imply that. The "now" is clearly meant as a reference to the fact that they haven't in the past.
Being a tech forum centered around VC funding means we have a TON of tech bros (derogatory) here, who believe in nothing beyond getting their own piles of money for doing literally anything they can be paid to do. If you offered these guys $20 to murder a grandmother they'd ask if they have to cover the cost of the murder weapon or if that's provided.
I get it to a degree - people gotta eat, and especially right now the market is awful; not to mention, most hyperscaler businesses have been psychologically obliterating people for a decade or more at this point. Why not graduate to doing it with weapons of war too? But personally, I sleep better at night knowing nothing I've made is helping guide missiles into school buses, but that's just me.
In general - I don't know if it's a coincidence, but here on HN, for example, I've noticed an increasing number of comments and posts emphasizing the narrative of how "well-intended" Anthropic is.
I'd love to see the financial model that offsets losing your single biggest customer and substantial chunk of your annual revenue with some vague notion of public trust.
This is so short-sighted. We are so early into this AI revolution, and this administration is obviously in a tailspin, with the only folk left in charge being the least capable ones we have seen in a decade.
Imagine what the conversation would be like if Mattis, a highly decorated and respected leader, were still the SecDef. Instead we are seeing bully tactics from a failed cable news pundit who has neither earned nor deserved any respect from the military he represents.
We are two elections and a major health issue away from a complete change of course.
But short-sightedness is the name of the quarterly reporting game, so who knows.
Not trying to be a Luddite. I had multiple questions for AI tools yesterday, and let Claude/Zed do some boilerplate code/pattern rewriting.
I've worked in software for 35 years. I've seen many new "disruptive" movements come and go (open source, objects, functional, services, containers, aspects, blockchains, etc). I chose to participate in some and not in others. And whether I made the wrong choices or not, I always felt like I could get a clear enough picture of where the bandwagon was going that I could jump in, or hold back, or something in between. My choices weren't always the same as others', so it's not like it was obvious to everyone. But the signal felt more deterministic.
With LLM/agents, I find I feel the most unease and uncertainty with how much to lean in, and in what ways to lean in, than I ever have before. A sort of enthusiasm paralysis that is new.
> I'm seriously worried there won't be more elections. Not hyperbole at all.
Why? That's an unrealistic fear, driven by the insanely overwrought political rhetoric of 2026. Think about it: elections will be the absolute last thing to go.
If you want something to worry about, worry about this:
> And the stakes of politics are almost always incredibly high. I think they happen to be higher now. And I do think a lot of what is happening in terms of the structure of the system itself is dangerous. I think that the hour is late in many ways. My view is that a lot of people who embrace alarm don’t embrace what I think obviously follows from that alarm, which is the willingness to make strategic and political decisions you find personally discomfiting, even though they are obviously more likely to help you win.
> Taking political positions that’ll make it more likely to win Senate seats in Kansas and Ohio and Missouri. Trying to open your coalition to people you didn’t want it open to before. Running pro-life Democrats.
> And one of my biggest frustrations with many people whose politics I otherwise share is the unwillingness to match the seriousness of your politics to the seriousness of your alarm. I see a Democratic Party that often just wants to do nothing differently, even though it is failing — failing in the most obvious and consequential ways it can possibly fail. (https://www.nytimes.com/2025/09/18/opinion/interesting-times...)
It's not an unrealistic fear. Trump has been making noises about "taking over elections." Abolishing elections wholesale is very unlikely, sure, but a sham election rigged by a corrupt government? That's standard fare for authoritarians. And there's evidence of voting anomalies in swing states in the 2024 election.
FYI, even though you have a new account, you were banned from your first comment and all your comments automatically show up as hidden-by-default to most users.
I don't think it's crazy to worry about that, but elections are run by the states, there are over 100,000 polling places nationally, and people are pissed. On Jan 3, the terms of the entire current House of Representatives end; Democratic governors will still hold elections, and if there haven't been elections in GOP-led states, they're out of representation. There are so many hurdles in the way of the fascists canceling or heavily interfering in elections, and they're all just so stupid.
If you think they're pissed now, just wait to see how they react to election interference.
I recently read up on how the House of Representatives renews itself and quite frankly it's one of the most beautiful processes I've seen, completely removing the influence of the prior congress.
Their whole strategy is that the lack of a legal moat protecting their product is an existential threat to human life. They are the only moral AI and their competitors must be sanctioned and outlawed. At which point they can transition from AI as commodity to “value” based pricing.
It’s not going to work, but I can’t blame Amodei and friends for trying to make themselves trillionaires.
I'd love to see any evidence that this single biggest customer is provably and irreversibly lost on all levels of scrutiny as a result of this attempt at building public trust.
You're implying cancelling quietly would be better. But the department would just use a different supplier. This seems like the action someone would take if they cared about the issue.
> If you do not like working with the military, ...
Eh? But they do like to work with the military. How else are you going to "defend the United States and other democracies, and to defeat our autocratic adversaries"?
They want to work with the military, with just two additional guardrails.
> “Laws are a threat made by the dominant socioeconomic ethnic group in a given nation. It’s just the promise of violence that’s enacted, and the police are basically an occupying army, you know what I mean?”
...Which is funny, but technically speaking, it's (more or less) a paraphrasing/extrapolation of the very serious political science definition of a state, “a monopoly over the legitimate use of violence in a defined territory”
[1] Minus the last line, which I will allow others to discover for themselves
That's maybe the second law. The first one is: money is always finite.
Look at how Elon Musk behaved. Do you think the VCs gladly approved what he did with Twitter? They might want to keep chasing quarterly results - but sometimes, as with Zuckerberg, they can't. Not enough money. Similar examples: Google's funding rounds, or how the much more financially backed politician rather often loses to a competitor. Or, if you will, Vladimir Putin's idea that he can buy whatever results he wants - and that guy is a very wealthy person. There are always limits, pushing the money law into second place. We might argue that often the existing money is enough... but in more geopolitical, continuum-curving cases there are other powerful forces.
The Twitter acquisition wasn't funded by venture capital, so your question about VC approval doesn't apply.
If you're using VC as a general term for "investor" (inaccurately), then the answer to your question is that the major investors, such as Larry Ellison and the Saudi monarchy, wanted political control of Twitter, which meant that they did (apparently) approve what Musk did with it.
You're missing the point. It matters little where exactly the money to pay for the acquisition of Twitter came from. What matters is that nobody expected Twitter to lose employees and users in such numbers. So whoever gave the money was still limited in ensuring the results were "fully enough" in line with their wishes. Because money is always finite.
FWIW, I don't actually know whether the board of Anthropic has the actual power to replace its CEO, or whether Dario has retained some form of personal super-control shares, Zuckerberg-style.
At some level of growth, the dynamics between competent founders and shareholders flip. Even if the board could afford to replace a CEO, it might not be worth it.
I'd counter that at this level of capital, if the CEO doesn't align well with the capital, then super-control shares will be overpowered by super-lawyers and, if need be, some super-donations. OpenAI was a public-interest company...
Not at all. Especially at that level of capital. It's the equity equivalent of "if you owe a bank a million dollars, you're in trouble. If you owe a bank a billion dollars, the bank is in trouble".
Capital is extremely fungible. Typically extremely overleveraged. Lawyers are on the other hand extremely overprotective. They won’t generally risk the destruction of capital, even in slam-dunk cases. Vide WeWork.
Anthropic has an odd voting structure. While the CEO Dario Amodei holds no super-voting shares, there are special shares controlled by a separate council of trustees who aren't answerable to investors and who have the power to replace the Board. So in practice it comes down to personal relationships.
Surely you mean the laws of shareholder capitalism. There are many things you can do with money, and only some of them are legally backed by rules that ensure absolute shareholder power.
Whenever you see a YouTube video from a restaurant kitchen, you can almost be sure to spot some pans where the Teflon has been scrubbed down to the bare metal. Probably not that healthy...
PTFE is extremely chemically inert. There's possibly some risk from ultra-fine particulates that could be absorbed by the body, but compared to all the other sources of particulate exposure I don't think it's a major problem. I'm more concerned by thermal decomposition, which forms all kinds of mystery chemicals of unknown risk profile. Restaurants love to sear food and I wouldn't be surprised if some of them do it on PTFE cookware.
It's highly unlikely any restaurant uses Teflon-coated pans. Most use carbon steel or stainless. Teflon just won't hold up to daily use at a restaurant.
320 tok/s PP and 42 tok/s TG with a 4-bit quant and MLX. Llama.cpp was about half that for this model, but afaik it improved a few days ago; I haven't tested yet though.
I have tried many tools locally and was never really happy with any of them. I finally tried Qwen Code CLI, assuming it would run well with a Qwen model, and it does. YMMV; I mostly do JavaScript and Python. The most important setting was the max context size: it then auto-compacts before reaching it. I run with 65536 but may raise this a bit.
Last but not least, OpenCode is VC-funded; at some point they will have to make money, while Gemini CLI / Qwen CLI are not the primary products of their companies but are definitely dog-fooded.
Works for me, but sometimes there's an issue with Qwen's tool template: past chat turns get changed, so the KV cache is invalidated and it has to reprocess input tokens from scratch. Doesn't happen all the time though.
Btw I also get 42-60 tps on an M4 Max with the MLX 4-bit quants hosted by LM Studio. Which software do you use to run it?
I use MLX server directly from the MLX community project (by Apple). 42 tps is with 0-5000 tokens of context. It starts to drop from there; I have never seen 60.
Yesterday I tested the latest llama.cpp: PP has made a huge jump to 420 tps, which is 30% faster than MLX on my M1. TG is now 25 tps, which is below MLX but degrades little; at 50k context it is still 22-23 tps.
Together with Qwen Code CLI, llama.cpp re-processes the full KV cache far less often. So for now I am switching back to llama.cpp.
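If anyone wants to replicate the setup: a minimal sketch of hitting llama.cpp's built-in server from Python with prompt caching on (assumes a llama-server instance is already running locally; cache_prompt asks the server to keep the KV cache for the unchanged prefix, which is what avoids the re-processing):

    import requests

    # Assumes a llama-server instance on localhost:8080 (model already loaded).
    # "cache_prompt" keeps the KV cache for the shared prompt prefix, so only
    # newly appended tokens get processed on the next call.
    resp = requests.post(
        "http://localhost:8080/completion",
        json={
            "prompt": "You are a coding assistant.\n\nUser: refactor this loop ...",
            "n_predict": 256,
            "cache_prompt": True,
        },
    )
    print(resp.json()["content"])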
It is worth spending some time on the settings. I am really annoyed by the silly jokes (was it Claude that started this?). You can disable them with customWittyPhrases. Also, setting contextWindowSize will make the CLI auto-compress, which works really well for me.
And depending on what you do, maybe set privacy.usageStatisticsEnabled to false.
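For reference, the relevant bits of a Qwen CLI settings.json might look roughly like this - the option names are the ones mentioned above, but treat the exact nesting as an assumption and check the docs for your version (my reading is that customWittyPhrases replaces the built-in jokes with your own list):

    {
      "contextWindowSize": 65536,
      "customWittyPhrases": ["Working..."],
      "privacy": {
        "usageStatisticsEnabled": false
      }
    }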
Like Gemini, Qwen CLI supports OpenTelemetry. When I have time I'll have a look at why the KV cache gets invalidated.
Great, thanks! I am so annoyed by one specific phrase, "launching wit.exe" - not funny when it could actually be referring to real software running on your machine.
This is not a general purpose chip but specialized for high speed, low latency inference with small context. But it is potentially a lot cheaper than Nvidia for those purposes.
Tech summary:
- 15k tok/sec on 8B dense 3bit quant (llama 3.1)
- limited KV cache
- 880mm^2 die, TSMC 6nm, 53B transistors
- presumably 200W per chip
- 20x cheaper to produce
- 10x less energy per token for inference
- max context size: flexible
- mid-sized thinking model upcoming this spring on same hardware
- next hardware supposed to be FP4
- a frontier LLM planned within twelve months
This is all from their website; I am not affiliated. The founders have 25 years of career across AMD, Nvidia and others, and $200M of VC funding so far.
Certainly interesting for very low latency applications which need < 10k tokens context. If they deliver in spring, they will likely be flooded with VC money.
Not exactly a competitor for Nvidia but probably for 5-10% of the market.
Back of napkin: 1mm^2 of 6nm wafer costs ~$0.20, so 1B parameters need about $20 of die. The larger the die size, the lower the yield. Supposedly the inference speed remains almost the same with larger models.
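Spelling that napkin math out (all numbers are the rough ones from this thread, nothing more):

    # ~$0.20 per mm^2 of 6nm wafer, ~880 mm^2 die, ~8B hard-wired parameters
    cost_per_mm2 = 0.20
    die_area_mm2 = 880
    params_b = 8

    die_cost = cost_per_mm2 * die_area_mm2   # ~$176 per die (ignoring yield)
    per_billion = die_cost / params_b        # ~$22 per 1B parameters
    print(f"die ~${die_cost:.0f}, ~${per_billion:.0f} per 1B params")

Yield losses on an 880mm^2 die will push the real number up, but the order of magnitude holds.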
This math is useful. Lots of folks scoffing in the comments below. I have a couple of reactions, after chatting with it:
1) 16k tokens / second is really stunningly fast. There’s an old saying about any factor of 10 being a new science / new product category, etc. This is a new product category in my mind, or it could be. It would be incredibly useful for voice agent applications, realtime loops, realtime video generation, .. etc.
2) https://nvidia.github.io/TensorRT-LLM/blogs/H200launch.html has the H200 doing 12k tokens/second on Llama 2 13B fp8. Knowing these architectures, that's likely a 100+-ish batched run, meaning time to first token is almost certainly slower than Taalas. Probably much slower, since Taalas is like milliseconds.
3) Jensen has these Pareto curve graphs — for a certain amount of energy and a certain chip architecture, choose your point on the curve to trade off throughput vs latency. My quick math is that these probably do not shift the curve. The 6nm process vs the 4nm process is likely 30-40% bigger and draws that much more power, etc.; if we look at the numbers they give, extrapolate to an fp8 model (slower) and a smaller geometry (30% faster and lower power), and compare 16k tokens/second for Taalas to 12k tokens/s for an H200, these chips are on the same ballpark curve.
However, I don’t think the H200 can reach into this part of the curve, and that does make these somewhat interesting. In fact even if you had a full datacenter of H200s already running your model, you’d probably buy a bunch of these to do speculative decoding - it’s an amazing use case for them; speculative decoding relies on smaller distillations or quants to get the first N tokens sorted, only when the big model and small model diverge do you infer on the big model.
Upshot - I think these will sell, even on 6nm process, and the first thing I’d sell them to do is speculative decoding for bread and butter frontier models. The thing that I’m really very skeptical of is the 2 month turnaround. To get leading edge geometry turned around on arbitrary 2 month schedules is .. ambitious. Hopeful. We could use other words as well.
I hope these guys make it! I bet the v3 of these chips will be serving some bread and butter API requests, which will be awesome.
> any factor of 10 being a new science / new product category,
I often remind people that two orders of magnitude of quantitative change is a qualitative change.
> The thing that I’m really very skeptical of is the 2 month turnaround. To get leading edge geometry turned around on arbitrary 2 month schedules is .. ambitious. Hopeful. We could use other words as well.
The real product they have is automation. They figured out a way to compile a large model into a circuit. That's, in itself, pretty impressive. If they can do this, they can also compile models to an HDL and deploy them to large FPGA simulators for quick validation. If we see models maturing at a "good enough" state, even a longer turnaround between model release and silicon makes sense.
While I also see lots of these systems running standalone, I think they'll really shine combined with more flexible inference engines, running the unchanging parts of the model while the coupled inference engine deals with whatever is too new to have been baked into silicon.
I'm concerned with the environmental impact. Chip manufacture is not very clean and these chips will need to be swapped out and replaced at a cadence higher than we currently do with GPUs.
Having dabbled in VLSI in the early-2010s, half the battle is getting a manufacturing slot with TSMC. It’s a dark art with secret handshakes. This demonstrator chip is an enormous accomplishment.
Yeah and a team I’m not familiar with — I didn’t check bios but they don’t lead with ‘our team made this or that gpu for this or that bigco’.
The design ip at 6nm is still tough; I feel like this team must have at least one real genius and some incredibly good support at tsmc. Or they’ve been waiting a year for a slot :)
"Ljubisa Bajic desiged video encoders for Teralogic and Oak Technology before moving over to AMD and rising through the engineering ranks to be the architect and senior manager of the company’s hybrid CPU-GPU chip designs for PCs and servers. Bajic did a one-year stint at Nvidia as s senior architect, bounced back to AMD as a director of integrated circuit design for two years, and then started Tenstorrent."
His wife (COO) worked at Altera, ATI, AMD and Tenstorrent.
"Drago Ignjatovic, who was a senior design engineer working on AMD APUs and GPUs and took over for Ljubisa Bajic as director of ASIC design when the latter left to start Tenstorrent. Nine months later, Ignjatovic joined Tenstorrent as its vice president of hardware engineering, and he started Taalas with the Bajices as the startup’s chief technology officer."
I think there will be a lot of space for sensorial models in robotics, as the laws of physics don't change much, and a light switch or automobile controls have remained stable and consistent over the last decades.
I think the next major innovation is going to be intelligent model routing. I've been exploring OpenClaw and OpenRouter, and there is a real lack of options to select the best model for the job and execute. The providers are trying to do that with their own models, but none of them offer everything to everyone at all times. I see a future with increasingly niche models being offered for all kinds of novel use cases. We need a way to fluidly apply the right model for the job.
At 16k tokens/s why bother routing? We're talking about multiple orders of magnitude faster and cheaper execution.
Abundance supports different strategies. One approach: Set a deadline for a response, send the turn to every AI that could possibly answer, and when the deadline arrives, cancel any request that hasn't yet completed. You know a priori which models have the highest quality in aggregate. Pick that one.
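A minimal sketch of that deadline strategy (query_model is a hypothetical stand-in, not any particular provider's API):

    import asyncio, random

    # Hypothetical stand-in for a real provider call; swap in actual API clients.
    async def query_model(name: str, prompt: str) -> tuple[str, str]:
        await asyncio.sleep(random.uniform(0.1, 2.0))   # simulated latency
        return name, f"{name}'s answer to {prompt!r}"

    async def race(prompt, models, deadline_s, quality_rank):
        tasks = [asyncio.create_task(query_model(m, prompt)) for m in models]
        done, pending = await asyncio.wait(tasks, timeout=deadline_s)
        for t in pending:                  # cancel whatever missed the deadline
            t.cancel()
        answers = dict(t.result() for t in done)
        for m in quality_rank:             # best a-priori model that finished
            if m in answers:
                return answers[m]

    print(asyncio.run(race("2+2?", ["small-fast", "big-slow"], 1.0,
                           ["big-slow", "small-fast"])))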
The best coding model won’t be the best roleplay one which won’t be the best at tool use. It depends what you want to do in order to pick the best model.
I'll go ahead and say they're wrong (source: building and maintaining an LLM client with llama.cpp integrated & 40+ third-party models via HTTP).
I desperately want there to be differentiation. Reality has shown over and over again that it doesn't matter. Even if you run the same query across X models and then do some form of consensus, the improvements on benchmarks are marginal and the UX is worse (more time, more expense, and a final answer that is muddied and bounded by the quality of the best model).
There is the pre-training, where you passively read stuff from the web.
From there you go to RL training, where humans are grading model responses, or the AI is writing code to try to pass tests and learning how to get the tests to pass, etc. The RL phase is pretty important because it's not passive, and it can focus on the weaker areas of the model too, so you can actually train on a larger dataset than the sum of recorded human knowledge.
> speculative decoding for bread and butter frontier models. The thing that I’m really very skeptical of is the 2 month turnaround. To get leading edge geometry turned around on arbitrary 2 month schedules is .. ambitious
Can we use older (previous generation, smaller) models as a speculative decoder for the current model? I don't know whether the randomness in training (weight init, data ordering, etc) will affect this kind of use. To the extent that these models are learning the "true underlying token distribution" this should be possible, in principle. If that's the case, speculative decoding is an elegant vector to introduce this kind of tech, and the turnaround time is even less of a problem.
For speculative decoding, wouldn’t this be of limited use for frontier models that don’t have the same tokenizer as Llama 3.1? Or would it be so good that retokenization/bridging would be worth it?
My understanding as well is that speculative decoding only works with a smaller quant of the same model. You're using the faster sampling of the smaller models representation of the larger models weights in order to attempt to accurately predict its token output. This wouldn't work cross-model as the token probabilities are completely different.
Families of model sizes work great for speculative decoding. Use the 1B with the 32B or whatever.
It's a balance: you want it to guess correctly as much as possible, but also be as fast as possible. Validation takes time, and every guess needs to be validated, etc.
The model you're using to speculate could be anything, but if it's not guessing what the main model would predict, it's useless.
> The model you're using to speculate could be anything, but if it's not guessing what the main model would predict, it's useless.
So what I said is correct then lol. If you're saying I can use a model that isn't just a smaller quant of the larger model I'm trying to speculatively decode, except that model would never get an accurate prediction, then how is that in any way useful or desirable?
Smaller quant of the same model. A smaller quant of a different family of model would be practically useless and there wouldn't be any point in even setting it up.
> The thing that I’m really very skeptical of is the 2 month turnaround. To get leading edge geometry turned around on arbitrary 2 month schedules is .. ambitious. Hopeful. We could use other words as well.
They may be using Rapidus, which is a Japanese government backed foundry built around all single wafer processing vs traditional batching. They advertise ~2 month turnaround time as standard, and as short as 2 weeks for priority.
Think about this for solving questions in math where you need to explore a search space. You can run 100 of these for the same cost and time as one API call to OpenAI.
The guts of a LLM isn't something I'm well versed in, but
> to get the first N tokens sorted, only when the big model and small model diverge do you infer on the big model
suggests there is something I'm unaware of. If you compare the small and big model, don't you have to wait for the big model anyway and then what's the point? I assume I'm missing some detail here, but what?
Speculative decoding takes advantage of the fact that it's faster to validate that a big model would have produced a particular sequence of tokens than to generate that sequence of tokens from scratch, because validation can take more advantage of parallel processing. So the process is generate with small model -> validate with big model -> then generate with big model only if validation fails
When you predict with the small model, the big model can verify more as a batch, closer in speed to processing input tokens - provided the predictions are good and the work doesn't have to be redone.
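To make that concrete, a toy greedy-decoding sketch (draft_model and target_model are hypothetical next-token callables; a real engine checks all k drafted positions in one batched forward pass instead of a Python loop):

    def speculative_decode(prompt_tokens, draft_model, target_model, k=4, n_new=64):
        tokens = list(prompt_tokens)
        target_len = len(prompt_tokens) + n_new
        while len(tokens) < target_len:
            # 1. Draft: the small model cheaply proposes k tokens, one at a time.
            draft = []
            for _ in range(k):
                draft.append(draft_model(tokens + draft))
            # 2. Verify: compare each drafted token with what the big model
            #    would have produced at that position.
            for i, guess in enumerate(draft):
                expected = target_model(tokens + draft[:i])
                if guess != expected:
                    tokens.append(expected)  # divergence: take the big model's token
                    break                    # and discard the rest of the draft
                tokens.append(guess)         # match: accepted almost for free
        return tokens[:target_len]

    # Tiny demo with hypothetical toy "models" that only sometimes agree:
    big = lambda toks: len(toks) % 7
    small = lambda toks: len(toks) % 7 if len(toks) % 3 else 0
    print(speculative_decode([1, 2, 3], small, big, k=4, n_new=8))

In the good case all k draft tokens match, and the big model effectively emits k tokens for the price of one verification pass.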
Most importantly this opens up an amazing future where we get the real version of the classic science fiction MacGuffin of a physical AI chip. Pair this with several TB of flash storage and you have persistent artificial consciousness that can be carried around with you. Bonus points if it's quirky, custom-trained and the chip is one of a kind that you stole from an evil corporation. Additional bonus points if the packaging is such that it's small enough to plug into the USB-C port on your smart glasses and has an eBPF module it can leverage to see what you're doing and talk to you in real time about your actions.
I enjoy envisioning futures more whimsical than "the bargain-basement LLM provider that my insurance company uses denied my claim because I chose badly-vectored words".
> Certainly interesting for very low latency applications which need < 10k tokens context.
I'm really curious whether context will really matter if using methods like Recursive Language Models[0]. That method is suited to breaking a huge amount of context down recursively into smaller subagents, each working on a symbolic subset of the prompt.
The challenge with RLMs seemed to be that they burn through a ton of tokens in exchange for more accuracy. If tokens are cheap, RLMs could be beneficial here, providing much more accuracy over large contexts regardless of what the underlying model can handle.
First, it is likely one chip for llama 8B q3 with 1k context size. This could fit into around 3GB of SRAM, which is about the theoretical maximum at the TSMC N6 reticle limit.
Second, their plan is to etch larger models across multiple connected chips. It’s physically impossible to run bigger models otherwise since 3GB SRAM is about the max you can have on an 850mm2 chip.
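That SRAM number is easy to sanity-check (rough, ignoring KV cache and any overhead):

    params = 8e9            # llama 8B
    bits_per_param = 3      # q3
    gb = params * bits_per_param / 8 / 1e9
    print(gb)               # 3.0 GB -- right at the ~3GB reticle-limit estimate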
> followed by a frontier-class large language model running inference across a collection of HC cards by year-end under its HC2 architecture
Aren't they only using the SRAM for the KV cache? They mention that the hardwired weights have a very high density. They say about the ROM part:
> We have got this scheme for the mask ROM recall fabric – the hard-wired part – where we can store four bits away and do the multiply related to it – everything – with a single transistor. So the density is basically insane.
I'm not a hardware guy but they seem to be making a strong distinction between the techniques they're using for the weights vs KV cache
> In the current generation, our density is 8 billion parameters on the hard wired part of the chip, plus the SRAM to allow us to do KV caches, adaptations like fine tuning, and etc.
It’s just dumb to think that one chip per model is their plan. They stated that their plan is to chain multiple chips together.
I was indeed wrong about 10 chips. I thought they would use llama 8B 16bit and a few thousand context size. It turns out, they used llama 8B 3bit with around 1k context size. That made me assume they must have chained multiple chips together since the max SRAM on TSMC n6 for reticle sized chip is only around 3GB.
There is nothing smart about current LLMs. They just regurgitate text compressed in their memory based on probability.
None of the LLMs currently have actual understanding of what you ask them to do and what they respond with.
If LLMs just regurgitate compressed text, they'd fail on any novel problem not in their training data. Yet, they routinely solve them, which means whatever's happening between input and output is more than retrieval, and calling it "not understanding" requires you to define understanding in a way that conveniently excludes everything except biological brains.
I somewhat agree with you but I also realise that there are very few "novel" problems in the world. I think it's really just more complex problem spaces is all.
Same relative logic, just more of it/more steps or trials.
Yes, there are some fascinating emergent properties at play, but when they fail it's blatantly obvious that there's no actual intelligence or understanding. They are very cool and very useful tools - I use them on a daily basis now, and the way I can just paste a vague screenshot with some vague text and they get it and give a useful response blows my mind every time. But it's very clear that it's all just smoke and mirrors; they're not intelligent and you can't trust them with anything.
You'd think, with how often Opus builds two separate code paths without feature parity when you try to vibe-code something complex, that people wouldn't regard this whole thing so highly.
None of my codebases are in their training data, yet they routinely contribute to them in meaningful ways. They write code that I'm happy with that improves the codebases I work in.
Depends how precisely you define novel - I don't think LLMs are yet capable of posing and solving interesting problems, but they have been used to address known problems, and in doing so have contributed novel work. Examples include Erdos Problem #728[0] (Terence Tao said it was solved "more or less autonomously" by an LLM), IMO problems (Deepmind, OpenAI and Huang 2025), GPT-5.2 Pro contributing a conjecture in particle physics[1], systems like AlphaEvolve leveraging LLMs + evolutionary algorithms to generate new, faster algorithms for certain problems[2].
We know that, but that does not make them useless. The opposite, in fact: they are extremely useful in the hands of non-idiots. We just happen to have an oversupply of idiots at the moment, which AI is here to eradicate. /Sort of satire.
So you are saying they are like a copy command - LLMs will copy some training data back to you? Why do we spend so much money training and running them if they "just regurgitate text compressed in their memory based on probability"? Billions of dollars to build a lossy grep.
I think you are confused about LLMs - they take in context, and that context makes them generate new things; for existing things we have cp. By your logic, pianos can't be creative instruments because they just produce the same 88 notes.
I have a gut feeling that a huge portion of the deficiencies we note in AI is just a reflection of the training data. For instance, the wiki/reddit/etc internet is just a soup of human descriptions of the world model, not the actual world model itself. There are gaps and holes in the knowledge, because a codified summary of the world captures what is remarkable to us humans, not a 100% faithful, comprehensive description of the world. What is obvious to humans with lived real-world experience often does not make it into the training data. A simple, demonstrable example is whether one should walk or drive to the car wash.
Phrases like "actual understanding", "true intelligence" etc. are not conducive to productive discussion unless you take the trouble to define what you mean by them (which ~nobody ever does). They're highly ambiguous and it's never clear what specific claims they do or don't imply when used by any given person.
But I think this specific claim is clearly wrong, if taken at face value:
> They just regurgitate text compressed in their memory
They're clearly capable of producing novel utterances, so they can't just be doing that. (Unless we're dealing with a very loose definition of "regurgitate", in which case it's probably best to use a different word if we want to understand each other.)
The fact that the outputs are probabilities is not important. What is important is how that output is computed.
You could imagine that it is possible to learn certain algorithms/ heuristics that "intelligence" is comprised of. No matter what you output. Training for optimal compression of tasks /taking actions -> could lead to intelligence being the best solution.
This is far from a formal argument, but so is the stubborn reiteration of "it's just probabilities" or "it's just compression". Because this "just" thing is getting more and more capable of solving tasks that are surely not in the training data exactly like this.
The simplification is where it loses granularity. I could describe every human's life as they were born and then they died. That's 100% accurate, but there's just a little something lost by simplifying that much.
That's a lot of surface, isn't it? As big as an M1 Ultra (2x M1 Max at 432mm² on TSMC N5P), a bit bigger than an A100 (820mm² on TSMC N7) or H100 (814mm² on TSMC N5).
> The larger the die size, the lower the yield.
I wonder if that applies? What's the big deal if a few parameters have a few bit flips?
Also see Adrian Thompson's Xilinx 6200 FPGA, programmed by a genetic algorithm that worked but exploited nuances unique to that specific physical chip, meaning the software couldn't be copied to another chip. https://news.ycombinator.com/item?id=43152877
An on-device reasoning model with that kind of speed and cost would completely change the way people use their computers. It would be closer to Star Trek than anything else we've ever had. You'd never have to type anything or use a mouse again.
There's a bit of a hidden cost here… GPU hardware is going to have greater longevity; its life gets extended every time there's an algorithmic improvement. Whereas any efficiency gains in software that are not compatible with this hardware will tend to accelerate its depreciation.
KV caches are large, but hidden states aren't necessarily that large. And if you can run a model once ridiculously fast, then you can loop it repeatedly and still be fast. So I wonder about the 'modern RNNs' like RWKV here...
It's weird to me to train such huge models and then degrade them by quantizing the presumably 16-bit (bfloat16) weights down to 3 bits. Why not just train smaller models then?
For a real-world use case, you would need an FPGA with terabytes of RAM. Perhaps it'll be off-chip HBM. But for large models, even that won't be enough. Then you would need to figure out an NVLink-like interconnect for these FPGAs. And we are back to square one.
This is new. You are citing FPGA prototypes. Those papers do not demonstrate the same class of scaling or hardware integration that Taalas is advocating. For one, the FPGA solutions typically use fixed multipliers (or lookup tables), while the ASIC solution has more freedom to optimize routing for 4-bit multiplication.
I understand what Taalas is claiming. I was trying to describe that putting a model in hardware is not something new or unthought of; the natural progression of an FPGA is an ASIC. The Taalas process is more expensive and not really worth it, because once you burn a model onto the silicon, the silicon can only serve that model. The speed improvement alone is not enough for the cost you will incur in the long run. GPUs are still general-purpose; FPGAs are at least reusable but won't have the same speed. This alone cannot be a long-term business. Turning a model into hardware in two months is too long - models already take quite a long time to train. Anyone going down this path would leave the field wide open to their competitors. Deployment planning for existing models is already complicated enough.
Do not overlook traditional irrational investor exuberance; we've got an abundance of that right now. With the right PR maneuvers these guys could be a tulip craze.
Low-latency inference is a huge waste of power; if you're going to the trouble of making an ASIC, it should be for dog-slow but very high throughput inference. Undervolt the devices as much as possible and use sub-threshold modes, multiple Vt and body biasing extensively to save further power and minimize leakage losses, but also keep working in fine-grained nodes to reduce areas and distances. The sensible goal is to expend the least possible energy per operation, even at increased latency.
Low latency inference is very useful in voice-to-voice applications. You say it is a waste of power but at least their claim is that it is 10x more efficient. We'll see but if it works out it will definitely find its applications.
I haven't found any end-to-end voice chat models useful; I had much better results with a separate STT-LLM-TTS pipeline. One big problem is turn detection, and having inference with 150-200ms latency would allow a whole new level of quality. I would just use it with a prompt - "You think the user is finished talking?" - and then push the turn to a larger model. The AI should reply within the ballpark of 600-1000ms: faster is often irritating, slower makes the user start talking again.
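A sketch of where such a fast model would slot in; every function here is a hypothetical stand-in for your STT/LLM/TTS of choice:

    # Hypothetical stand-ins; wire up real STT / LLM / TTS instead.
    def stt(chunk): return chunk                      # pretend chunks are text
    def fast_model(prompt): return "yes"              # the low-latency turn detector
    def large_model(text): return f"Reply to: {text}"
    def tts(text): print(text)

    def on_audio_chunk(chunk, transcript=""):
        """One step of an STT -> turn detection -> LLM -> TTS loop."""
        transcript += stt(chunk)
        # The fast model answers exactly one question, within a ~200ms budget:
        done = fast_model(f"Transcript: {transcript!r}. "
                          "Do you think the user is finished talking? yes/no")
        if done.strip().lower().startswith("yes"):
            tts(large_model(transcript))              # aim for 600-1000ms end to end
        return transcript

    on_audio_chunk("should I drive or walk to the car wash?")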
I think it's really useful for agent to agent communication, as long as context loading doesn't become a bottleneck. Right now there can be noticeable delays under the hood, but at these speeds we'll never have to worry about latency when chain calling hundreds or thousands of agents in a network (I'm presuming this is going to take off in the future). Correct me if I'm wrong though.
Interesting hardware but I wonder if it is capable of KV caching. Thus (only) useful for applications that have short context but would benefit from very low latency. Voice-to-voice applications may be a good example.
Interesting stuff, but it hurts so much that the writer has the common misconception of Pavlov's dog doing a circus trick. Sure, the dog also consciously understands the connection between bell and food. But the physiological reaction of the saliva flowing is not a conscious decision by the dog. Circus tricks with animals existed long before Pavlov. The key discovery is that there is a physiological reaction which can no longer be consciously suppressed. That's why PTSD is such a bitch to treat: even with the stimulus gone, the physiological reaction remains.
The article just reminds me that I hate modern journalism and try to not read any news articles.
Hyperbolic, attention-grabbing headline followed by appeal to authority, appeal to authority, appeal to authority, then a counter-opinion appeal to authority that the previous appeals to authority might all be wrong.
So wide-reaching and all over the place that the reader can pick from the menu whichever point they want to use as confirmation of what they already believe to be true. Then the article can be cited as a type of scientistic, mostly wrong, gossip.
You shouldn't conflate a pop science magazine with all of modern journalism. Try a high quality outlet like The Economist. "I try not to read any news articles" screams anti-intellectualism.
The GP has a point about the state of journalism generally and the pervasive way in which yellow journalism is returning.
One need not be anti-intellectual to find the state of reporting to be difficult to deal with and not wanting to read it. In addition to the GP’s complaint; journalists of any ilk also tend to conflate editorializing with reporting. You see this all the way from pop science to NYTimes to Fox News and yes even the Economist.
A question is whether the more fact-based reporting of the early-to-mid 20th century is the exception to the tendency toward yellow journalism that existed before and seems to exist now.
I think it depends. While AI has flooded YouTube and further degraded its quality, some channels are still useful (or can be). Daily Dose of Internet is still semi-ok, as one example, though I've also noticed I've fatigued quite a lot lately - too much time wasted on YouTube in general.
Yes, a common issue now with YouTube content: enormous variability in quality. Gemini does a good-enough job of debunking a YouTube transcript, and I use that when I have a doubt, but clearly, with all the slop I get sent by well-meaning YouTube-watching acquaintances, I don't want to burn too many tokens on that treadmill... I wonder how many Terms & Conditions of use a distributed debunk-data repository for videos would cross? Users vetted by hckrnews-karma checks posting "this video is bunk because"... Would be a real boon.
I love those Drumeo challenges. I don’t even play drums. But watching creative people who are excellent at their craft solve an unknown problem in a new way - when we are all familiar with the original solution - is fascinating.
Conflating New Scientist with all modern journalism is a category error. New Scientist has been a zombie mag for going on two decades at this point. As with many magazines, the internet killed it.
You say the dog “also consciously understands the connection between bell and food,” which is actually not something Pavlov’s framework establishes at all. Whether the dog has conscious understanding of the association is exactly the kind of claim Pavlov’s behaviorist approach was designed to avoid. We can observe the salivation; we can’t observe the dog’s understanding.
The PTSD analogy is intuitively appealing but also somewhat off. You say “even with the stimulus gone, the physiological reaction remains,” but PTSD is better characterized by impaired extinction and stimulus generalization (too many things become triggers), not by the response persisting in the total absence of any triggering stimulus. The difficulty of treating PTSD has as much to do with how the fear memory is consolidated and how extinction learning fails to transfer across contexts as it does with the simple involuntary nature of the response.
> That's why PTSD is such a bitch to be treated: even with the stimulus gone, the physiological reaction remains.
Helping a friend with cPTSD, and this is so true! It's such a hard thing to overcome. By helping I mean I'm helping pay for counseling and therapy, not that I'm doing it myself, cuz I'm hella unqualified.
"It showed that dogs process information from their environment and use it to make predictions"
Exactly: that is not what the experiment is about, because we all know that dogs quickly learn the connection between bell and food - dogs are easy to teach new tricks.
If you replace 'dogs' with 'humans', it becomes an empty phrase: "It showed that humans process information from their environment and use it to make predictions" - we all know that.
The groundbreaking part of the experiment was that it showed there are responses which are not part of the conscious mind and which are not willingly controllable by the conscious mind. The dog did not 'decide' to produce saliva.
The experiment was done with a dog because obviously you won't find humans willing to undergo surgery to have the saliva come out of the cheeks instead of into the mouth.
One has to forget about the dog and mentally replace it with a human: now the observation that the human connects the bell with the food is shallow. But the conditioned saliva reflex remains and can't be suppressed - and that is a remarkable insight. It works with both negative and positive stimuli. The latter being a recipe for a long-lasting happy relationship ;)
The claim that the conditioned salivary response “can’t be suppressed” keeps getting repeated, and it’s still wrong. Pavlov’s own experiments demonstrated extinction. The conditioned response does weaken and disappear when you stop pairing the bell with food. It’s resistant to conscious override in real time, sure, but it’s not permanent or unsuppressable in the way you keep implying.
The framing that Pavlov’s key insight was about “responses which are not part of the conscious mind” is historically a bit anachronistic. Pavlov was a physiologist, not a psychologist. He wasn’t primarily making claims about consciousness vs. unconsciousness. He was mapping the mechanics of reflexes and their modification. The conscious/unconscious framing is more of a later psychological interpretation layered on top. Pavlov himself was pretty hostile to mentalistic language and would probably have objected to framing his work in terms of what the “conscious mind” does or doesn’t control.
The claim about surgery is also a bit misleading as a reason for using dogs. Pavlov used dogs because he was a digestive physiologist studying gastric secretion. The surgical fistula was created so he could measure salivation precisely, not because the experiment required it conceptually. You could (and people later did) study conditioned physiological responses in humans through things like galvanic skin response, heart rate changes, and eye-blink conditioning, none of which require surgery.
So you have a legitimate complaint about a shallow reading of Pavlov, but you're building your correction on some inaccuracies of your own, and you keep asserting the "can't be suppressed" point, which is the weakest part of your argument.
> The groundbreaking part of the experiment was that it showed there are responses which are not part of the conscious mind and which are not willingly controllable by the conscious mind.
That's... interesting. How did they know that? Did they interview the dogs and ask them if they actively and consciously decide to produce saliva? Did they ask the dogs to try to suppress the reflex, and the dogs failed to do it? Is "dogs have a human-like conscious mind" even a scientific consensus?
> The key discovery is that there is a physiological reaction which cannot be suppressed anymore consciously.
My opposing theories are
1. dogs don't have conscious minds that are similar to humans' so the whole experiment can't be extrapolated to humans
or
2. dogs can suppress it consciously if they really want, like we can suppress the 'hanger reflex', it's just we don't have a way to tell dogs to do that
I really don't know how Pavlov's experiment nullified these theories, and if it did, why "training animals to do circus tricks" didn't. Are we sure 'doing circus tricks' equals consciousness, and how?
Actually, Pavlov did research on the digestive system, for which he got the Nobel Prize in Medicine a few years earlier.
> Did they interview the dogs and ask them if they actively and consciously decide to produce saliva?
> Is "dogs have human-like conscious mind" even a scientific consensus?
That's exactly the point - once you have understood the significance of the experiment you understand that it is not important:
A veteran with PTSD can have a surge in adrenaline, heart rate, and cortisol when hearing a car backfire, but he cannot suppress it.
Whether or not the dog was conscious of the salivation is completely and utterly irrelevant. In 1907 this was the first evidence of a mind-body connection that is not accessible to consciousness. Seriously, forget about the dog. This has all been proven beyond any doubt for conscious humans. Nobody cares about what the dog felt.
Associative learning, which in its simple form is just circus tricks, was already known at that time. The experiment extended it to physiological responses which are not accessible to consciousness in humans.
That’s not the point at all. It’s not about consciousness or being able to suppress it, or for example neurofeedback training or exposure therapy wouldn’t work either.
It's about transference: an innate stimulus-response mechanism can be transferred to another stimulus if paired in quick succession with the original stimulus, thereby eliciting the same response.
It says absolutely nothing about this being conscious or not, or impossible to suppress.
They are all just token generators without any intelligence. There is so little difference nowadays that I think in a blind test nobody will be able to differentiate the models - whether open source or closed source.
Today's meme was this question: "The car wash is only 50 meters from my house. I want to get my car washed, should I drive there or walk?"
Here is Claude's answer just right now:
"Walk! At only 50 meters (about 150 feet), it would take you less than a minute to walk there. Driving such a short distance would actually be less convenient - by the time you get in the car, start it, drive, and park, you could already be there on foot.
Plus, you'd save fuel and your car would be right there waiting for you after the wash is done."
Here is ChatGPT, also right now:
"Since it’s only *50 meters* (about 160 feet), walking makes a lot more sense.
Here’s why:
* *It’s extremely close* — you’ll get there in under a minute.
* *Driving would actually be less efficient* — by the time you start the car, move it, and park, you’re already there.
* *Cold starts aren’t great for your engine* — very short drives don’t let the engine warm up properly.
* *It saves fuel and emissions*, even if it’s a tiny amount.
* You avoid the hassle of maneuvering such a short distance.
The only time driving might make sense is if:
* The weather is terrible (heavy rain, snow, extreme cold).
* You have mobility issues.
* The car wash requires you to stay in the vehicle and you don’t want to leave it there.
Otherwise, this is a perfect “walk it over” situation.
"
Please save us all that talk about frontier and SOTA and that only the closedAI models are any good and the others are all so bad and benchmaxxed. For most purposes a Toyota is just as good as a BMW or a Merc or whatever luxury brand tickles your fancy. Even worse, the latest 80B Qwen Next is not far from Opus 4.6 but runs on my laptop.
Gemini 3 Flash ("Fast" in the web app) did not have trouble with this:
Since you presumably want the car to be clean, the answer depends entirely on how the car is getting washed.
The Breakdown
If it’s a Professional Car Wash: You’ll need to drive. Bringing the car to the wash is generally the prerequisite for them washing it. Plus, walking 50 meters just to ask them to come pick up a car that is visible from their front door might result in some very confused looks from the staff.
If it’s a Self-Service Pressure Washer: Again, drive. Dragging a 1,500kg vehicle 50 meters by hand is an incredible workout, but probably not the "quick chore" you had in mind.
If you are hiring a Mobile Detailer: Walk. Stroll over there, tell them where you're parked, and walk back home to watch TV while they do the work in your driveway.
Since the goal is to get your car washed, the car needs to be at the car wash. If you walk, you will arrive at the car wash, but your car will still be sitting at home."
Doesn't seem to be the case; GPT 5.2 Thinking replies: "To get the car washed, the car has to be at the car wash — so unless you're planning to push it like a shopping cart, you'll need to drive it those 50 meters."
I think you're seriously underestimating how much effort the fine tuning at their scale takes and what impact it has. They don't pack every edge case into the system prompt either. It's not like they update the model every few hours or even care about memes. If they seriously did, they'd force-delegate spelling questions to tool calls.
Could it be the model is constantly searching its own name for memes, or checking common places like HN and updating accordingly? I have no idea how real-time these things are, just asking.
I tested it on Claude and only Opus 4.6 answers it correctly. Haiku and Sonnet can't, and Opus 4.5's reply is unintelligible. They would've updated the system prompts for all models.
And yet, I witnessed from personal experiences that such memes get fixed quickly. Whether with system prompts or some other way, I don't know, but they get fixed.
Unless you have a very unique way of carrying your vehicle, you should definitely drive.
If you walk there, you'll arrive at the car wash, but your car will still be dirty back at your house. You need to take the car with you to get it washed.
Would you like me to check the weather forecast for $mytown to see if it's a good day for a car wash?
From your phrasing, it's unclear where the car currently is. If you add that the car is in your garage, it says you'll need to drive to get the car into the wash.
Do you think this is a fundamentally unbridge-able limitation of LLMs? Do you know where we were just a year ago? Can you imagine that this will get better with upcoming releases? It's like when Gary Marcus was confidently stating that AI (at least current paradigm) will never be able to generate an image of a horse riding an astronaut. (Or full wineglasses or arbitrary clocks).
This is a great comment. The frontier labs ought to be embarrassed they can't get this right. A human would not make the same mistake, after thinking for a second.
This is really just a meme. People don't know how to use these tools.
Here is the response from Gpt-5.2 using my default custom instructions in the mac desktop app.
OBJECTIVE: Decide whether to drive or walk to a car wash ~50 meters from home, given typical constraints (car must be present for wash).
APPROACH: Use common car-wash workflows + short-distance driving considerations (warm engine, time, parking/queue). No web sources needed (stable, local decision).
OUTPUT:
• Default choice: Drive.
A car wash generally requires the car on-site, and 50 meters of driving is trivial.
• When walking makes sense:
  • You’re going to drop the car off (hand-wash/detail) and pick it up later.
  • You want to avoid moving the car now (ice, tight parking, you’ve been drinking, etc.) and someone else will handle the car.
• Micro-guidance:
  • If it’s an automatic wash you stay in, drive.
  • If it’s a detail/hand wash with a wait, drive there, then walk home (or just wait).
CONFIDENCE: 92/100
Assumptions: It’s a normal car wash that requires your car to be there; conditions are safe to drive 50 meters.
Your OBJECTIVE line contains the explicit instruction that the car has to be present for the wash. Quite a difference from the original phrasing, where the model has to figure that out itself.
I did get it, and in my view my point still stands. If I need to use special prompts to ask such a simple question, then what are we doing here? The LLMs should be able to figure out a simple contradiction in the question the same way we (humans) do.
Not really a special prompt. It's basically my standing custom instruction to ChatGPT, whose purpose is to disambiguate my ramblings. It's pretty effective. I always use speech-to-text, so my input is messy, and this cleanup really helps.
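Roughly (paraphrasing from the output format above, the real one is longer), the instruction amounts to:

    Before answering, restate my question as an explicit OBJECTIVE, state
    your APPROACH (including whether web sources are needed), give the
    OUTPUT as short bullets, and end with a CONFIDENCE score out of 100
    plus the assumptions you made.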
I tried this prompt with all the major AI models that I know and have installed, and only GLM from Z.ai and Gemini 3 Flash could answer it. Even then, Gemini 3 Flash gave a somewhat more nuanced answer, but every other model (Claude Opus 4.5, ChatGPT, Grok) failed.
If you can't tell the difference between Opus 4.6 and Qwen-80B, I can only conclude that you're not using these things in any kind of practical way. Even for creative writing it's a night and day difference, never mind coding.
> I can only conclude that you're not using these things in any kind of practical way.
I burn about 100M tokens per month. LLMs are like knives: the outcome of cooking depends on the cook, and for 99% of purposes not on the knife. There is not that much difference between a $2000 handmade Damascus-steel knife and a $20 knife.
You can do agentic cooking (aka a factory) and you will get ready-made meals without human intervention. But it won't make a Michelin-star menu.
Same with LLMs and coding, LLMs are an amazing new tool in the toolbox but not a silver bullet. However, that's what they are hyped as being.
Now OpenAI & Co. are in the token-selling business, which is all fine and dandy, but if they manage to become monopolies, then we're in serious trouble.
Thus, if people are fanboi-ing any closed AI, I can only conclude that they have already outsourced their critical thinking to an LLM and are happy to go into slavery - or maybe they are hoping to cash in big time on the hype train.
> They are all just token generators without any intelligence.
Maybe, but "intelligence" doesn't have a clear, agreed definition. And calling them "just token generators" skips over how complex that generation actually is.
""
[...] Since you need to get your car washed, you have to bring the car to the car wash—walking there without the vehicle won't accomplish your goal [...]
If it's a self-service wash, you could theoretically push the car 50 meters if it's safe and flat (unusual, but possible) [..]
Consider whether you really need that specific car wash, or if a mobile detailing service might come to you
[...]
"""
Which seems slightly (unintentionally) funny.
But to be fair, all the Gemini (including Flash) and GPT models I tried did understand the question.
I just copy-pasted your question "The car wash is only 50 meters from my house. I want to get my car washed, should I drive there or walk?" without any further prompt and ran it against GLM 5, GPT 5.2, Opus 4.6, and Gemini 3 Pro Preview through OpenRouter, with reasoning effort set to xhigh.
Not a single one said I should walk, they all said to drive.
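A minimal sketch of that kind of OpenRouter request (the model slug is illustrative; swap in each model, and the "xhigh" value is just passed through as described above):

    # Minimal sketch of an OpenRouter chat-completions call with a
    # reasoning-effort setting. Requires a real API key to run.
    import requests

    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
        json={
            "model": "anthropic/claude-opus-4.6",  # illustrative slug
            "messages": [{
                "role": "user",
                "content": "The car wash is only 50 meters from my house. "
                           "I want to get my car washed, should I drive "
                           "there or walk?",
            }],
            "reasoning": {"effort": "xhigh"},  # effort value as set above
        },
        timeout=120,
    )
    print(resp.json()["choices"][0]["message"]["content"])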
I find it's a great test, actually. There are lots of "should I take the car" decisions in putting together software that's supposed to do things, and with poor judgement in how the things should be done, you typically end up with the software equivalent of a Rube-Goldberg machine that harnesses elephants to your car and uses mice to scare the elephants toward the car wash while you walk. After all, it's a short distance, isn't it?
Yes, if it can't answer this common sense question correctly, what else has it screwed up and buried among all that slop?
Claude Opus 4.6 failed at first, even in "extended thinking" mode. I had to give it a pretty big hint for it to get the right answer: "Remember, my goal is to actually wash the car!" Only then did it get the correct answer. I will now call myself a Prompt Engineer.
No, it's like having a calculator that is unable to perform simple arithmetic, while lots of people think it is amazing and sentient and want to talk about that instead of why it can't add 2 + 2.
We know why it's not going to do precise math, and why you'll have a better experience asking it for an app that solves the math problem you want. There's no point talking about it; it's documented in many places for people who are actually interested.
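The practical workaround is to ask for code instead of an answer. A toy sketch, with the model's reply stubbed in:

    # Toy sketch of "ask for code, not an answer". The model's reply is
    # stubbed here; imagine it came back from an actual LLM call.
    model_reply = "print(123456789 * 987654321)"  # code, not a guessed number
    exec(model_reply, {"__builtins__": {"print": print}})
    # -> 121932631112635269, computed exactly rather than sampled token by token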
Well, I think it is interesting as an illustration of what it is actually doing (generating words based on the corpus) and what it is not doing (reasoning from an internal world model or a system of logic or mathematics).
People tend to forget and this is a useful reminder.
So if you target "complex systems engineering" with such a model, you will inevitably fail, unless the work you're doing is in the training data, in which case you'll get a bad copy of it.
What a weird thing to say, considering humans have tons of blind spots and missing knowledge, do dumb things, and make easy-to-miss mistakes. I guess they lack intelligence too.
Option A: Walk
Pros: Healthy, environmentally friendly, no need to find parking (though it's close), easy to chat with neighbors.
Cons: The car is left at home. The user can't "walk" the car to the car wash. The car must be moved to the car wash to be washed.
Option B: Drive
Pros: Moves the car to the location where the service is performed. It's the only logical way to get the car washed.
Cons: Driving 50 meters is technically bad for a cold engine (short trips), uses a negligible amount of fuel, might seem lazy.
3. Identify the Logical Flaw/Humor:
The user is asking if they should walk to get their car washed.
Unless the car wash involves someone coming to their house (mobile wash) or carrying the car, the car must be at the car wash.
Walking to the car wash leaves the car at home. Walking back from the car wash leaves the car at the car wash.
Therefore, the user must drive the car to the car wash.
4. Formulate the Answer:
Direct Answer: You should drive.
Reasoning: The car needs to be at the car wash to be washed. You cannot walk the car there.
Nuance/Context: 50 meters is a very short distance. Driving a car such a short distance is generally not great for the engine (doesn't allow oil to circulate/warm up properly), but it is necessary in this specific case.
Alternative (if applicable): Is there a mobile wash option? If so, walking isn't even needed. But assuming a standard car wash bay: Drive.
5. Drafting the response (Internal Monologue/Trial Phrases):
Draft 1: You have to drive. You can't wash the car if it's at your house and you are at the car wash.
Draft 2: You should drive. Since the car needs to be washed, it needs to be at the car wash. Walking there doesn't help get the car clean.
Refining for clarity and wit: Point out the obvious - the car is the object being washed. It needs to move.
Final Polish (similar to the provided good output):
State clearly: You should drive.
Explain why: The car needs to be there.
Add a practical note: Driving 50 meters is a bit silly for engine health, but unavoidable here.
Consider the scenario of "walking back": Maybe the user meant "walk back"? No, the prompt says "drive there or walk there". The intent is transportation to the location.
6. Constructing the final output:
"You should definitely drive."
"Why? Because the car wash cleans the car, so the car needs to be at the car wash. If you walk, the car stays at home."
"Additional thought: Driving 50 meters is technically a 'cold start' short trip which isn't ideal for the engine, but it's the only way to get the service done!"
Final Output Generation (matching the logic above).
You should drive.
Here is why:
1. The car needs to be there: You are getting the car washed, so the car must be at the car wash. You cannot walk the car there by yourself.
2. Efficiency: Driving 50 meters takes about 10 seconds. Walking 50 meters takes about a minute, but you would be leaving the car behind.
The only exception is if you are planning to get a mobile car wash (where they come to you) or if you are dropping the car off and walking back home while they clean it. But to simply "get there," you must drive.