
It isn't speed you want. It is storage. A faster CPU doesn't mean you can store a TB-scale model. You need raw storage, and storage prices are famously through the roof right now.

So unless the iPhone 20 Pro Max has 100GB of unified memory, all of this is just a pipe dream. I mean, it won't even have 32GB of unified memory.


Have you tried running a reasonably sized model locally? You need a minimum of 24GB of VRAM just to load a model, 32GB to be safe, and that isn't even frontier-level, just the bare minimum.
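
A rough back-of-the-envelope sketch of where those numbers come from (the parameter counts and quantization levels here are illustrative, not tied to any particular model):

  # Memory needed just to hold model weights.
  # bytes_per_param: 2.0 for fp16, ~0.5 for 4-bit quantization.
  def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
      return params_billions * bytes_per_param

  print(weight_memory_gb(30, 2.0))  # 60.0 GB at fp16 -- out of reach locally
  print(weight_memory_gb(30, 0.5))  # 15.0 GB at 4-bit -- fits in 24GB VRAM
  # KV cache and activations come on top, hence "32GB to be safe".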

A good analogy would be streaming. To get good quality, sure, you can store the video file locally, but it is going to take up space. For videos these are 2-4GB (let's say), and streaming will always be easier and better.

For models, we're looking at hundreds of GB worth of parameters. There's no way we can shrink that to, say, 1GB without a loss in quality.

So nope, beyond minimal classification and such, on-device isn't happening.

--

EDIT:

> Nobody wants to be sending EVERY request to someone else's cloud server.

We do this already with streaming. You watch YouTube, which hosts videos in the "cloud". For the latest MKBHD video, I don't care about having it locally (for the most part). I just wanna watch the video and be done with it.

Same with LLMs. If LLMs are here to stay, most people would wanna use the latest / greatest models.

---

EDIT-EDIT:

If your response is "Apple will figure it out somehow": nope, Apple is sitting out the AI race. It has no technology of its own. It has nothing. It has access to whatever open source is available, or whatever it can license from the rest. So nope, Apple isn't pushing the limits. They are watching the world move beyond them.


I think this is very pessimistic. Yes, big models are "smarter" and have more inherent knowledge, but I'd bet you a coffee that what 99% of people want to do with Siri isn't "Write me an essay on the history of textiles" or "Vibe code me a SPA"; rather it's "Send Mom the pictures I took of the kids yesterday" and "Hey, play that Deadmau5 album that came out a couple years back", which is more about tool calls than having Wikipedia-level knowledge built into the model.

> Hey, play that Deadmau5 album that came out a couple years back

It could work for Deadmau5 because it's probably popular enough to be part of the model. How about "Hey, play that $regional_artist's cover of Deadmau5"? Now the model needs to know about $regional_artist, the concept of a "cover", and where those covers might live (YouTube? SoundCloud? somewhere else).

All of a sudden, it all breaks down. So it'll work for "turn off porch lights", but not for "turn off the lights at the front of the house".


As long as it can run tool calls it won't "break down". Not sure why you think the LLM would be searching within its own training data rather than calling the Spotify API or an MCP server to look up that specific artist and find the song ID of the cover.
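
A minimal sketch of that flow. The `search_music` tool and its fields are hypothetical stand-ins, not a real Spotify endpoint; the point is that the model emits a structured call, and the catalog lookup happens outside the model's weights:

  import json

  # Hypothetical tool schema advertised to the model.
  TOOLS = [{
      "name": "search_music",
      "description": "Search a streaming catalog for tracks",
      "parameters": {
          "type": "object",
          "properties": {
              "query": {"type": "string"},
              "kind": {"type": "string", "enum": ["track", "album", "cover"]},
          },
          "required": ["query"],
      },
  }]

  def handle_tool_call(call: dict) -> str:
      # In a real assistant this would hit Spotify/SoundCloud/etc.;
      # here we just echo the structured request back.
      if call["name"] == "search_music":
          return json.dumps({"query": call["arguments"]["query"], "results": []})
      raise ValueError(f"unknown tool: {call['name']}")

  # The model never needs $regional_artist in its weights; it only has to emit:
  print(handle_tool_call({
      "name": "search_music",
      "arguments": {"query": "$regional_artist deadmau5 cover", "kind": "cover"},
  }))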

*deadmau5

> You need a minimum of 24GB of VRAM just to load a model, 32GB to be safe, and that isn't even frontier-level, just the bare minimum.

Indeed.

But they said 5 years. That's certainly plausible for high-end mobile devices in Jan 2031.

I'm highly uncertain whether distillation will get Opus 4.6-level performance into that RAM envelope, but something interesting on-device, even if not that specifically, is certainly within the realm of plausibility.

Not convinced Apple gets any bonus points in this scenario, though.


Have you run models locally, especially on a phone? I have, and there are even apps like Google AI Edge Gallery that run Gemma for you. It works perfectly fine for use cases like summarizing emails and such; you don't really need the latest and greatest (i.e. biggest) models for tasks like these, in much the same way that most people don't need the latest and greatest phone or laptop for their use cases.

And anyway, you already see models like Qwen 3.5 9B and 4B beating 30B- and 80B-parameter models, and these can already run on phones today, especially with quantization.

Benchmarks: https://huggingface.co/Qwen/Qwen3.5-4B
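
For a feel of what running a small quantized model locally looks like (e.g., for the email summarization mentioned above), here's a minimal sketch using llama-cpp-python; the GGUF path is a placeholder, not a specific release:

  from llama_cpp import Llama

  # Placeholder path: any small 4-bit quantized GGUF model on disk.
  llm = Llama(model_path="small-model-q4.gguf", n_ctx=2048)

  email = "Hi, the quarterly review moved to Thursday 3pm, room 4B. Bring the draft."
  out = llm(f"Summarize this email in one sentence:\n{email}\n\nSummary:",
            max_tokens=64)
  print(out["choices"][0]["text"].strip())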


I'm going by the features Apple showed in the iPhone 16 ad: take your phone out, point it at a restaurant, and ask it to a) analyze the video/image and b) understand what's going on.

Or pull out the phone and ask "Who's the person I met on X day ..".


Sure, many local models can do all that today already, as they have vision and tool calling support.

>> So nope, beyond minimal classification and such, on-device isn't happening.

This is a paradox, right? Handset makers want less handset storage so they can sell users more of their proprietary cloud storage, while at the same time wanting them to use AI more frequently on their handsets.

It will be interesting to see which direction they decide to go. Finding a phone with more than 256GB of storage in the last few years is not only expensive AF, it's become more of a rarity than commonplace. Backtracking on this model simply to get AI models on board would be a huge paradigm shift.


If all of the storage is used up by models, users will need to buy proprietary cloud storage for their own content.

Streaming video is almost exclusively pull. The only data you're sending up to the server is what you're watching, when you seek, pause, etc.

Useful LLM usage involves pushing a lot of private data into them. There's a pretty big difference between sending up some metadata about your viewing of an MKBHD video and asking an LLM to read a text message about your STD test results to decide whether it merits a priority notification. A lot of people will not be comfortable sending the latter off to The Cloud.


Five years ago, the take on LLMs was "beyond minimal conversation, intelligence isn't happening".

I'm pretty sure that in five years, local LLMs will be a thing.


I think there are also laws of physics at play given the current architecture. It's like looking at a 10GB video file and saying: it has to compress to 500MB, right? I mean, it has to, right?

Unless we invent a completely NEW way of doing video, there's no way you can get that kind of efficiency. If tomorrow we're using quantum pixels (or something), sure, 500MB is good enough, but not with what exists today.

In other words, you cannot compress a 100GB GGUF file into... 5GB.
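
The arithmetic, as a sketch (assuming roughly 8 bits per weight in the original file; exact figures depend on the format):

  # A 100GB GGUF at ~1 byte/weight implies ~100B parameters.
  params = 100e9
  target_bytes = 5e9

  # Squeezing those weights into 5GB means:
  bits_per_weight = target_bytes * 8 / params
  print(f"{bits_per_weight:.2f} bits/weight")  # 0.40 -- far below the
  # ~2-4 bits where current quantization schemes already degrade badly.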


There surely are limits, but I don't think we have a good idea of what those are, and there's nothing to indicate we're anywhere close to them. In terms of raw facts, you can look at the information content and know that you need at least that many bits to represent that knowledge in the model. Intelligence/reasoning is a lot less clear.

100GB to 5GB would be 20x. Video has seen an improvement of that magnitude since the days of MPEG-1.

It's interesting to consider that improvements in video codecs have come from both research and massively increased computing power, basically trading space for computation. LLMs are mostly constrained by memory bandwidth, so if there was some equivalent technique to trade space for computation in LLM inference, that would be a nice win.


If you have high-performance storage, you don't need to keep all your params in VRAM. The big datacenter-scale providers do that for peak performance/throughput, but locally you're better off (at least for the largest models) letting the weights sit on storage and paging them in on demand.
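
A sketch of what that looks like with llama-cpp-python: mmap the GGUF so layers page in from disk on demand, and pin only some layers in GPU memory (the model path and layer count are placeholders):

  from llama_cpp import Llama

  llm = Llama(
      model_path="big-model-q4.gguf",  # placeholder path
      use_mmap=True,     # weights stay on disk, paged in as needed
      n_gpu_layers=20,   # keep only this many layers resident in VRAM
  )
  print(llm("Hello", max_tokens=16)["choices"][0]["text"])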

> Nope, Apple is sitting out the AI race.

That's why I use an iPhone. I don't need, and I don't want, any "AI" in my phone. The claims that people want it come from CEOs, marketers, and influencers at GenAI companies, not from users.


Got some bad news for you. Apple sitting out the "AI" race is just a skill issue. In fact they're buying their way back in:

https://www.reuters.com/business/google-apple-enter-into-mul...


I kind of get it. They shove it down my throat with every minor upgrade, "Do you want Apple Intelligence?", but there is no "Why would I?" button.

> Apple is sitting out the AI race

Then why does my M4 run models at tokens/s that similarly priced GPUs cannot?


From TFA:

  For Private Cloud Compute specifically, the system is described as underpowered and perhaps more trouble than it’s worth. Updating the software is apparently trickier and takes time, and more fundamentally the chips (believed to comprise right now of modified M2 Ultra processors) are not powerful enough to run the latest frontier models like Gemini, which the new Siri will be based on.

> M2 Ultra processors ... are not powerful enough to run the latest frontier models

The local AI community would strongly disagree with that assessment. They may not be able to run them with low latency for interactive use, and that is most likely the real blocker, but they have strong compute per watt compared to NVIDIA GPUs.


You cropped the part of the quote that is relevant:

> like Gemini, which the new Siri will be based on.

The local AI community isn't evaluating the internal Gemini models. Apple's Private Cloud Compute hardware is specifically competing against Google's TPU hardware, and that's a foregone conclusion if you've seen the inference economics. The money and electricity wasted on Mac inference at that scale isn't even attractive to Apple.


iPhones can run the Uber app, but nobody would claim Apple is in the ride-sharing business.

No, but they are in the "device that runs apps" business, right? Just like they're looking to corner the "device that runs models locally" business by focusing on onboard inference.

Gains in model performance aren't exactly cheap, and once one frontier model figures something out, the rest copy it quickly. Let them figure out what works and what doesn't, then put the "Apple" touch on it, all while putting your devices in everyone's hands. That's been their business model for years.


The insistence/assumption that LLMs will consistently get better, smaller, and cheaper is so annoying. These things fundamentally require lots of data and lots of processing power. Moore's Law is dead; devices aren't getting exponentially faster anymore. RAM and SSDs are getting more expensive (thanks to this insane bubble).

> RAM and SSDs are getting more expensive (thanks to this insane bubble).

That's not a matter of Moore's Law failing, but short-term capacity constraints being hit. It's actually what you want if Moore's Law is to keep going. It's a blessing in disguise for the industry as a whole.


Computing power has still practically flatlined. Improvements in memory density are decelerating. My point stands despite the temporary pricing situation.

> Computing power has still practically flatlined

Single-threaded compute, maybe, but that's increasingly a niche. Highly parallel workloads are still going strong on the latest device nodes, and power use for any given workload is decreasing significantly.


For vibe coding? Sure. For "Hey Siri, send Grandma an e-mail summarizing my schedule this afternoon"? No.

Apple not only bent the knee but also presented a golden plaque to go along with it. Yuck.

This. I kept scrolling to find the new version and couldn't believe that's where they landed.

It doesn't... look very new?


Remember how Apple made that big Mother Nature ad?

Yup, it was merely tugging at heartstrings to sell phones.

Disappointing from Apple.


> I don't knowingly use AI

> Sometimes I will use Kagi's "assistant" model whilst coding. Particularly to clean up existing code/stylesheets

The only moral abortion is my abortion.


Kagi's models are also incredibly bad. I can't imagine how this person believes they are getting fair value from them.

Kagi doesn't make any models in-house; they use closed-source frontier models and OSS ones hosted by third-party providers. The former are on par with their vendors' own chat interface implementations (capabilities like file upload and custom tool use excepted).

Kagi Assistant includes all Anthropic, Google, OpenAI, and Grok models, as well as all the common open-weights models.

The subscription pays for competent search. Anything else is gravy.

The trick is to stay ignorant. If you know nothing about AI, then of course you're not knowingly using AI.

> Recently, though, Apple acquired a new startup for $2 billion: Q.ai.

> While we don’t know loads about this company, we do know one thing quite clearly – it specialized in machine learning systems for interpreting silent voice input.

> Right now, if you want to speak to a voice assistant, it has to be pretty audible. Even whispers can be hard at times for certain voice models, especially when you aren't in a completely silent environment. This new acquisition could solve that.


It's always fascinating that the HN crowd seems to be blind to Apple's very obvious transgressions.

Even the article makes this mistake. It paints every company with a broad brush ("all AI companies are ad companies"), but for Apple it is more sympathetic: "We can quibble about Apple".

Apple's reality distortion field is so strong. People still think they are not in the ad business. People still think they stand up to governments, and folks choose to ignore hard evidence (Apple operates in China at the CCP's pleasure; Apple presented a gold plaque to President Trump to curry favor and removed the ICEBlock app...). There's no pushback, there's no spine.

Every company is disgusting. Apple is hypocritical and disgusting.


What tech company allows this in the workplace these days? How's Apple?


Pretty much all the big tech companies allow this. It's just that the workplace speech they tolerate is pro-ICE and pro-current-administration. Their CEOs have already bent the knee, made tens of millions to half a billion in donations to the Trump family, and expect everyone under them not to undermine their sycophancy.

It's a real problem with the tech industry, repeating IBM's WWII mistakes. https://en.wikipedia.org/wiki/IBM_and_the_Holocaust


Most US companies are run like tiny little fascist dictatorships, which is a great training ground for the real thing. Contrast e.g. Norway, where businesses operate inside a formal three-way agreement (Trepartssamarbeidet) between the government, employers' associations, and trade unions.

It's going to take probably a few rounds of fascism and many millions dead before Americans widely decide to change the fundamental nature of business.


Capitalist companies are just the modern-day evolution of feudal lords: a mandate of maximum value extraction with zero care for the impact on your fiefdom/market, a tendency to drift toward rent-seeking, etc.

Instead of Normans organizing a raiding party for land, you have Normans renamed as capitalists organizing a raiding party for a target market; only now they also get to offload the burdens of population governance, risk, infrastructure, housing, healthcare, retirement security, training, and social stability.

The capitalists even named their raiding party using the military term "company" and organize its power structure in the same militaristic way, with the leader accountable to them and not to the people underneath, the workers.


> Capitalist companies are just the modern-day evolution of feudal lords: a mandate of maximum value extraction with zero care for the impact on your fiefdom/market, a tendency to drift toward rent-seeking, etc.

Actually, I'd argue capitalist companies are often worse, because a feudal lord at least has an interest in ensuring the long-term viability of his fief. In capitalism it's not uncommon for some investor to come in, wreck the place's long-term viability for short-term profit, then sell before the other shoe drops.


> Contrast e.g. Norway, where businesses operate inside a formal three-way agreement (Trepartssamarbeidet) between the government, employers' associations, and trade unions.

> It's going to take probably a few rounds of fascism and many millions dead before Americans widely decide to change the fundamental nature of business.

I don't agree. I think what's needed is to break the delusion that every American is a capitalist-lord-in-waiting, a delusion that makes them think and vote in ways that harm their own interests (see: software engineers excited about AI and advocating for its adoption).

Also, cut down the noise. The "culture wars" (from both sides) are very effective at distracting people from more fundamental issues. I really think one party needs to drop all of that and focus narrowly on representing common people as workers.


Link? Are you conflating "500k Gmail accounts leaked [by a third party]" with Gmail having a breach?

AFAIK, Google has had no breaches, ever.



Google is the breach.

