I would bet significant money that, within two years, it will become Generally Obvious that Apple has the best consumer AI story among any tech company.
I can explain more in-depth reasoning, but the most critical point: Apple builds the only platform where developers can construct a single distributable that works on mobile and desktop with standardized, easy access to a local LLM, and a quarter billion people buy into this platform every year. The degree to which no one else on the planet is even close to this cannot be overstated.
The thing that people seem to have forgotten is that the companies that previously attempted to monetize data center based voice assistants lost massive amounts of money.
> Amazon Alexa is a “colossal failure,” on pace to lose $10 billion this year... “Alexa was getting a billion interactions a week, but most of those conversations were trivial commands to play music or ask about the weather.” Those questions aren’t monetizable.
Google expressed basically identical problems with the Google Assistant business model last month. There’s an inability to monetize the simple voice commands most consumers actually want to make, and all of Google’s attempts to monetize assistants with display ads and company partnerships haven’t worked. With the product sucking up server time and being a big money loser, Google responded just like Amazon by cutting resources to the division.
It doesn't help that Google also keeps breaking everything with the home voice assistants, and this has been true for ages and ages.
I only have a single internet-enabled light in my house (that I got for free), and 90% of the time when I ask the Assistant to turn on the light, it says "Which one?". Then I tell it "the only one that exists in my house", and it says "OK" and turns it on.
Getting it to actually play the right song on the right set of speakers is also nearly impossible, but I can do it no problem with the UI on my phone.
I don't fear a future where computers can do every task better than us: I fear a future where we have brain-damaged robots annoying the hell out of me because someone was too lazy to do anything besides throw an LLM at things.
> I don't fear a future where computers can do every task better than us: I fear a future where we have brain-damaged robots annoying the hell out of me because someone was too lazy to do anything besides throw an LLM at things.
I had an annoying few weeks where, after years of working properly, Google assistant started misinterpreting "navigate home" as "navigate to the nearest Home Depot™".
QA is the spouses of engineers. Management is a revolving door of the "smartest people" who are thinking about what to eat or their next job. Voices of reason get lost in the noise.
Not really limited to their AI products; Android just sometimes randomly decides that pressing play on the BT receiver in my car should totally start playing the song directly from my phone instead of the BT device it's connected to.
I feel like you're getting at something different here, but my conclusion is that maybe the problem is the approach of wanting to monetize each interaction.
Almost every company today wants their primary business model to be as a service provider selling you some monthly or yearly subscription, when most consumers just want to buy something and have it work. That has always been Apple's model. Sure, they'll sell you services if need be (iCloud, AppleCare, or the various pieces of Apple One), but those all serve as complements to their devices. There's no big push to get Android users to sign up for Apple Music, for example.
Apple isn't in the market of collecting your data and selling it. They aren't in the market of pushing you to pick brand X toilet paper over brand Y. They are in the market of selling you devices and so they build AI systems to make the devices they sell more attractive products. It isn't that Apple has some ideologically or technically better approach, they just have a business model that happens to align more with the typical consumers' wants and needs.
> I feel like you're getting at something different here, but my conclusion is that maybe the problem is the approach of wanting to monetize each interaction.
Personally, Google lost me as a search customer (after 25 years) when they opted me into AI search features without my permission.
Not only am I not interested in free tier AI services, but forcing them on me is a good way to lose me as a customer.
The nice thing about Apple Intelligence is that it has an easy to find off switch for customers who don't care for it.
Google is currently going full on Windows 10, for 'selected customers', with Gemini in Android. '(full screen popup) Do you want to try out Gemini? [Now] [Later]' 2 hours later... Do you want to...
> The nice thing about Apple Intelligence is that it has an easy to find off switch for customers who don't care for it.
Not even only that, but the setup wizard literally asks if you'd like it or not. You don't even have to specifically opt-out of it, because it's opt-in.
Yes, there are always ways to deal with companies who make their experience shitty. The point is that you shouldn't have to, and that people will leave for an alternative that doesn't treat them like that.
I feel like this is 5 or so years out of date. The fact that they actually have an Apple Music app for Android is a pretty big push for them. Services is like 25% of their revenue these days, larger than anything except the iPhone.
As I said elsewhere, it really depends on the definition of "service". Subscriptions make up a relatively small minority of that service revenue. For example, 30 seconds of searching suggests that Apple Music's revenue in 2024 was approximately $10b compared to the company as a whole being around $400b. That's not nothing, but it doesn't shape the company in the way its competitors are shaped by their service businesses.
The biggest bucket in that "service" category is just Apple's 30% cut of stuff sold on their platform (which it also must be noted, both complements and is reliant on their device sales). That wouldn't really be considered a "service" from either the customer perspective or in the sense of traditional businesses. Operating a storefront digitally isn't a fundamentally different model than operating a brick and mortar store and no one would call Best Buy a "service business".
I know you're saying that Apple's business model is selling devices but it's not like they aren't a services juggernaut.
Where I think you are ultimately correct is that some companies seem to just assume that 100% of interactions can be monetized, and they really can't.
You need to deliver value that matches the money paid or the ad viewed.
I think Apple has generally been decent at recognizing the overall sustainability of certain business models. They've been around long enough to know that most loss-leading businesses never work out. If you can't make a profit from day one what's the point of being in business?
It depends. I guess you can argue this is true purely from scale. However, we should also keep in mind there are a lot of different things that Apple and tech companies in general put under "services". So even when you see a big number under "Service Revenue" on some financial report, we should recognize that most of that was from taking a cut of some other transaction happening on their devices. Relative to the rest of their business, they don't make much from monthly/yearly subscriptions or monetizing their customers' searches/interactions. They instead serve as a middleman on purchase of apps, music, movies, TV, and now even financial transactions made with Apple Card/Pay/Cash. And in that way, they are a service company in the same way that any brick and mortar store is a service company.
I'm confused at what you're trying to say here. Why exactly doesn't the service revenue matter again? For some pedantic reason of Apple being metaphorically similar to a brick and mortar store?
Apple's services revenue is larger than Macs and iPads combined, with a 75% profit margin, compared to under 40% for products (hardware).
Yeah, they serve as a middleman...an incredibly dominant middleman in a duopoly. 80% of teenagers in the US say they have an iPhone. Guess what, all that 15-30% app store revenue is going to Apple. That's pretty much the definition of a service juggernaut.
I also don't agree with you about the lack of selling Apple services to non-Apple users. TV+ is a top-tier streaming service with huge subscriber numbers, and their app is on every crappy off-brand smart TV and streaming stick out there. Yes, there really are Android users who subscribe to Apple Music - 100 million+ downloads on the Google Play store, #4 top grossing app in the music category.
It's really interesting to consider an area where they are being successful with their AI: the notification summaries work pretty well! It's an easy sell to the consumer bombarded with information/notifications all over the place that on-device processing can filter this and cut out clutter. Basically, don't be annoying. I think a lot of people don't really know how well things like their on-device image search work (it'll OCR an upside-down receipt sitting on a table successfully). I never see them market that strength, judging by the number of people with iPhones who are surprised when I show them this on their own phones.
HOWEVER, you would never know this given the Apple Store experience! As I was dealing with the board swap in my phone last month, they would have these very loud/annoying 'presentations' every half hour or so going over all the other Apple Intelligence features. Nobody watched; nobody in the store wanted to see this. In fact, when you consider the history of how the stores have operated for years, the idea was to let customers play around with the device and figure shit out on their own. A store employee asks if they need anything explained, but otherwise it's a 'discovery' thing, not this dictated dystopia.
The majority of people I heard around me in the store were bringing existing iPhones in to get support because they either broke them or had issues logging into accounts (lost/compromised passwords or issues with passkeys). They do not want to be constantly told about the same slop every other company is trying to feed them.
Some features are not meant to be revenue sources. I'd lump assistive technology and AI assistants into the category of things that elevate the usefulness of one's ecosystem, even when not directly monetizable.
Edit: IMO Apple is under-investing in Siri for that role.
The assistant thing really shows the lie behind most of the "big data" economy.
1) They thought an assistant would be able to operate as an "agent" (heh) that would make purchasing decisions to benefit the company. You'd say "Alexa, buy toilet paper" and it would buy it from Amazon. Except it turns out people don't want their computer buying things for them.
2) They thought that an assistant listening to everything would make for better targeted ads. But this doesn't seem to be the case, or the increased targeting doesn't result in enough value to justify the expense. A customer with the agent doesn't seem to be particularly more valuable than one without.
I think that this AI stuff and LLMs in particular is an excuse, to some extent, to justify the massive investment already made in big data architecture. At least they can say we needed all this data to train an LLM! I've noticed a similar pivot towards military/policing: if this data isn't sufficiently valuable for advertising maybe it's valuable to the police state.
> Except it turns out people don't want their computer buying things for them.
I think this also hits an interesting problem with confidence: if you could trust the service to buy what you’d buy and get a good price you’d probably use it more but it only saves a couple of seconds in the easy case (e.g. Amazon reorders are already easy) and for anything less clear cut people rightly worry about getting a mistake or rip-off. That puts the bar really high because a voice interface sucks for more complex product comparisons and they have a very short window to give a high-quality response before most people give up and use their phone/computer instead. That also constrains the most obvious revenue sources because any kind of pay for placement is going to inspire strong negative reactions.
> Those questions aren’t monetizable. ... There’s an inability to monetize the simple voice commands most consumers actually want to make.
There lies the problem. Worse, someone may solve it in the wrong way:
I'll turn on the light in a minute, but first, a word from our sponsor...
Technically, this will eventually be solved by some hierarchical system. The main problem is developing systems with enough "I don't know" capability to decide when to pass a question to a bigger system. LLMs still aren't good at that, and the ones that are good at it require substantial resources.
What the world needs is a good $5 LLM that knows when to ask for help.
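Roughly the shape I have in mind, as a sketch only; `local_model` and `cloud_model` here are hypothetical stand-ins, not any real API:

```python
# Sketch of hierarchical escalation: a cheap local model answers when it is
# confident, otherwise it hands the question to a bigger, more expensive model.
# Both model objects are hypothetical stand-ins for whatever you actually run.
CONFIDENCE_THRESHOLD = 0.8

def answer(question: str, local_model, cloud_model) -> str:
    # Ask the small model for an answer plus an estimate of its own confidence.
    draft, confidence = local_model.generate_with_confidence(question)
    if confidence >= CONFIDENCE_THRESHOLD:
        return draft  # cheap path: handled entirely on-device
    # The "I don't know" path: escalate to the bigger system.
    return cloud_model.generate(question)
```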
This type of response has been given by Alexa from an echo device in my house. I asked, “play x on y”, the response was something like “ok, but first check out this new…”. I immediately unplugged that device and all other Alexa enabled devices in the house. We have not used it since.
This is the monetization wall they have to figure out how to break through. The first inkling of advertising is immediate turn off and destroy, for me.
Even worse than ads, mine keeps trying to jam "News" down my throat. I keep disabling the news feeds on all my devices and they keep re-enabling themselves against my wishes. Every now and then I'll say something to Alexa and she'll just start informing me about how awful everything is, or the Echo Show in the kitchen will stop displaying the weather in favor of some horrific news story.
Me: "Alexa, is cheese safe for dogs?"
Alexa: "Today, prominent politician Nosferatu was accused by the opposition of baby-cannibal sex trafficking. Nosferatu says that these charges are baseless as global warming will certainly kill everyone in painful ways by next Tuesday at exactly 3pm. In further news, Amazon has added more advertisements to this device for only a small additional charge..."
If I wanted to feel like crap every time I go to the kitchen I'd put a scale in there. /s
I find this a really interesting observation. I feel like 3-4 trivial ways of doing it come to mind, which is sort of my signal that I’m way out of my depth (and that anything I’ve thought of is dumb or wrong for various reasons). Is there anything you’d recommend reading to better understand why this is true?
You are asking why someone doesn't want to ship a tool that obviously doesn't work? Surely it's always better/more profitable to ship a tool that at least seems to work.
GP means they aren't good at knowing when they are wrong and should spend more compute on the problem.
I would say the current generation of LLMs that "think harder" when you tell them their first response is wrong are a training ground for knowing to think harder without being told, but I don't know the obstacles.
Are you suggesting that when you tell it "think harder" it does something like "pass a question to a bigger system"? I have doubts... It would be gated behind a more expensive plan if so.
Because people make them, and people make them for profit. Incentives make the product what it is.
An LLM just needs to confidently return something that is good enough for the average person in order to make money. If an LLM said "I don't know" more often, it would make less money, because for the user that means the thing they pay for failed at its job.
The difference is the previous version of Alexa wasn't good enough to pay for. Now it is good enough that millions of users are paying $10-100 for these services.
Voice assistants that were at the level of a fairly mediocre internet-connected human assistant might be vaguely useful. But they're not. So even if many of us have one or two in our houses or sometimes lean on them for navigation in our cars we mostly don't use them much.
Amazon at one point was going to have a big facility in Boston, as I recall, focused on Alexa. It's just an uninteresting product that, if it were to go away tomorrow, I wouldn't much notice. And I certainly wouldn't pay an incremental subscription for it.
This is the part that hasn't made much sense to me. Maybe just.. have a better product?
As you quoted above, "most of those conversations were trivial commands to play music or ask about the weather." Why does any of this need to consume provider resources? Could a weather or music command not just be.. a direct API call from the device to a weather service / Spotify / whatever? Why does everything need to be shipped to Google/Amazon HQ?
I had a group of students make a service like this in 2019, completely local, could work offline, did pretty much everything Alexa can do, and they made it connect to their student accounts so they could ask it information about their class schedules. If they can do it, Amazon certainly can. That they don't says they think they can extract more value from monitoring each and every request than they could from selling a better product.
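The rough shape of what they built, as a sketch; the weather endpoint below is a placeholder URL, not whatever service they actually wired up:

```python
import json
import urllib.parse
import urllib.request

# Sketch of local intent routing: the command is parsed on-device and turned into
# a direct API call, with no assistant backend in the middle. Placeholder endpoint.
WEATHER_URL = "https://api.example.com/weather?city="

def handle_command(text: str) -> str:
    text = text.lower().strip()
    if "weather" in text:
        city = text.rsplit(" in ", 1)[-1] if " in " in text else "here"
        with urllib.request.urlopen(WEATHER_URL + urllib.parse.quote(city)) as resp:
            data = json.load(resp)
        return f"{data['summary']}, {data['temperature']} degrees."
    if "turn on" in text and "light" in text:
        return "OK."  # would talk to the local smart-home hub here
    return "Sorry, I can't do that offline."
```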
From what I can tell, only Apple even wants to try doing any of the processing on-device. Including parsing the speech. (This may be out-of-date at this point, but I haven't heard of Amazon or Google doing on-device processing for Alexa or Assistant.)
So there's no way for them to do anything without sending it off to the datacenter.
Alexa actually had the option to process all requests locally (on at least some hardware) for the first ~10 years, from launch until earlier this year. The stated reason for removing the feature was generative AI.
I think of my Alexa often when I think about AI and how Amazon, of all people, couldn't monetize it. What hope do LLM providers have? Alexa is in rooms all around my house and has gotten amazing at answering questions, setting timers, telling me the weather, etc., but would I ever pay a subscription for it? Absolutely not. I wouldn't even have bought the hardware except that it was a loss leader and was like $20. I wouldn't have even paid $100 for it. Our whole economy is mortgaged on this?
I'm extremely bearish on AI, but I'm not sure I agree with the framing "not even Amazon could..." All of the advertising around Alexa focused on the simple narrow use cases that people now use it for, and I'm inclined to assume that advertising is part of it. I think another part is probably that voice is really just not that fantastic of an interface for any other kind of interactions. I don't find it surprising that OpenAI's whole framing around ChatGPT, of it being a text-based chat window (as are the other LLMs), is where most of the use seems to happen. I like it best when Alexa acts as a terse butler ("turn on the lights" "done"), not a chatty engaging conversationalist.
This is probably why there's so much attention on LLM-powered coding tools, as it's one of the few use cases people actually seem willing to pay for. Ironically, mostly developers, the very people AI is marketed as replacing.
It's also a use case where you already have a user of above-average intelligence who is there correcting hallucinations and mistakes, and is mostly using the technology to speed up boilerplate.
This just doesn't translate to other job types super well, at least, so far.
As a sibling poster has said, I don't know how much on-device AI is going to matter.
I have pretty strong views on privacy, and I've generally thrown them all out in light of using AIs, because the value I get out of them is just so huge.
If Apple actually had executed on their strategy (of running models in privacy-friendly sandboxes) I feel they would've hit it out of the park. But as it stands, these are all bleeding edge technologies and you have to have your best and brightest on them. And even with seemingly infinite money, Apple doesn't seem to have delivered yet.
I hope the "yet" is important here. But judging by the various executives leaving (especially rumors of Johny Srouji leaving), that's a huge red flag that their problem is that they're bleeding talent, and not a lack of money.
On-device moves all compute cost (incl. electricity) to the consumer. I.e., as of 2025 that means much less battery life, a much warmer device, and much higher electricity costs. Unless the M-series can do substantially more with less this is a dead end.
For the occasional local LLM query, running locally probably won't make much of a dent in the battery life; smaller models like mistral-7b can run at 258 tokens/s on an iPhone 17[0].
The reason why local LLMs are unlikely to displace cloud LLMs is memory footprint, and search.
The most capable models require hundreds of GB of memory, impractical for consumer devices.
I run Qwen 3 2507 locally using llama-cpp; it's not a bad model, but I still use cloud models more, mainly because they have good search RAG.
There are local tools for this, but they don't work as well. This might continue to improve, but I don't think it's going to get better than the API integrations with Google/Bing that cloud models use.
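Back-of-the-envelope for why the memory part is the blocker (weights only; the KV cache and activations add more on top):

```python
# Rough weights-only estimate: parameters x bits per weight / 8 gives bytes,
# which works out to params_in_billions * bits / 8 in GB.
def model_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    return params_billions * bits_per_weight / 8

for name, params in [("7B", 7), ("70B", 70), ("405B", 405)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: ~{model_memory_gb(params, bits):.0f} GB")
# A 7B model at 4-bit (~4 GB) fits on a phone; a 405B model fits on no consumer device.
```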
Battery isn't relevant to plugged-in devices, and in the end, electricity costs roughly the same to generate and deliver to a data center as to a home. The real cost advantage that cloud has is better amortization of hardware since you can run powerful hardware at 100% 24/7 spread across multiple people. I wouldn't bet on that continuing indefinitely, consumer hardware tends to catch up to HPC-exclusive workloads eventually.
You could have an AppleTV with 48 GB VRAM backing the local requests, but... the trend is "real computers" disappearing from homes, replaced by tablets and phones. The advantage the cloud has is Real Compute Power for the few seconds you need to process the interaction. That's not coming home any time soon.
Apple runs all the heavy compute stuff overnight when your device is plugged in. The cost of the electricity is effectively nothing. And there is no impact on your battery life or device performance.
You don't have to abandon privacy when using an AI: use a service that accesses enterprise APIs, which have good privacy policies. I use the service from the guys who create the This Day in AI podcast, called smithery.ai. You get access to all of the SOTA models, so you can flip between any model, including lots of open-source ones, within one chat or across multiple chats and compare the same query, using various MCPs and lots of other features. If you're interested, have a look at the Discord for simtheory.ai (I have no connection to the service or to the creators).
I’m much more optimistic on device-side matmul. There’s just so much of it in aggregate and the marginal cost is so low especially since you need to drive fancy graphics to the screen anyway.
Somebody will figure out how to use it—complementing Cloud-side matmul, of course—and Apple will be one of the biggest suppliers.
I don't think the throughput of a general-purpose device will make for a competitive offering, so being local is a joke. All the fun stuff is running on servers at the moment.
From there, AI integration is enough of a different paradigm that the existing Apple ecosystem is not a meaningful advantage.
Best case, Apple is a fast copier of whoever is actually innovative, but I don't see anything interesting coming from Apple or Apple devs anytime soon.
People said the same things about mobile gaming [1] and mainframes. Technology keeps pushing forward. Neural coprocessors will get more efficient. Small LLMs will get smarter. New use cases will emerge that don't need 160-IQ super-intellects (most use cases even today do not).
The problem for other companies is not necessarily that data center-borne GPUs aren't technically better; it's that the financials might never make sense, much like how the financials behind Stadia never did, or at least need Google-levels of scale to bring in advertising and ultra-enterprise revenue.
> All the fun stuff is running on servers at the moment.
With "Apple Intelligence" it looks like Apple is setting themselves up (again) to be the gatekeeper for these kind of services, "allow" their users to participate and earn a revenue share for this, all while collecting data on what types of tasks are actually in high-demand, ready to in-source something whenever it makes economic sense for them...
Outside of fun stuff there is potential to just make chat another UI technology that is coupled with a specific API. Surely smaller models could do that, particularly as improvements happen. If that was good enough what would be the benefit of an app developer using an extra API? Particularly if Apple can offer an experience that can be familiar across apps.
Also why would you want it sucking your battery or heating your room when a data center is only 20 milliseconds away and it's nothing more than a few kilobytes of text. It makes no sense for the large majority of users' preferences which downweight privacy and the ability to tinker.
An LLM on your phone can know everything else that is on your phone. Even Signal chat plaintexts are visible on the phone itself.
People definitely will care that such private data stays safely on the phone. But it’s kind of a moot point since there is no way to share that kind of data with ChatGPT anyway.
I think Apple is not trying to compete with the big central “answer machine” LLMs like Google or ChatGPT. Apple is aiming at something more personal. Their AI goal may not be to know everything, but rather to know you better than any other piece of tech in the world.
And monetization is easy: just keep selling devices that are more capable than the last one.
Gemini can know everything in my Google account, which is basically synonymous with everything that's on my phone, except for text messages. And I use an iPhone. And then Gemini will work just as well on the web when I use my laptop.
So I don't see what unique advantage this gives Apple. These days people's data lives mostly in the cloud. What's on their phone is just a local cache.
I'd love to see a strong on-device multi-modal Siri + flexibility with Shortcuts.
Besides the "best consumer AI story" they could additionally create a strong offering to SMBs with FileMaker + strong foundation models support baked in. Actually rooting for both!
I said "Consumer AI". Even Apple is likely beating Google in consumer AI DAUs, today. Google has the Pixel and gemini.google.com, and that's it; practically zero strategy.
Local AI sounds nice, but most of Apple's PCs and other devices don't come with enough RAM at a decent price for good model performance, and macOS itself is incredibly bloated.
Depends what you are actually doing. It's not enough to run a chatbot that can answer complex questions. But it's more than enough to index your data for easy searching, to prioritise notifications and hide spam ones, to create home automations from natural language, etc.
Apple has the ability and hardware to deeply integrate this stuff behind the scenes without buying in to the hype of a shiny glowing button that promises to do literally everything.
That might work well for Apple to be the consumer electronic manufacturer that people use to connect to OpenAI/Anthropic/Google for their powerful creative work.
I'd have a lot more respect for Apple's "cautious" approach to AI if they didn't keep promising and then failing to deliver Siri upgrades (while still calling out to cloud backends, despite all the talk about local LLMs), or if they hadn't shipped the absolute trash that is notification summaries.
I think at this point it's pretty clear that their AI products aren't bad because of some clever strategy; they're bad because Apple is bad at it. I agree that their platform puts them in a good place to provide a local LLM experience to developers, but I remain skeptical that they will be able to execute on it.
I don't know, I feel like Apple shot themselves in the foot selling 8GB consumer laptops up until around 2024 while packing them with advanced AI inference hardware, and their phones and iPads usually had even less RAM.
On the other hand, all devs having to optimize for lower RAM will help with freeing it up for AI on newer devices with more.
This is a way of attributing where the comment is coming from, which is better than responding with what the AI says and not attributing it. I would support a guideline that discourages posting the output from AI systems, but ultimately there's no way to stop it.
IMO: Cook is going to announce his retirement by the end of Q1, they've already selected a CEO (probably Ternus), the incoming CEO wants leadership change, and some of these departures are because it's better that this purge happens before the CEO change than after. I think this explains Giannandrea, Williams, and Jackson.
Dye may have also been involved in that, given how unpopular he was internally at Apple. But more likely it was just personal / Meta offered him a billion dollars. Maestri leaving was also probably entirely unrelated.
Srouji is the weirdest case, and I'm hesitant to believe it's even true given it's just a rumor at this point. It's possible he was angry about being passed over for CEO, but realistically, it was always going to be Ternus, Williams, or Federighi. If Ternus is the next CEO, it's likely we'll see Apple combine the Hardware Technologies and Hardware Engineering divisions, then have Srouji lead both of them. I really do not see him leaving the company.
The other less probable theory is that they actually picked Fadell, and this deeply pissed off many people in Apple's senior leadership. So, what we're seeing is more chaos than it first seems.
Generally, as long as Srouji doesn't leave, these changes feel positive for Apple, and especially if there's a CEO change in early 2026: This is what "the fifth generation of Apple Inc" looks like. I don't understand the mindset of people who complain about Apple's products and behavior over the past decade, then don't receive this news as directionally positive.
Cook is denying that he has any current plans to step down. There was also a Bloomberg article about this a couple of days ago.
What they point out is that a lot of Apple's senior leadership are of a similar age and are simply approaching retirement now. But they are also losing younger rising stars they desperately need to fill the ensuing void. At the moment, they are simply losing talent left and right, and that is unsustainable if they want to maintain their competitive edge and avoid completely turning into Microsoft.
The more likely explanation is that a certain amount of internal rot has set in. They haven't really launched a successful major new product category in years, and a lot of their initiatives have either stalled or failed. Something is clearly not right, and top-tier talent will only tolerate that sort of thing for so long before moving on.
> They haven't really launched a successful major new product category in years
I agree this is true, but Apple’s always done their best work when they’re the second mover. Smartphones, iPods, earbuds, good desktop PCs were all after they watched what was good and then made it better (if you like what they did, anyway).
The next hardware category is probably AR glasses if someone can make them good and cheap, nobody has so Apple won’t do anything but wait. I’m sure they have an optics lab working on something, but probably not full throttle (and the Vision Pro is an attempt to make the OS).
> Apple’s always done their best work when they’re the second mover.
People say Apple does its best work as a “second mover,” but that misses the actual pattern: Apple builds great products when leadership is solving their own problems.
The Mac, iPod, iPhone, and iPad weren't just refinements of existing products. They were devices Steve Jobs personally wanted to use and couldn't find elsewhere. The man saw the GUI at Xerox and realized anyone could use a computer without remembering arcane commands, so he drove the development of the Mac. He was using a shitty mobile phone, saw the opportunity, and had the iPhone developed. Same with the early Apple Watch (the first post-Jobs new product line), which reflected Jony Ive's fashion ambitions; once he left, it evolved into what current leadership actually uses: a high-end fitness tracker.
The stagnation we're seeing now isn't about Apple losing its "second-mover magic." It's that leadership doesn't feel an unmet need that demands a new device. None of Vision Pro, Siri, Apple Intelligence, or even macOS itself appears to be a product the execs themselves rely on deeply anymore, and it shows. Apple excels when it scratches its own itch, and right now it doesn't seem to have one.
I think this is an interesting take that really reflects the saturation of the wider problem space of society. Much of the stuff that we could potentially need, we already have. It will be interesting to see what new products are released to the market in the next ten or so years which substantially change the way that we use technology.
> They haven't really launched a successful major new product category in years
How frequently do you expect a new major product category across the industry? Is there any company who launched one that wasn't ChatGPT in the same time frame?
Apple used to put out new or interesting products. E.g., they just up and released Time Machine routers when no one in the home router industry was doing anything like that at all; maybe you could rig up a clunky USB FTP solution, but this was first-party Apple white-glove treatment of the issue. They had great software too in many different niches, e.g. Aperture coming after Adobe's pro photo pie.
It was amazing how much diversity in really well-thought-out hardware as well as software was happening at Apple years ago, when it was a far smaller company in terms of manpower and resources than it is today. I guess when the business model is selling ongoing subscriptions instead of compelling new products, you stop getting compelling new products.
Personally I wouldn't count Chromebooks as something newer than Apple's last category-creating product since the iPad is in roughly the same time frame and netbooks a few years before that.
The Apple Watch is newer and is where I'd say the cutoff is for Apple.
--
At a higher level, I'd say there were two personal-computer-hardware revolution periods that Apple featured heavily in:
1) home personal computers and then the GUI-fication of them and the portable-ification - the wave the Apple II was part of, and then the one the Mac mainstreamed, then laptops where Apple was pretty instrumental in setting design and execution standards
2) mainstream general-purpose/software-defined mobile devices (vs single- or few-function gadgets). Initial failures or niche products (Newton from Apple, Palm/PocketPC more successfully as a niche later) and then Apple REALLY mainstreaming with the iPhone and the extensions that were the iPad and Watch. I'm leaving out the iPod here since "single-purpose MP3 players" were a transitional stop on the gadget->general purpose device trend. (But that general purpose nature also makes it hard to invent a new mobile device category.)
Of things that have been percolating for a while, maybe VR/AR takes off one day, I'm not sure there's mass appeal there. Are people going to get enough utility over a phone to justify pop-up ads in their field-of-view all day long?
It's possible the LLM/transformer boom could lead to some new categories, but we don't know what that would look like yet, so it's hard to penalize Apple for not being a super-early first-mover in the last 3 years since nobody else has figured out a great hardware story there either, and even in their prime they were less of a "first mover" than a "show everyone else how it could be done better" player.
I guess we're being a bit vague on timeframe, but Chromebooks launched in 2011, so they're one of those products that took ~10 years to be an overnight success, with 2020 being an accelerant. So my vote is no.
This seems true at many companies. While I'm not all that impressed by many current leaders, I'm sort of terrified of my generation (younger gen x) taking over because some of them seem to not be prepared or not have been prepared for the roles.
Look outside the HackerNews/Silicon Valley bubble: Apple is doing very well. Consumers broadly don't care whether their phone has AI, as long as it has the ChatGPT/etc apps. iMessage and FaceTime have a stranglehold on, uh, everyone in America. They sell more iPhones every quarter. Their services revenue keeps going up. Mac sales are up big. Apple Silicon is so far ahead of anything else on the market, they could stay on the M5 platform for three years and still be #1. Apple Watch is the most popular watch brand in the world (and it's not close; sensing a pattern?). AirPods alone make more money than Texas Instruments or SuperMicro. Yes, Vision Pro and iPhone Air sold poorly. Who cares? They're both obvious stepping stones to products that will sell well (Vision Pro -> a glasses-style AR device, iPhone Air -> thin engineering that will help with the iPhone Fold). Apple can afford to take risks and adjust.
Sure, there can be cultural things going on. But at the senior leadership level, the degree to which those would have to be bad, in the absence of major revenue problems, to cause this reaction is... unheard of.
I wish Apple would split software and bring back Scott Forstall (soon after Tim Cook leaves) and give him a big chunk, like iOS and iPadOS. Craig Federighi needs to reduce the scope of his work and get competent people in to handle the software area.
Great points, this is indicative of something going on. And this point is especially spot on:
> I don't understand the mindset of people who complain about Apple's products and behavior over the past decade, then don't receive this news as directionally positive.
It's time for change. Maybe it won't get better, but I do hope it will.
True. Cook was a great money maker but he's so boring on the product side.
But they'll never get anyone even close to Jobs obviously. Just won't happen. Even if they find someone with the same attention to detail and "risk it all on a grand vision" mentality, he or she won't get the trust of the board who are generally risk-averse. The only reason Jobs got away with doing all that was that he was Mr. Apple. He was the company.
Hopefully they'll get someone closer to that but the magic will never come back IMO.
> the incoming CEO wants leadership change, and some of these departures are because its better that this purge happens before the CEO change than after
Or, the more common scenario: all the ones who didn't get the crown are leaving.
> The other less probable theory is that they actually picked Fadell
"Less probable" is the understatement of the century. This rumor came out of nowhere, and it should instantly set off the BS-meter of anyone familiar with how Apple is run.
The most likely explanation for it is that Tony felt like a little boost to his profile couldn't hurt whatever his next step might be, and so he made a few phone calls to get this rumor ball rolling so that his name is in the news for a bit (hey, it worked!).
Your last point is the interesting one: for years people have complained that Apple has gotten slow, conservative, and repetitive. Maybe this is what the reset looks like.
> I don't understand the mindset of people who complain about Apple's products and behavior over the past decade, then don't receive this news as directionally positive.
Short of Tim Cook being replaced, it just seems like disarray and things are falling apart at the seams, resulting in things only likely getting worse, not better.
If Tim Cook is indeed about to get replaced, then I think you might hear fewer complaints. But right now, the complaints are likely assuming a Tim Cook replacement isn’t part of the plan, or at the very least, not a guarantee.
If you’re wrong about a Tim Cook replacement, then I think the complaints may be justified.
Could be Fadell; why else would Apple put Thread in phones? Maybe the iTunes Store (via Fuse) and the iPhone (via General Magic) weren't the only things Fadell had pitched Jobs on when the time was right.
Can't exactly write Java 25 without updating your legacy application, can you? And it tends to be the oldest applications that are the hardest to update for some painful reason. Would be nice if we all could live on the bleeding edge all the time
This is HN, where unless something is written in Rust or Zig, people will usually hate on it. They would rather pump a CLI tool than any software of sizable scale.
I love Rust. I love Haskell even more. I’m a big fan of Scheme. But anyone who “hates” on Java is just revealing a lack of understanding of what’s important in professional software development.
This is such a crappy point. People say it's better now, but even in Java 8 it's just BS. Oh boo hoo, I have to write a few extra words here and there. Woe is me. The IDE will autogenerate the boilerplate for you; you don't even have to write it yourself. And once it's there it's actually useful; there's a reason it exists.
Seriously. I don't get all the concern over the verbosity. At least in Java you can tell what the hell is going on. And the tools…so good. Right now I am in Python/TypeScript world. And let me tell you, the productivity and ease of Java and C# are sorely missed!
One of the things I dislike about the YouTube app on Apple TV is how it appears to maintain an entirely separate list of recommended videos, specific to the kinds of videos I tend to watch on TV, versus the phone and desktop (which might themselves also each have their own recommendation algorithm, but my behavior there is similar enough that I don't notice).
The difference is stark. I use YouTube on the Apple TV to play mostly background videos: 8-hour AI-generated lofi mixes, burning fireplaces, things like that. Ambiance. It's all that gets recommended now when I pull up the app, but only on the TV.
This behavior is somewhat desirable, but the issue is that the YouTube Apple TV app is an abhorrent experience that feels deeply tailored to stop you from getting to any content that is not expressly recommended. And these videos are all that get recommended. A new Linus Tech Tips video might be in my feed on desktop/mobile, but finding that video on the TV literally requires me to search "Linus Tech Tips" and go to their channel -> all videos.
I certainly don't mind the platform raising the prominence of videos I tend to watch on that platform; but to me it feels like I should be able to at least scroll down on the home page a bit to get a more "centralized" view into everything my account watches and would be recommended.
Yeah, I wish the UI could let me browse the various silos of videos I like to watch instead of trying to be clever with one feed.
And it’s like Youtube thinks I only want to watch the last three genres at any moment. If I branch out, then it pops another favorite genre from the set.
Sometimes I’ll go months or even years forgetting about video genres I love until I randomly remember it.
Feels like a wasted opportunity, and it should have more in common with music apps.
There isn't necessarily rationality behind venture deals; it's just a numbers game combined with the rising tide of the sector. These firms are not Berkshire. If the tide stops rising, some of the companies they invested in might actually be OK, but the venture boat sinks; the math of throwing millions at everyone hoping for one to 200x on exit does not work if the rising tide stops.
They'll say things like "we invest in people", which is true to some degree; being able to read people is roughly the only skill VCs actually need. You could probably put Sam Altman in any company on the planet and he'd grow the crap out of that company. But A16z would not give him ten billion to go grow Pepsi. This is the revealed preference intrinsic to venture: they'll say it's about the people, but their choices are utterly predominated by the sector, because the sector is the predominant driver of the multiples.
"Not investing" is not an option for capital firms. Their limited partners gave them money and expect above-market returns. To those ends, there is no rationality to be found; there's just doing the best you can with a bad market. AI infrastructure investments have represented something like half of all US GDP growth this year.
Slightly related but unpopular opinion I have: I think software, broadly, today is the highest quality its ever been. People love to hate on some specific issues concerning how the Windows file explorer takes 900ms to open instead of 150ms, or how sometimes an iOS 26 liquid glass animation is a bit janky... we're complaining about so much minutia instead of seeing the whole forest.
I trust my phone to work so much that it is now the single, non-redundant source for keys to my apartment, keys to my car, and payment method. Phones could only even hope to do all of these things as of like ~4 years ago, and only as of ~this year do I feel confident enough to not even carry redundancies. My phone has never breached that trust so critically that I feel I need to.
Of course, this article talks about new software projects. And I think the truth and reason of the matter lies in this asymmetry: Android/iOS are not new. Giving an engineering team agency and a well-defined mandate that spans a long period of time oftentimes produces fantastic software. If that mandate often changes; or if it is unclear in the first place; or if there are middlemen stakeholders involved; you run the risk of things turning sideways. The failure of large software systems is, rarely, an engineering problem.
But, of course, it sometimes is. It took us ~30-40 years of abstraction/foundation building to get to the pretty darn good software we have today. It'll take another 30-40 years to add one or two more nines of reliability. And that's ok; I think we're trending in the right direction, and we're learning. Unless we start getting AI involved; then it might take 50-60 years :)
I've played around with Gemini 3 Pro in Cursor, and honestly: I find it to be significantly worse than Sonnet 4.5. I've also had some problems that only Claude Code has been able to really solve; Sonnet 4.5 in there consistently performs better than Sonnet 4.5 anywhere else.
I think Anthropic is making the right decisions with their models. Given that software engineering is probably one of the very few domains of AI usage that is driving real, serious revenue: I have far better feelings about Anthropic going into 2026 than any other foundation model. Excited to put Opus 4.5 through its paces.
> only Claude Code has been able to really solve; Sonnet 4.5 in there consistently performs better than Sonnet 4.5 anywhere else.
I think part of it is this[0] and I expect it will become more of a problem.
Claude models have built-in tools (e.g. `str_replace_editor`) which they've been trained to use. These tools don't exist in Cursor, but Claude really wants to use them.
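One workaround is for the harness to register a tool under the same name and apply the edits itself; a rough sketch below, where the command shape (command/path/old_str/new_str) is my assumption for illustration, not Anthropic's exact spec:

```python
from pathlib import Path

# Hypothetical handler for a str_replace_editor-style tool call coming back from
# the model. The input schema here is assumed for illustration; check the actual
# tool spec your harness advertises before relying on it.
def handle_str_replace_editor(tool_input: dict) -> str:
    path = Path(tool_input["path"])
    command = tool_input["command"]
    if command == "view":
        return path.read_text()
    if command == "str_replace":
        text = path.read_text()
        if text.count(tool_input["old_str"]) != 1:
            return "Error: old_str must match exactly once."
        path.write_text(text.replace(tool_input["old_str"], tool_input["new_str"]))
        return "Edit applied."
    return f"Unsupported command: {command}"
```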
Maybe they want to have their own protocol and standard for file editing for training and fine-tuning their own models, instead of relying on Anthropic standard.
Or it could be a sunk cost associated with Cursor already having terabytes of training data with the old edit tool.
Maybe this is a flippant response, but I guess they are more of a UI company and want to avoid competing with the frontier model companies?
They also can’t get at the models directly enough, so anything they layer in would seem guaranteed to underperform and/or consume context instead of potentially relieving that pressure.
Any LLM-adjacent infrastructure they invest in risks being obviated before they can get users to notice/use it.
TIL! I'll finally give Claude Code a try. I've been using Cursor since it launched and never tried anything else. The terminal UI didn't appeal to me, but knowing it has better performance, I'll check it out.
Cursor has been a terrible experience lately, regardless of the model. Sometimes for the same task, I need to try with Sonnet 4.5, ChatGPT 5.1 Codex, Gemini Pro 3... and most times, none managed to do the work, and I end up doing it myself.
Glad you mentioned "Cursor has been a terrible experience lately", as I was planning to finally give it a try. I'd heard it has the best auto-complete, which I don't get using VSCode with Claude Code in the terminal.
+1, it had a bad period when they were hyperscaling up, but IME they've found their pace (very) recently - I almost ditched cursor in the summer, but am a quite happy user now.
I haven't used Cursor since I use Neovim and it's hard to move away from it.
The auto-complete suggestions from FIM models (either open source or even something Gemini Flash) punch far above their weight. That combined with CC/Codex has been a good setup for me.
I was evaluating codex vs claude code the past month and GPT 5.1 codex being slow is just the default experience I had with it.
The answers were mostly on par (though different in style which took some getting used to) but the speed was a big downer for me. I really wanted to give it an honest try but went back to Claude Code within two weeks.
I've actually been working on porting the tab completion from Cursor to Zed, and eventually IntelliJ, for fun
It shows exactly why their tab completion is so much better than everyone else's though: it's practically a state machine that's getting updated with diffs on every change and every file you're working with.
(also a bit of a privacy nightmare if you care about that though)
It's not about the terminal, but about decoupling yourself from looking at the code. The Claude app lets you interact with a GitHub repo from your phone.
these agents are not up to the task of writing production level code at any meaningful scale
looking forward to high paying gigs to go in and clean up after people take them too far and the hype cycle fades
---
I recommend the opposite, work on custom agents so you have a better understanding of how these things work and fail. Get deep in the code to understand how context and values flow and get presented within the system.
> these agents are not up to the task of writing production level code at any meaningful scale
This is obviously not true, starting with the AI companies themselves.
It's like the old saying: "half of all advertising doesn't work; we just don't know which half that is." Some organizations are having great results, while some are not. On multiple dev podcasts I've listened to, even AI skeptics have had a lightbulb moment where they get that AI is where everything is headed.
Not a skeptic, I use AI for coding daily and am working on a custom agent setup because, through my experience for more than a year, they are not up to hard tasks.
This is well known I thought, as even the people who build the AIs we use talk about this and acknowledge their limitations.
I'm pretty sure at this point more than half of Anthropic's new production code is LLM-written. That seems incompatible with "these agents are not up to the task of writing production level code at any meaningful scale".
How are you pretty sure? What are you basing that on?
If true, could this explain why Anthropics APIs are less reliable than Gemini's? (I've never gotten a service overloaded response from Google like I did from Anthropic)
My current understanding (based on this text and other sources) is:
- There exist some teams at Anthropic where around 90% of lines of code that get merged are written by AI, but this is a minority of teams.
- The average over all of Anthropic for lines of merged code written by AI is much less than 90%, more like 50%.
> I've never gotten a service overloaded response from Google like I did from Anthropic
They're Google, they out-scale everyone. They run more than 1.3 quadrillion tokens per month through LLMs!
You cannot clean up the code; it is too verbose. That said, you can produce production-ready code with AI, you just need to put up very strong boundaries and not let it get too creative.
Also, the quality of production ready code is often highly exaggerated.
Has a section for code. You link it to your GitHub, and it will generate code for you when you get on the bus so there's stuff for you to review after you get to the office.
The app version is iPhone-only; you don't get Code in the Android app, you have to use a web browser.
I use it every day. I’ll write the spec in conversation with the chatbot, refining ideas, saying “is it possible to …?” Get it to create detailed planning and spec documents (and a summary document about the documents). Upload them to Github and then tell Code to make the project.
I have never written any Rust, am not an evangelist, but Code says it finds the error messages super helpful so I get it to one shot projects in that.
I do all this in the evenings while watching TV with my gf.
It amuses me we have people even this thread claiming what it already does is something it can’t do - write working code that does what is supposed to.
I get to spend my time thinking of what to create instead of the minutiae of "ok, I just need 100 more methods, keep going". And I've been coding since the 1980s, so don't think I'm just here for the vibes.
My workflow was usually to use Gemini 2.5 Pro (now 3.0) for high-level architecture and design. Then I would take the finished "spec" and have Sonnet 4.5 perform the actual implementation.
Same here. Gemini really excels at all the "softer" parts of the development process (which, TBH, feels like most of the work). And Claude kicks ass at the actual code authoring.
Yeah, I've used variations of the "get frontier models to cross-check and refine each other's work" pattern for years now, and it really is the path to the best outcomes in situations where you would otherwise hit a wall or miss important details.
It’s my approach in legal as well. Claude formulates its draft, then it prompts codex and gemini for theirs. Claude then makes recommendations for edits to its draft based on others. Gemini’s plan is almost always the worst, but even it frequently has at least one good point to make.
If you're not already doing that, you can wire up a subagent that invokes Codex in non-interactive mode. Very handy; I run Gemini CLI and Codex subagents in parallel to validate plans or implementations.
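The wiring is pretty small; a sketch, assuming `codex exec <prompt>` is still the CLI's non-interactive entry point (check the flags on whatever version you have installed):

```python
import subprocess

# Sketch of a "second opinion" subagent: hand a plan to another CLI agent in
# non-interactive mode and capture its review. The `codex exec` invocation is an
# assumption; adjust to whatever your installed CLI actually accepts.
def second_opinion(plan: str) -> str:
    result = subprocess.run(
        ["codex", "exec", f"Review this plan and list any problems:\n{plan}"],
        capture_output=True,
        text=True,
        timeout=600,
    )
    return result.stdout
```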
I was doing this but I got worried I will lose touch with my critical thinking (or really just thinking for that matter). As it was too easy to just copy paste and delegate the thinking to The Oracle.
This is how I do it. Though I've been using Composer as my main driver more and more.
* Composer - Line-by-Line changes
* Sonnet 4.5 - Task planning and small-to-medium feature architecture. Pass it off to Composer for code
* Gemini Pro - Large and XL architecture work. Pass it off to Sonnet to break down into tasks.
I like this plan, too - Gemini's recent series have long seemed to have the best large-context awareness vs competing frontier models - anecdotally, although much slower, I think GPT-5's architecture plans are slightly better.
I really don’t understand the hype around Gemini. Opus/Sonnet/GPT are much better for agentic workflows. Seems people get hyped for the first few days. It also has a lot to do with Claude code and Codex.
Gemini is a lot more bang for the buck. It's not just cheaper per token, but with the subscription, you also get e.g. a lot more Deep Research calls (IIRC it's something like 20 per day) compared to Anthropic offerings.
Also, Gemini has that huge context window, which depending on the task can be a big boon.
I'm completely the opposite. I find Gemini (even 2.5 Pro) much, much better than anything else. But I hate agentic flows, I upload the full context to it in aistudio and then it shines - anything agentic cannot even come close.
I recently wrote a small CLI tool for scanning through legacy codebases. For each file, it does a light parse step to find every external identifier (function call, etc...), reads those into the context, and then asks questions about the main file in question.
It's amazing for trawling through hundreds of thousands of lines of code looking for a complex pattern, a bug, bad style, or whatever that regex could never hope to find.
For example, I recently went through tens of megabytes(!) of stored procedures looking for transaction patterns that would be incompatible with read committed snapshot isolation.
I got an astonishing report out of Gemini Pro 3, it was absolutely spot on. Most other models barfed on this request, they got confused or started complaining about future maintainability issues, stylistic problems or whatever, no matter how carefully I prompted them to focus on the task at hand. (Gemini Pro 2.5 did okay too, but it missed a few issues and had a lot of false positives.)
Fixing RCSI incompatibilities in a large codebase used to be a Herculean task, effectively a no-go for most of my customers, now... eminently possible in a month or less, at the cost of maybe $1K in tokens.
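Not the parent's actual tool, but a toy sketch of that loop: a light "parse" for identifiers, pull the files that mention them into the context, then ask one question per main file. It assumes an OpenAI-compatible chat completions endpoint, and the regex is a crude stand-in for a real parser; the URL and model name are placeholders:

```python
# Toy version of the "scan a legacy codebase file-by-file" idea described above.
import glob, json, os, re, urllib.request

API_URL = os.environ.get("LLM_API_URL", "https://api.example.com/v1/chat/completions")
QUESTION = "Find transaction patterns incompatible with read committed snapshot isolation."

def ask_llm(prompt: str) -> str:
    payload = {"model": os.environ.get("LLM_MODEL", "gemini-3-pro"),  # placeholder name
               "messages": [{"role": "user", "content": prompt}]}
    req = urllib.request.Request(
        API_URL, json.dumps(payload).encode(),
        {"Content-Type": "application/json",
         "Authorization": f"Bearer {os.environ.get('LLM_API_KEY', '')}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

files = {p: open(p, errors="ignore").read()
         for p in glob.glob("legacy/**/*.sql", recursive=True)}
for path, body in files.items():
    # "Light parse": anything that looks like a call is treated as an external identifier.
    idents = set(re.findall(r"\b(\w+)\s*\(", body))
    context = "\n".join(f"--- {p} ---\n{b}" for p, b in files.items()
                        if p != path and any(i in b for i in idents))
    answer = ask_llm(f"{QUESTION}\n\nMain file ({path}):\n{body}\n\nReferenced code:\n{context}")
    print(f"== {path} ==\n{answer}\n")
```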
If this is a common task for you, I'd suggest instead using an LLM to translate your search query into CodeQL[1], which is designed to scan for semantic patterns in a codebase.
+1 - Gemini is consistently great at SQL in my experience. I find GPT 5 is about as good as Gemini 2.5 Pro (please treat it as praise). Haven't had a chance to put Gemini 3 to a proper SQL challenge yet.
It's a mess of vibe coding combined with my crude experiments with the new Microsoft Agent Framework. Not something that's worth sharing!
Also, I found that I had to partially rewrite it for each "job", because requirements vary so wildly. For example, one customer had 200K lines of VBA code in an Access database, which is a non-trivial exercise to extract, parse, and cross-reference. Invoking AI turned out to be by far the simplest part of the whole process! It wasn't even worth the hassle of using the MS Agent Framework, I would have been better off with plain HTTPS REST API calls.
I think you're both correct. Gemini is _still_ not that good at agentic tool usage. Gemini 3 has gotten A LOT better, but it can still do some insanely stupid stuff like 2.5 did.
Personally my hype is for the price, especially for Flash. Before Sonnet 4.5 was competitive with Gemini 2.5 Pro, the latter was a much better value than Opus 4.1.
The comments would improve code quality because they're a way for the LLM to use a scratchpad to perform locally specific reasoning before writing the code block that follows, which would be more difficult for the LLM to just one-shot.
You could write a postprocessing script to strip the comments so you don't have to do it manually.
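A rough sketch of such a strip step, assuming the generated code is Python; going through the tokenize module avoids mangling string literals that happen to contain "#":

```python
# Rough sketch: strip "#" comments from generated Python without touching
# string literals (a plain regex would mangle strings that contain "#").
# Full-line comments are left behind as blank lines.
import io
import tokenize

def strip_comments(source: str) -> str:
    lines = source.splitlines(keepends=True)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            row, col = tok.start
            ending = "\n" if lines[row - 1].endswith("\n") else ""
            lines[row - 1] = lines[row - 1][:col].rstrip() + ending
    return "".join(lines)

print(strip_comments("x = '# not a comment'  # but this is\n"))
# -> x = '# not a comment'
```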
If you're asking an LLM to compute something "off the top of its head", you're using it wrong. Ask it to write the code to perform the computation and it'll do better.
Same with asking a person to solve something in their head vs. giving them an editor and a random python interpreter, or whatever it is normal people use to solve problems.
the decent models will (mostly) decide when they need to write code for problem solving themselves.
either way a reply with a bogus answer is the fault of the provider and model, not the question-asker -- if we all need to carry lexicons around to remember how to ask the black box a question we may as well just learn a programming language outright.
I disagree; the answer you get is dictated by the question you ask. Ask stupid, get stupid. Present the problem better, get a better answer. These tools are trained to be highly compliant, so you get what you ask for.
The same happens with regular people - a smart person doing something stupid because they weren't overly critical and judging of your request - and these tools have much more limited thinking/reasoning than a normal person would have, even if they seem to have a lot more "knowledge".
You don't know what it's geared for until you try. Like I said, GPT-4 could consistently encode and decode even fairly long base64 sequences. I remember once asking it for an SVG image, and it responded with HTML that had an <img> tag in it with a data URL embedding the image - and it worked exactly as it should.
You can argue whether that is a meaningful use of model capacity, and sure, I agree that this is exactly the kind of stuff tool use is for. But nevertheless the bar was set.
Sure you do, the architecture is known. An LLM will never be appropriate to use for exact input transforms and will never be able to guarantee accurate results - the input pipeline yields abstract ideas as text embedding vectors, not a stream of bytes - but just like a human it might have the skill to limp through the task with some accuracy.
While your base64 attempts likely went well, that it "could consistently encode and decode even fairly long base64 sequences" is just an anecdote. I had the same model freak out in an empty chat, transcribing the word "hi" into a full YouTube "remember to like and subscribe" epilogue - precision and determinism are the parameters you give up when making such a thing.
(It is around this time that the models learnt to use tools autonomously in a response, such as running small code snippets which would solve the problem perfectly well, but even now it is much more consistent to tell it to do that, and for very long outputs the likelihood that it'll be able to recite the result correctly drops.)
You can ask it. Each model responds slightly differently to "What pronouns do you prefer for yourself?"
Opus 4.5:
I don’t have strong preferences about pronouns for myself. People use “it,” “they,” or sometimes “he” or “she” when referring to me, and I’m comfortable with any of these.
If I had to express a slight preference, “it” or “they” feel most natural since I’m an AI rather than a person with a gender identity. But honestly, I’m happy with whatever feels most comfortable to you in conversation.
Haiku 4.5:
I don’t have a strong preference for pronouns since I’m an AI without a gender identity or personal identity the way humans have. People typically use “it” when referring to me, which is perfectly fine. Some people use “they” as well, and that works too.
Feel free to use whatever feels natural to you in our conversation. I’m not going to be bothered either way.
The model is great: it is able to code up some interesting visual tasks (I guess they have pretty strong tool calling capabilities), like orchestrating prompt -> image generation -> segmentation -> 3D reconstruction. Check out the results here: https://chat.vlm.run/c/3fcd6b33-266f-4796-9d10-cfc152e945b7. Note the model was only used to orchestrate the pipeline; the tasks are done by other models in an agentic framework. They must have improved the tool calling framework with all the MCP usage. Gemini 3 was able to orchestrate the same, but Claude 4.5 is much faster.
I have a side-project prototype app that I tried to build on the Gemini 2.5 Pro API. I have not tried 3 yet; however, the only improvements I would like to see are in Gemini's ability to:
1. Follow instructions consistently
2. API calls to not randomly result in "resource exhausted"
Can anyone share their experience with either of these issues?
I have built other projects accessing Azure GPT-4.1, Bedrock Sonnet 4, and even Perplexity, and those three were relatively rock solid compared to Gemini.
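Not a fix for the underlying flakiness, but the usual mitigation for "resource exhausted" style responses is retrying with exponential backoff and jitter. A generic sketch; the `call_api` callable and the commented-out client are placeholders for whatever SDK you actually use:

```python
# Generic retry-with-exponential-backoff wrapper for flaky LLM API calls.
import random
import time

def with_backoff(call_api, max_attempts=5, base_delay=1.0):
    for attempt in range(max_attempts):
        try:
            return call_api()
        except Exception as exc:  # ideally catch only the client's rate-limit error type
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            print(f"attempt {attempt + 1} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)

# result = with_backoff(lambda: client.generate(prompt))  # hypothetical client call
```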
What you describe could also be the difference in the hallucination rate [0]. Opus 4.5 has the lead here, and Gemini 3 Pro performs quite badly here compared to its other benchmark results.
I've had problems solved incorrectly and edge cases missed by Sonnet and by other LLMs (ChatGPT, Gemini) and the other way around too.
Once they saw the other model's answer, they admitted their "critical mistake". It's all about how much of your prompt/problem/context falls outside the model's training distribution.
> I've played around with Gemini 3 Pro in Cursor, and honestly: I find it to be significantly worse than Sonnet 4.5.
That's my experience too. It's weirdly bad at keeping track of its various output channels (internal scratchpad, user-visible "chain of thought", and code output), not only in Cursor but also on gemini.google.com.
I rotate models frequently enough that I doubt my personal access patterns are so model specific that they would unfairly advantage one model over another; so ultimately I think all you're saying is that Claude might be easier to use without model-specific skilling than other models. Which might be true.
I certainly don't have as much time on Gemini 3 as I do on Claude 4.5, but I'd say my time with the Gemini family as a whole is comparable. Maybe further use of Gemini 3 will cause me to change my mind.
yeah, this generally vibes with my experience, they aren't that different
As I've gotten into the agentic stuff more lately, I suspect a sizeable part of the different user experiences comes down to the agents and tools. In this regard, Anthropic is probably in the lead. They certainly have become a thought leader in this area by sharing more of their experience and know-how in good posts and docs.
I suspect Cursor is not the right platform to write code on. IMO, humans are lazy and would never code on Cursor. They default to code generation via prompt which is sub-optimal.
If you're given a finite context window, what are the most efficient tokens to present for a programming task: sloppy prompts, or actual code (using it with autocomplete)?
I'm not sure you get how Cursor works. You add both instructions and code to your prompt. And it does provide its own autocomplete model as well. And... lots of people use that. (It's the largest platform today as far as I can tell)
I've had no success using Antigravity, which is a shame because the ideas are promising, but the execution so far is underwhelming. I haven't gotten past an initial planning doc, which is usually aborted due to model provider overload or rate limiting.
Give it a try now; the launch-day issues are gone.
If anyone uses Windsurf, Antigravity is similar, but the way they have implemented the walkthrough and implementation plan looks good. It tells the user what the model is going to do, and the user can put in line comments if they want to change something.
it's better than at launch, but I still get random model response errors in anti-gravity. it has potential, but google really needs to work on the reliability.
It's also bizarre how they force everyone onto the "free" rate limits, even those paying for google ai subscriptions.
My first couple of attempts at antigravity / Gemini were pretty bad - the model kept aborting and it was relatively helpless at tools compared to Claude (although I have a lot more experience tuning Claude to be fair). Seems like there are some good ideas in antigravity but it’s more like an alpha than a product.
It's just not great at coding, period. In Antigravity it takes insane amounts of time and tokens for tasks that copilot/sonnet would solve in 30 seconds.
It generates tokens pretty rapidly, but most of them are useless social niceties it is uttering to itself in its thinking process.
I think Gemini 3 is hot garbage at everything. It's great on a greenfield project trying to one-shot something; if you're working on a long-term project it just sucks.
I'm also finding Gemini 3 (via Gemini CLI) to be far superior to Claude in both quality and availability. I was hitting Claude limits every single day, at that point it's literally useless.
I’ve trashed Gemini non-stop (seriously, check my history on this site), but 3 Pro is the one that finally made me switch from OpenAI. It’s still hot garbage at coding next to Claude, but for general stuff, it’s legit fantastic.
Tangential observation - I've noticed Gemini 3 Pro's train of thought feels very unique. It has kind of an emotive personality to it, where it's surprised or excited by what it finds. It feels like a senior developer looking through legacy code and being like, "wtf is this??".
I'm curious if this was a deliberate effort on their part, and if they found in testing it provided better output. It's still behind other models clearly, but nonetheless it's fascinating.
Yeah, its CoT is interesting. It was supposedly RL'd on evaluations and gets paranoid that it's being evaluated and in a simulation. I asked it to critique output from another LLM and told it my colleague produced it; in the CoT it kept writing "colleague" in quotes as if it didn't believe me, which I found amusing.
My testing of Gemini 3 Pro in Cursor yielded mixed results. Sometimes it's phenomenal. At other times I either get the "provider overloaded" message (after like 5 mins or whatever the timeout is), or the model's internal monologue starts spilling out to the chat window, which becomes really messy and unreadable. It'll do things like:
>> I'll execute.
>> I'll execute.
>> Wait, what if...?
>> I'll execute.
Suffice it to say I've switched back to Sonnet as my daily driver. Excited to give Opus a try.
I’ve tried Gemini in Google AI Studio as well and was very disappointed by the superficial responses it provided. It seems to be at the level of GPT-5-low or even lower.
On the other hand, it’s a truly multimodal model, whereas Claude remains specifically targeted at coding tasks and is therefore only a text model.
The "world's smartest man" very recently predicted on X that Bitcoin would hit $220k by the end of the year. [1]
Here's the thing: IQ probably doesn't mean much of anything. But it is one of only a handful of ways we have to benchmark intelligence. The training of AI systems critically requires benchmarks to understand gain/loss in training and determine if minute changes in the system are actually wringing more intelligence out of that giant matrix of numbers.
What I deeply believe is: we're never going to invent superintelligence, not because it's impossible for computers to achieve, but because we don't even know what intelligence is.
While it seems unlikely, I wouldn't find it impossible (edit: learning more about IQ scores, yeah, 276 is definitely BS). You can be "intelligent" as in very good at solving logic puzzles and math problems, and the most obtuse and subjectively dumb person when it comes to anything else. It might be less likely, but it definitely happens.
I have met people working in very advanced fields having the perspective and reflection of a middle schooler on politics, social challenges, etc. Some were also clearly blinded by their own capacity in their own field, thought that it would absolutely transfer to other fields, and were talking with authority while anybody in the room with knowledge could smell the BS from miles away.
I'm not saying he doesn't have 276 IQ because it's impossible for someone who says that stuff to be smart, I'm saying he doesn't have 276 IQ because people who say that stuff tend to also lie about their IQ.
Well, it is mathematically impossible. Traditional IQ tests have a mean/median of 100 and follow a normal distribution with a standard deviation of 15 points.
So 270 would be more than 11 standard deviations above the mean, which works out to something like 1 in 17,000,000,000,000,000,000,000,000,000,000 people.
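If you want to sanity-check figures like that, the normal survival function gives the one-sided tail directly. A quick sketch (scipy assumed), using the standard mean-100/SD-15 norming; deviation IQs are sometimes normed with SD 16, which shifts the exact numbers a bit:

```python
# Rarity of a claimed IQ under the usual mean-100, SD-15 normal model.
from scipy.stats import norm

for iq in (160, 210, 276):
    z = (iq - 100) / 15
    p = norm.sf(z)          # one-sided tail probability
    print(f"IQ {iq}: z = {z:.2f}, roughly 1 in {1 / p:.1g}")
# 276 comes out around 1 in 2e31 - far beyond every human who has ever lived.
```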
It's possible in the same way it's possible that you will spontaneously phase through the floor due to a particular outcome of atomic resonance. Possible, but so unlikely it almost certainly never has happened, nor ever will.
Might something as small as a grain of sand have phased through a solid barrier as thin as a piece of paper somewhere on earth, at some point over billions of years? Sure. Paper is still pretty thick, and a grain of sand is enormous on the atomic scale, but it's at least in the realm of practical probability. When you start talking about cumulative probability events in the realm of 1/1e30, you simply can't produce a scenario with that many dice rolls. If our population were 8 quadrillion and spanned a 40,000 year empire, we would likely still never see an individual 11σ from the mean.
The probability is exactly zero by definition. The maximum score on a test is a raw score of 100%. Tests are normalized to have the reported scores fit a normal distribution. An out-of-distribution score indicates an error in normalizing the test.
In other words, the highest IQ of every living person has a defined upper bound that is dependent on the number of living people and it is definitionally impossible to exceed this value. Reports of higher values are mistakes or informal exaggerations, similar to a school saying a student is one that you would only encounter in a million years. By definition it is not possible to have evidence to support such a statement.
The maximum IQ score anyone can get depends on the total number of people who have taken IQ tests so far. Even if every single person alive today took an IQ test (which is absurd in itself), the maximum IQ achievable would be between 190-197. In practice, I'd guess the maximum is somewhere between 170 and 185 (millions to tens of millions of IQ test results which were recorded).
Even then, you need special tests to distinguish between anyone with IQ higher than about 160 - all those people get the same (perfect) score on regular IQ tests.
So: claiming to have an IQ of 276? Bullshit. The guy whose parents claimed he scored 210 on an IQ test? Also bullshit. To get 210, there would have to have been ~500 billion IQ test results recorded.
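The rank-to-score version of that argument is a one-liner: the top scorer out of N normed test results can only justify roughly the 1 - 1/N quantile. A sketch under the same SD-15 assumption (exact cutoffs shift with the norming used):

```python
# Highest deviation IQ a "best of N test results" can support under SD-15 norming.
from scipy.stats import norm

for n in (1e7, 8e9, 5e11):
    iq_max = 100 + 15 * norm.isf(1 / n)   # isf = inverse survival function
    print(f"N = {n:.0e}: max supportable IQ ~ {iq_max:.0f}")
# ~10 million recorded results supports about 178; all 8 billion of us, about 195.
```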
Between 8 and 9 billion. But "impossible" means a chance of zero, and 8 billion / 17,000,000,000,000,000,000,000,000,000,000 > 0, so it's not impossible. The chances that he's lying or delusional are vastly higher of course, but that's no reason to use "impossible" incorrectly.
Impossible is almost always a colloquialism; almost everything is possible if you accept a low enough probability of success. We are talking about something less likely than almost anything else ever called impossible.
No, I think you are misunderstanding. IQ does not describe the likelihood of someone being that smart. It just means you order a number of people by their „intelligence“, the one in the middle is defined as 100 and then it depends on how many other people are in that line which IQ number the person at the end of the line gets. So it’s impossible because the definition of IQ is such that a certain number doesn’t come up without a certain number of measurements.
It's as if you said 150% of all people are female. That is impossible, not just unlikely.
Depends on the exact "IQ" test, but many do not have an upper bound. The thing to understand about IQ tests is that they were designed and are primarily used as a diagnostic tool by psychologists to identify learning deficiencies. There really isn't much evidence that having a 180 vs a 140 IQ means a whole lot of anything beyond one's ability to take that specific test. If anything, having an extremely high score outside of the normal range may indicate neuro-divergence and likely savant syndrome. Some people are savants in specific ways - working memory, pattern recognition, language skills, etc. IQ tests certainly test several different categories of intelligence, but they also certainly leave out a few other known forms of intelligence.
> Don't legitimate IQ tests top out at 160 for adults?
“Top out” can be interpreted many ways. It depends on how they are used.
Modern tests are fairly accurate up to 2sd (70-130). The tests start wavering in accuracy between 2sd and 3sd (55-70 and 130-145).
Over 3sd, and the only thing one can confidently say is that the examinee is most likely lower or higher than 3sd (55 and 145). The tests just don’t have enough data points to discriminate finely beyond those thresholds.
Let me further say that, on the high end, there are very few jobs for which I would make any selection decision based on how high an IQ score (or proxy thereof) is over 130. There are other variables, many of which are easier to measure, that are better predictors of success.
All of this doesn’t even take into consideration that there is relatively more type II error/bias in IQ results — that is, there are plenty of people who score less than their theoretical maximum (e.g., due to poor sleep the previous night), while there are relatively fewer people who score much higher than their theoretical maximum.
Yes they do. Not that it ever stopped people from making claims about having higher IQ.
IQ 160 means that you are 1 in 30,000 of your age group. That means that to calibrate a test that can measure that high, the authors had to test more than 30,000 people in each age group (depending on what statistical certainty you need, but it could be 10x the number for reasonable values). Not sure how large the age groups typically are, but the total number of people necessary for calibration is counted in millions. You have to pay them all for participating in the calibration, and that's not going to be cheap.
And with values greater than IQ 160, the numbers grow exponentially. So I am rolling to disbelieve that anyone actually calibrated tests for such large numbers. (Especially once the numbers start to exceed the total population of Earth, which is around IQ 190.)
There are separate tests for the extremes, but obviously less researched because the further out you go the less they have to work on.
Many years ago, while unemployed, I was sent for an intelligence and dyslexia test (because of the very same perceived waste of potential that the article talks about). I was not dyslexic but scored above the range that the intelligence test could measure. The professor (I believe he was moonlighting for research funding) performing the test talked about the upper range tests, but said they were very long, required specialists to conduct, and there's seldom any reason to investigate where you are in the upper range.
Then we went on to waste a huge amount of time talking about human perception, and I remember describing an idea that finally seems to be feasible because the new Steam VR headset does it and calls it foveated rendering.
I can't specifically recall the date of this but the tester was recording results on his palm pilot, which was a flash new thing at the time.
Usually. There's diminishing returns the higher you go. The difference between 150 and 175 is much smaller than 125 and 150.
When you go from 30 seconds to 15 seconds to solve a problem, that's noticeable. But when you go from half a second to a quarter of a second, the difference doesn't really matter.
So a lot of IQ tests have some sort of ceiling where the only thing they can tell you is "Yeah, it's more than this".
Every 15 IQ points puts you another standard deviation above the median. That means if you legitimately have an IQ of 276, you would be 1 in 2.3 * 10^31, which is many orders of magnitude greater than the number of humans in history.
This guy is a fraud; he wasn't measured by any legit institute, only by some random one which stated he is intelligent, and he claims he was measured at 276 IQ.
He's low-key just trolling at this point, saying he wants asylum in the US and making videos about how Jesus/God is real with some scientific methods, etc.
Just go check out his YouTube you'll see what I'm talking about.
First off, we don't have a good way to actually measure an individual's intelligence. IQ is actually meant to correlate with g, which is a hidden factor we're trying to measure. IQ tests are good insofar as you look at their results from the perspective of a population. In these cases individual variation in how well it correlates smooths out. We design IQ tests and normalise IQ scores such that, across time and over the course of many studies, these tests appear to correlate with this hidden g factor. Moreover, anything below 70 and above 130 is difficult to measure accurately; IQ is benchmarked such that it has a mean of 100 and a standard deviation of 15, and below 70 and above 130 is outside of two standard deviations.
So, in summary, IQ is not a direct measure of intelligence. What you're doing here is pointing at some random guy who allegedly scored high on an IQ test and saying: "Look at how dumb that guy is. We must be really bad at testing."
But to say we don't know what intelligence is, is silly, since we are the ones defining that word. At least in this sense. And the definition we have come up with is grounded in pragmatism. The point of the whole field of research is to come up with and keep clarifying a useful definition.
Worth also noting that you can study for an IQ test which will produce an even less correlated score. The whole design and point of IQ tests is done with the idea of testing your ability to come up with solutions to puzzles on the spot.
My point is to state that one of two things must be true: either IQ does not really measure intelligence, or intelligence (being the thing IQ measures or correlates to) isn't much of a desirable quality for agentic systems to have. I suspect it's a mix. The people on the upper end of the IQ spectrum tend to lead wholly uninspiring lives; the 276 guy isn't the only example, fraud or not. There's a couple of university professors with relatively average publishing histories, a couple of suicides, a couple of wacko cult leaders, a couple of self-help gurus... and the goat, Terence Tao, he's up there, but it's interesting how poorly the measure correlates with anything we'd describe as "success".
The apologists enter the chat and state "well, it's because they're frauds or they're gaming the system" without an ounce of recognition that this is exactly what we're designing AI systems to do: cheat the test. If you expect being able to pass intelligence evals to be a way to grow intelligence, well, I suspect that will work out just about as well as IQ tests do for identifying individuals capable of things like highly creative invention.
You are throwing around anecdotes. They're not that helpful.
It's worth noting that success in life (for whatever that is defined as) is not the same thing as intelligence. And being intelligent isn't even enough for you to be successful in intellectual pursuits either.
You can be highly intelligent and receive no education, have no access to books (or be unable to read), and then you might be able to intelligently solve the problem of eating a sandwich, but that won't get you anywhere.
Likewise, you can be intelligent and have access to the right tools, but you might be too anxious to try to excel. Maybe you're intelligent and have unmedicated ADHD causing you to constantly fail to actually get anything completed in a timely manner.
There are a lot of things between IQ and success in life. But we do know for a fact that when controlling for other factors, we see positive trends between IQ and life success. That doesn't mean that IQ is the only factor.
Certainly the fact you can pull out a handful of anecdotes about high IQ individuals and talk about how uninspiring their lives are doesn't mean that all high IQ people are living uninspiring lives, or that living an inspiring life is uncorrelated with IQ, or that there is even a meaningful definition of an inspiring life.
Lastly, please note that there are lots of successful people who had an IQ test where they scored really low, and lots of unsuccessful people who had an IQ test where they scored high. This will in part be due to the fact that IQ doesn't correlate at 100% with anything, but also due to the fact that IQ doesn't correlate with itself over time at 100%. You can do an IQ test on an exceptionally bad day, or an exceptionally good day, or you might get an IQ test which is not good at measuring you in particular. That's why when we do research on this topic we apply multiple different tests, we control for variables, and we run these on large groups of people.
Whether intelligence is useful for a model or not, who knows. All I can tell you with relative confidence is that IQ tests are designed with humans in mind, and when you apply them to models, it is no longer clear what they measure.
One thing models don't have (yet) is lives which they can live and which we can study.
> IQ probably doesn't mean much of anything. But it is one of only a handful of ways we have to benchmark intelligence.
IQ means a lot of things (higher IQ people are measurably better at making associations and generating original ideas, are more perceptive, learn faster, have better spatial awareness).
It doesn't give them the power to predict the future.
It is less meaningful than that. It identifies who does well at tests for those things. That is not the same thing as being "better" at such things, it often just means "faster". IQ tests are also notorious for cultural bias. In particular with the word associations, they often just test for "I'm a white American kid who grew up in private schools."
And I say this as one of the white American kids who did great on those tests. My scores are high, but they are not meaningful.
When I was a young kid my eldest sister (who was 17 years older than me) was an educational psychologist and used to give me loads of intelligence tests - so I got pretty good at doing those kinds of tests. I actually think they are pretty silly, mostly because I generally come out very well in them...
It somewhat indicates better pattern recognition, so I might give them an advantage at predicting things in general. Not that it will make them prophets or oracles. But a prediction from a higher-IQ person is more likely to be correct. Not that the world cannot be illogical and go against those predictions.
> What I deeply believe is: We're never going to invent superintelligence, not because its impossible for computers to achieve, but because we don't even know what intelligence is.
Speak for yourself, not all of humanity. There are plenty of rigorous, mostly equivalent definitions for intelligence: The ability to find short programs that explain phenomena (compression). The capability to figure out how to do things (RL). Maximizing discounted future entropy (freedom). I hate how stupid people propagate this lie that we don't know what intelligence is, just because they lack it. It's quite convenient, because how can they be shown to lack intelligence when the word isn't even defined!
How do you measure the capacity for improvisational comedy? How do you measure a talent for telling convincing lies? How do you measure someone's capacity for innovating in a narrative medium? How do you measure someone's ability for psychological insight and a theory of self? How do you measure someone's capacity for understanding irony or picking up subtle social cues? Or for formulating effective metaphors and analogies, or boiling down concepts eloquently? How about for mediating complex, multifaceted interpersonal conflicts effectively? How do you measure someone's capacity for empathy, which necessarily involves incredibly complex simulations and mental models of other people's minds?
Do you think excelling in any of these doesn't require intelligence? You sound like you consider yourself quite intelligent, so are you excellent at all of them? No? How come?
Can you tell me which part of an IQ test or your "rigorous, moslty equivalent definitions for intelligence" capture any of them?
> I hate how stupid people propagate this lie that we don't know what intelligence is, just because they lack it. It's quite convenient, because how can they be shown to lack intelligence when the word isn't even defined!
How's this: "I hate how stupid people propagate this lie that we know what intelligence is, just because they do well within the narrow definition that they made up. It's quite convenient, because how can they be shown to lack intelligence when their definition of it fits their strengths and excludes their weaknesses!"
> How do you measure the capacity for improvisational comedy?
What makes something funny? Usually, it's by subverting someone's predictions. You have to be good at predicting others' predictions to do this well.
> How do you measure a talent for telling convincing lies?
You have to explain a phenomenon better than the truth to convince someone of your lie.
> How do you measure someone's capacity for innovating in a narrative medium?
As in, world-building? That is more of a memory problem than an intelligence problem, though you do need to be good at compressing the whole world into what is relevant to the story. People who are worse at that will have to take more notes and refer back to them more often.
> How do you measure someone's ability for psychological insight and a theory of self?
They are better at explaining a phenomenon (their self).
> How do you measure someone's capacity for understanding complex, multi-faceted irony or picking up subtle social cues?
Refer to the above. Also, using the adjectives 'complex, multi-faceted' is lazy here. Be more introspective and write what you really want to say.
> Or for formulating effective metaphors and analogies, or boiling down concepts eloquently?
Compression = finding short programs that recover the data.
> How about for mediating complex, multifaceted interpersonal conflicts effectively?
Quite often not an intelligence problem.
> How do you measure someone's capacity for empathy, which necessarily involves incredibly complex simulations and mental models of other people's minds?
"incredibly complex simulations and mental models of other people's minds," however will you do this? Oh, I know! Your brain will have to come up with a small circuit that compresses other people's brain pretty well, as it doesn't have enough capacity to just run the other brain.
> Do you think excelling in any of these doesn't require intelligence? You sound like you consider yourself quite intelligent, so are you excellent at all of them? No? How come?
I am actually pretty good at pretty much all of these compared to the average person.
> What makes something funny? Usually, it's by subverting someone's predictions.
And in those other cases? You have a rigorous definition of comedy?
> You have to explain a phenomenon better than the truth to convince someone of your lie.
This is so often not true I would argue it's generally false. A story is believed because a listener "wants" to believe it. Some listeners have more or less complex criteria for acceptance.
> As in, world-building? That is more of a memory problem than an intelligence problem, though you do need to be good at compressing the whole world into what is relevant to the story. People who are worse at that will have to take more notes and refer back to them more often.
People like Tolkien and Martin? Note taking as a sign of poor skill/intelligence is a wildly novel take from my point of view.
> Also, using the adjectives 'complex, multi-faceted' is lazy here. Be more introspective and write what you really want to say.
Couldn't I say the same about your use of Introspective? Surely a more detailed phrase exists to describe what you mean.
> interpersonal conflicts... Quite often not an intelligence problem.
Oh, I think this will get at the root of our misunderstandings. I believe I've seen this attitude before. Before I jump to conclusions: Why exactly do you say this skill is not intelligence-based?
> And in those other cases? You have a rigorous definition of comedy?
There's surely more to comedy than subverting expectations. Someone else who cares more about comedy in particular can figure that out for themself, but surely I gave enough of the general idea to make it clear how you could go about measuring the intelligence necessary for comedy.
> A story is believed because a listener "wants" to believe it. Some listeners have more or less complex criteria for acceptance.
Yeah, that's the sense of "better" I was going for. I could have been more clear here, so I'm glad you figured out what I meant.
> Couldn't I say the same about your use of Introspective? Surely a more detailed phrase exists to describe what you mean.
It was a not-so-kind way of saying, "don't point at vague ideas to obscure what you really mean and make it difficult for others to understand what you mean to keep your opinion unassailable."
> Why exactly do you say this skill [resolving conflicts] is not intelligence-based?
Most people have more time to think than they actually use during conflicts, so I expect most of the time conflicts come from people preferring to not think than because they lack the ability. That or a fundamental value difference (you want my food, I want my food).
> Most people have more time to think than they actually use during conflicts, so I expect most of the time conflicts come from people preferring to not think than because they lack the ability.
This seems to imply that intelligence only exists in deliberate, conscious thought. Do you think that's true?
Second, resolving conflicts is not the same thing as getting into them, so it's unclear why you bring that up at all.
True. I expect most conflicts come from people preferring not to think, and I also expect most conflicts escalate from people preferring not to think. Those are separate statements, and I only said the former.
> This seems to imply that intelligence only exists in deliberate, conscious thought. Do you think that's true?
Eh, I don't think it implies that, and I also don't think that is true.
What you need for conflict resolution is usually a willingness to try to resolve the conflict. In rare situations, where communication and time is limited, you can actually run into the issue where you have to be smart enough to figure out what the other person wants (and see if you can come up with a mutually beneficial offer), but often in real life you can just spend more time thinking and ask them what they want.
Reducing comedy to 'subverting predictions' and empathy to 'compression algorithms' is like explaining music as 'organized sound waves', technically defensible yet completely missing the point. Missing the forest for the trees is an objective sign of limited metacognition, by the way.
The fact that you claim to be 'above average' at empathy and social cues while writing this robotic dismissal that completely misses the point (I asked for measurement methods, you provided questionable definitions) is the ultimate proof of my argument. You haven't defined intelligence, you've just compressed the meaning of it until it's small enough to fit inside your ego.
I purposefully do not give out methods to measure intelligence, because people can train on them. I knew you wanted that, but that does not mean you get what you want. I also find it strange how you expect me to be empathetic in a way that makes you feel good about yourself, when you deserve no such compassion after pulling the dark arts on me.
That's ok, me and my "dark arts" will have to make do without your "compassion", somehow. And the world will have to make do without "training" on your secret "methods to measure intelligence", somehow.
I don't appreciate your expletives in your original unedited post, by the way, but the fact that you lost your temper is once again proof of something. You sound young, so I hope one day you "find a short program" to recover that data.
That last part was not sarcasm, in case you have any trouble picking it up.
> I don't appreciate your expletives in your original unedited post, by the way, but the fact that you lost your temper is once again proof of something.
It was the first edit where I added them, since I could not reply to your post, and I removed them once I could reply. Yes, I lost my temper. You did too (and first)... you're just less honest and put up a facade of politeness.
> And the world will have to make do without "training" on your secret "methods to measure intelligence", somehow.
Is the goal here to provoke me enough to get what you want? lol. Maximally adversarial.
If you want to have a discussion in good faith, then you need to work on your rhetoric. People are unlikely to want to engage with you, here or in your real life, if you regularly talk like that. Seek help.
My goal wasn't to have a discussion, it was to shut down the propagation of lies. This is one of those memetic viruses that people keep passing around, that most people passing around don't even bother to think about, and it has some pretty negative consequences, such as aiding in the elimination of American gifted programs.
Honestly not sure if this is a bit, it's so on-the-nose... Taking it at face value, you are literally claiming to know precisely what intelligence is? You would be the first to know if so. You should probably publish quickly before someone steals your definition!
In your post is demonstrated one of the deep mysteries of intelligence: How can a smart person make such a dumb assertion? (I'll give a hint: consider that "intelligence" is not a single axis)
I think Solomonoff beat me by about 70 years, and Wissner-Gross & Freer by about 10 years. Even if I had something novel to publish in this area, I think I would rather do something like solve ARC-AGI and make a lot of money.
1. Religious mysticism. The murkier people are on concepts like thinking, consciousness, and intelligence, the easier it is to claim they include some metaphysical aspect. Since you cannot actually pin down the metaphysical aspect, they must claim it is because you cannot pin down the physical aspect.
2. People do not like feeling less intelligent than other people, so they try to make the comparator ill-defined.
#2 is not relevant, and it also seems basically untrue.
So your belief is that the global scientific community broadly agrees that "intelligence" has not been rigorously defined because the global scientific community is trapped in religious mysticism?
I am going to be honest, and I'm not saying this as a jab: this is starting to sound completely disconnected from reality. The people who study intelligence are not, as a rule, mired in metaphysical hand-waving.
Huh? You asked, "why is there broad consensus today that intelligence is ill-defined?" That's what I answered. Did you mean to ask a different question, "why is there broad consensus among people who research intelligence that it is ill-defined?" Which kinds of people are you talking about? The information theorists? The machine learning researchers? The linguists? The psychologists?
The information theorists generally agree it has a precise definition, though they may choose different ones. The machine learning researchers typically only know how to run empirical experiments, but a small group of them do theory, and they generally agree intelligence is low Kolmogorov complexity. The linguists generally agree it cannot be defined, in the nihilistic sense, but if you posit a bunch of brains, then words have meaning by being signals between brains and intelligence is moving the words closer to the information bottleneck. I don't know what the psychologists say on the matter, though I wonder if they have the mathematical tools to even say things precisely.
Ok, let's ask a different question. Assuming development of super-intelligence is possible, how do you measure it? What criteria satisfy "this is super intelligence"? You honestly sound like most pseudo-intellectuals I hear discussing this very topic. Ironic how you think you're the brilliant one and it's others who are stupid. Actually, not really ironic: a fool doesn't know he is a fool.
I literally gave you the criterion. You can measure, "I have this model that is supposed to compress data. I have this data. Does it compress the data into fewer bits than other models? Than humans?"
Or, "I have this game and this model. Does the model win the game more often than other models or humans?"
Or, "I have this model that takes in states in an environment and outputs actions. I have this environment. Does the actions it outputs have a higher discounted future entropy than other models or humans?"
True: A shadow take that I have been noodling on is that "ability to correctly predict the future" is actually the only true characteristic of intelligence. All other things we might label as intellect are either expressions of that, or something different that is more accurately categorized under a different label.
This isn't even that. If I'm a person others may take as a reference and I hold Bitcoin, it is in my interest to publicly state that Bitcoin is going to increase in value, because that in itself makes it increase in value and it's good for me.
But doing what is in one's best interest isn't necessarily the more intelligent decision. In a rat society, the more "intelligent" rats can possibly be better at acquiring resources to survive if selective pressure is put upon those with such talents but can just as well be early signifiers of 'behavioral sinks'[0]. Not to mention, certain illnesses, mental and otherwise can change motive regardless of IQ.
Actions don't necessarily dictate intelligence. The goal of life has to be defined to make such arguments. For example, using a maze as an analogy, you could argue the more intelligent person can arrive at the end faster and more elegantly, but life has no such defined and agreed-upon end. If we're arguing that selfishness is a sign of intelligence, then that view is quite myopic.
Exactly. We don't have a good definition of intelligence and I don't think we ever will. Like all social concepts, it is highly dependent on the needs, goals, and values of the human societies that define it, and so it is impossible to come up with a universal definition. If your needs don't align with the needs an AI has been trained to meet, you are not going to find it very intelligent or helpful for meeting those needs.
You're quite literally babbling. If a word has no good definition, it ceases to be a word. All you really mean is you use the word "intelligence" very loosely, without really knowing what you mean when you use it. You just use it to point at a concept that's vague in your head. That does not mean you could not make that concept more precise, if you felt inclined to be more introspective. It also does not mean that the precise idea I think of when I use the word "intelligence" is the same as your idea. But they'll often be close enough or even equivalent mathematically, as long as we both have precise definitions in mind.
> But they'll often be close enough or even equivalent mathematically
Who is babbling? The number of concepts in human language that have no mathematical formalization far outnumber the ones that do, lol.
Yes, we can, obviously, come up with shared, mathematically precise definitions for certain concepts. Keep in mind that:
A. These formal or scientific definitions are not the full exhaustion of the concept. Linguistic usage is varied and wide. Anyone who has bothered to open an introductory linguistics textbook understands this.
B. The scientific and mathematical definitions still change over time and can also change across cultures and contexts.
I can assure you that someone who has scored very high on an IQ test would not be considered "intelligent" in a group of film snobs if they were not aware of the history of film, up to date on the latest greats, etc. etc. These people would probably use the word intelligent to describe what they mean (knowledge of film) and not the precise technical definition we've come up with, if any, whether you like it or not.
My point is not that it is impossible to come up with definitions, my point is that for socially fluid concepts like intelligence, which are highly dependent on the needs and circumstances of the people employing the word, we will likely never pin it down. There is an asterisk on every use of the word. This is the case with basically every word to more or lesser degree, that's why language and ideas evolve in the first place.
My whole point is that people that don't realize this and put faith in IQ as though it is some absolute, or final, indicator of intelligence are dumb and probably just egotists who are uncomfortable with uncertainty and want reassurance that they are smart so that they can tell other people they are "babbling" and feel good about themselves and their intellectual superiority complex (read: self justified pride in being an asshole).
My claim is that this high variability and contextual sensitivity is a core part of this word and the way we use it. That's what I mean when I say I don't think we'll ever have a good definition.
EDIT: Or, to make it a little easier to understand. We will never have a universal definition of "moral good" because it is dependent on value claims, people will argue morality forever. My position is that "intelligence" is equally dependent on value claims which I think anyone who has spent more than five minutes with people not like themselves or trained in different forms of knowledge intuitively understands this.
Babbling in the mathematics sense: no information transmitted.
I agree with you in the linguistic sense on the word 'intelligence'. Everyone has their own colloquial meaning. That doesn't make their definitions correct. If someone says, "exponential growth," just to mean fast growth, they're wrong (according to me). It's impossible to have universally agreed upon definitions, but we can at least try to standardize some of them. If you only care about intelligence in regards to a specific niche, add adjectives not definitions.
IQ tests measure 'intelligence' in the general, correct sense of the word. Not perfectly, but they're pretty good. If you care about a specific task, you can finetune on that task. While a generally intelligent agent will do better than a less intelligent agent at pretty much all tasks, it can still be defeated by test-time compute.
Imagine working on a team at Apple and waking up to the news that the ACLU is now criticizing your work. Talk about being on the wrong side of history. Do Apple employees even care anymore? Or are they just there for the resume prestige? The mental gymnastics you must be doing in this moment to keep yourself from feeling cognitive dissonance.
Not everyone views the ACLU's word as sacred and authoritative. I certainly don't, even though I think they're correct on this particular issue - it is in fact bad that Apple can arbitrarily cut off apps from iOS device users by removing them from the app store because the US federal government tells them to. It's also bad if they can do this because the Chinese government tells them to, or because Apple decides internally that some app is bad. This is a huge reason why I have never used iOS devices. Nonetheless, people were willing to work at Apple, on iOS devices, going back to the dawn of the system in 2007. If they didn't care about end-user compute sovereignty then, they probably don't care now either.
Obviously not everyone views the ACLU's word as sacred and authoritative. For example, I imagine Nazis, Slave Owners, and Product Managers on the App Store team at Apple would not.