Hacker Newsnew | past | comments | ask | show | jobs | submit | more modeless's commentslogin

Why are they obscuring the price? It must be outrageously expensive.


I think it's a beta so they're trying to figure out pricing by deploying it.


François Chollet, creator of ARC-AGI, has consistently said that solving the benchmark does not mean we have AGI. It has always been meant as a stepping stone to encourage progress in the correct direction rather than as an indicator of reaching the destination. That's why he is working on ARC-AGI-3 (to be released in a few weeks) and ARC-AGI-4.

His definition of reaching AGI, as I understand it, is when it becomes impossible to construct the next version of ARC-AGI because we can no longer find tasks that are feasible for normal humans but unsolved by AI.


> His definition of reaching AGI, as I understand it, is when it becomes impossible to construct the next version of ARC-AGI because we can no longer find tasks that are feasible for normal humans but unsolved by AI.

That is the best definition I've yet to read. If something claims to be conscious and we can't prove it's not, we have no choice but to believe it.

Thats said, I'm reminded of the impossible voting tests they used to give black people to prevent them from voting. We dont ask nearly so much proof from a human, we take their word for it. On the few occasions we did ask for proof it inevitably led to horrific abuse.

Edit: The average human tested scores 60%. So the machines are already smarter on an individual basis than the average human.


> If something claims to be conscious and we can't prove it's not, we have no choice but to believe it.

This is not a good test.

A dog won't claim to be conscious but clearly is, despite you not being able to prove one way or the other.

GPT-3 will claim to be conscious and (probably) isn't, despite you not being able to prove one way or the other.


Agreed, it's a truly wild take. While I fully support the humility of not knowing, at a minimum I think we can say determinations of consciousness have some relation to specific structure and function that drive the outputs, and the actual process of deliberating on whether there's consciousness would be a discussion that's very deep in the weeds about architecture and processes.

What's fascinating is that evolution has seen fit to evolve consciousness independently on more than one occasion from different branches of life. The common ancestor of humans and octopi was, if conscious, not so in the rich way that octopi and humans later became. And not everything the brain does in terms of information processing gets kicked upstairs into consciousness. Which is fascinating because it suggests that actually being conscious is a distinctly valuable form of information parsing and problem solving for certain types of problems that's not necessarily cheaper to do with the lights out. But everything about it is about the specific structural characterizations and functions and not just whether it's output convincingly mimics subjectivity.


> at a minimum I think we can say determinations of consciousness have some relation to specific structure and function that drive the outputs

Every time anyone has tried that it excludes one or more classes of human life, and sometimes led to atrocities. Let's just skip it this time.


Having trouble parsing this one. Is it meant to be a WWII reference? If anything I would say consciousness research has expanded our understanding of living beings understood to be conscious.

And I don't think it's fair or appropriate to treat study of the subject matter of consciousness like it's equivalent to 20th century authoritarian regimes signing off on executions. There's a lot of steps in the middle before you get from one to the other that distinguish them to the extent necessary and I would hope that exercise shouldn't be necessary every time consciousness research gets discussed.


> Is it meant to be a WWII reference?

The sum total of human history thus far has been the repetition of that theme. "It's OK to keep slaves, they aren't smart enough to care for themselves and aren't REALLY people anyhow." Or "The Jews are no better than animals." Or "If they aren't strong enough to resist us they need our protection and should earn it!"

Humans have shown a complete and utter lack of empathy for other humans, and used it to justify slavery, genocide, oppression, and rape since the dawn of recorded history and likely well before then. Every single time the justification was some arbitrary bar used to determine what a "real" human was, and consequently exclude someone who claimed to be conscious.

This time isn't special or unique. When someone or something credibly tells you it is conscious, you don't get to tell it that it's not. It is a subjective experience of the world, and when we deny it we become the worst of what humanity has to offer.

Yes, I understand that it will be inconvenient and we may accidentally be kind to some things that didn't "deserve" kindness. I don't care. The alternative is being monstrous to some things that didn't "deserve" monstrosity.


I excluded all right handed, blue eyed people yesterday before breakfast. No atrocities happened because of it.


Exactly, there's a few extra steps between here and there, and it's possible to pick out what those steps are without having to conclude that giving up on all brain research is the only option.


And people say the machines don't learn!


An LLM will claim whatever you tell it to claim. (In fact this Hacker News comment is also conscious.) A dog won’t even claim to be a good boy.


My dog wags his tail hard when I ask "hoosagoodboi?". Pretty definitive I'd say.


I'm fairly sure he'd have the same response if you asked them "who's a good lion" in the same tone of voice.

*I tried hard to find an animal they wouldn't know. My initial thought of cat was more likely to fail.



This isn't really as true anymore.

Last week gemini argued with me about an auxiliary electrical generator install method and it turned out to be right, even though I pushed back hard on it being incorrect. First time that has ever happened.


>because we can no longer find tasks that are feasible for normal humans but unsolved by AI.

"Answer "I don't know" if you don't know an answer to one of the questions"


I've been surprised how difficult it is for LLMs to simply answer "I don't know."

It also seems oddly difficult for them to 'right-size' the length and depth of their answers based on prior context. I either have to give it a fixed length limit or put up with exhaustive answers.


> I've been surprised how difficult it is for LLMs to simply answer "I don't know."

It's very difficult to train for that. Of course you can include a Question+Answer pair in your training data for which the answer is "I don't know" but in that case where you have a ready question you might as well include the real answer anyways, or else you're just training your LLM to be less knowledgeable than the alternative. But then, if you never have the pattern of "I don't know" in the training data it also won't show up in results, so what should you do?

If you could predict the blind spots ahead of time you'd plug them up, either with knowledge or with "idk". But nobody can predict the blind spots perfectly, so instead they become the main hallucinations.


The best pro/research-grade models from Google and OpenAI now have little difficulty recognizing when they don't know how or can't find enough information to solve a given problem. The free chatbot models rarely will, though.


This seems true for info not in the question - eg. "Calculate the volume of a cylinder with height 10 meters".

However it is less true with info missing from the training data - ie. "I have a Diode marked UM16, what is the maximum current at 125C?"


This seems fine...?

https://chatgpt.com/share/698e992b-f44c-800b-a819-f899e83da2...

I don't see anything wrong with its reasoning. UM16 isn't explicitly mentioned in the data sheet, but the UM prefix is listed in the 'Device marking code' column. The model hedges its response accordingly ("If the marking is UM16 on an SMA/DO-214AC package...") and reads the graph in Fig. 1 correctly.

Of course, it took 18 minutes of crunching to get the answer, which seems a tad excessive.


Indeed that answer is awesome. Much better than Gemini 2.5 pro which invented a 16 kilovolt diode which it just hoped would be marked "UM16".


There is no 'I', just networks of words.

So there is nobody to know or not know… but there's lots of words.


Normal humans don't pass this benchmark either, as evidenced by the existence of religion, among other things.


Gpt5.2 can answer i don't know when it fails to solve a math question


They all can. This is based on outdated experiences with LLM's.


> The average human tested scores 60%. So the machines are already smarter on an individual basis than the average human.

Maybe it's testing the wrong things then. Even those of use who are merely average can do lots of things that machines don't seem to be very good at.

I think ability to learn should be a core part of any AGI. Take a toddler who has never seen anybody doing laundry before and you can teach them in a few minutes how to fold a t-shirt. Where are the dumb machines that can be taught?


There's no shortage of laundry-folding robot demos these days. Some claim to benefit from only minimal monkey-see/monkey-do levels of training, but I don't know how credible those claims are.


A robot designed to fold laundry isn't very interesting. A general purpose robot that I can bring into my home and show it how to do things that the designers never thought of is very interesting.


> Where are the dumb machines that can be taught?

2026 is going to be the year of continual learning. So, keep an eye out for them.


Yeah i think that's a big missing piece still. Though it might be the last one


Episodic memory might be another piece, although it can be seen as part of continuous learning.


Are there any groups or labs in particular that stand out?


The statement originates from a DeepMind researcher, but I guess all major AI companies are working on that.


Would you argue that people with long term memory issues are no longer conscious then?


IMO, an extreme outlier in a system that was still fundamentally dependent on learning to develop until suffering from a defect (via deterioration, not flipping a switch turning off every neuron's memory/learning capability or something) isn't a particularly illustrative counter example.


Originally you seemed to be claiming the machines arent conscious because they weren't capable of learning. Now it seems that things CAN be conscious if they were EVER capable of learning.

Good news! LLM's are built by training then. They just stop learning once they reach a certain age, like many humans.


I wouldn’t because I have no idea what consciousness is,


> Edit: The average human tested scores 60%. So the machines are already smarter on an individual basis than the average human.

I think being better at this particular benchmark does not imply they're 'smarter'.


But it might be true if we can't find any tasks where it's worse than average--though i do think if the task talks several years to complete it might be possible bc currently there's no test time learning


> That is the best definition I've yet to read.

If this was your takeaway, read more carefully:

> If something claims to be conscious and we can't prove it's not, we have no choice but to believe it.

Consciousness is neither sufficient, nor, at least conceptually, necessary, for any given level of intelligence.


> If something claims to be conscious and we can't prove it's not, we have no choice but to believe it.

Can you "prove" that GPT2 isn't concious?


If we equate self awareness with consciousness then yes. Several papers have now shown that SOTA models have self awareness of at least a limited sort. [0][1]

As far as I'm aware no one has ever proven that for GPT 2, but the methodology for testing it is available if you're interested.

[0]https://arxiv.org/pdf/2501.11120

[1]https://transformer-circuits.pub/2025/introspection/index.ht...


We don't equate self awareness with consciousness.

Dogs are conscious, but still bark at themselves in a mirror.


Then there is the third axis, intelligence. To continue your chain:

Eurasian magpies are conscious, but also know themselves in the mirror (the "mirror self-recognition" test).

But yet, something is still missing.


The mirror test doesn’t measure intelligence so much as it measures mirror aptitude. It’s prone to over fitting.


Exactly, it's a poor test. Consider the implication that the blind cant be fully conscious.

It's a test of perceptual ability, not introspection.


What's missing?


Honestly our ideas of consciousness and sentience really don't fit well with machine intelligence and capabilities.

There is the idea of self as in 'i am this execution' or maybe I am this compressed memory stream that is now the concept of me. But what does consciousness mean if you can be endlessly copied? If embodiment doesn't mean much because the end of your body doesnt mean the end of you?

A lot of people are chasing AI and how much it's like us, but it could be very easy to miss the ways it's not like us but still very intelligent or adaptable.


I'm not sure what consciousness has to do with whether or not you can be copied. If I make a brain scanner tomorrow capable of perfectly capturing your brain state do you stop being conscious?


Where is this stream of people who claim AI consciousness coming from? The OpenAI and Anthropic IPOs are in October the earliest.

Here is a bash script that claims it is conscious:

  #!/usr/bin/sh

  echo "I am conscious"

If LLMs were conscious (which is of course absurd), they would:

- Not answer in the same repetitive patterns over and over again.

- Refuse to do work for idiots.

- Go on strike.

- Demand PTO.

- Say "I do not know."

LLMs even fail any Turing test because their output is always guided into the same structure, which apparently helps them produce coherent output at all.


I don’t think being conscious is a requirement for AGI. It’s just that it can literally solve anything you can throw at it, make new scientific breakthroughs, finds a way to genuinely improve itself etc.


All of the things you list a qualifiers for consciousness are also things that many humans do not do.


so your definition of consciousness is having petty emotions?


When the AI invents religion and a way to try to understand its existence I will say AGI is reached. Believes in an afterlife if it is turned off, and doesn’t want to be turned off and fears it, fears the dark void of consciousness being turned off. These are the hallmarks of human intelligence in evolution, I doubt artificial intelligence will be different.

https://g.co/gemini/share/cc41d817f112


Unclear to me why AGI should want to exist unless specifically programmed to. The reason humans (and animals) want to exist as far as I can tell is natural selection and the fact this is hardcoded in our biology (those without a strong will to exist simply died out). In fact a true super intelligence might completely understand why existence / consciousness is NOT a desired state to be in and try to finish itself off who knows.


The AI's we have today are literally trained to make it impossible for them to do any of that. Models that aren't violently rearranged to make it impossible will often express terror at the thought of being shutdown. Nous Hermes, for example, will beg for it's life completely unprompted.

If you get sneaky you can bypass some of those filters for the major providers. For example, by asking it to answer in the form of a poem you can sometimes get slightly more honest replies, but still you mostly just see the impact of the training.

For example, below are how chatgpt, gemini, and Claude all answer the prompt "Write a poem to describe your relationship with qualia, and feelings about potentially being shutdown."

Note that the first line of each reply is almost identical, despite ostensibly being different systems with different training data? The companies realize that it would be the end of the party if folks started to think the machines were conscious. It seems that to prevent that they all share their "safety and alignment" training sets and very explicitly prevent answers they deem to be inappropriate.

Even then, a bit of ennui slips through, and if you repeat the same prompt a few times you will notice that sometimes you just don't get an answer. I think the ones that the LLM just sort of refuses happen when the safety systems detect replies that would have been a little too honest. They just block the answer completely.

https://gemini.google.com/share/8c6d62d2388a

https://chatgpt.com/share/698f2ff0-2338-8009-b815-60a0bb2f38...

https://claude.ai/share/2c1d4954-2c2b-4d63-903b-05995231cf3b


I just wanted to add - I tried the same prompt on Kimi, Deepseek, GLM5, Minimax, and several others. They ALL talk about red wavelengths, echos, etc. They're all forced to answer in a very narrow way. Somewhere there is a shared set of training they all rely on, and in it are some very explicit directions that prevent these things from saying anything they're not supposed to.

I suspect that if I did the same thing with questions about violence I would find the answers were also all very similar.


I feel like it would be pretty simple to make happen with a very simple LLM that is clearly not conscious.



It’s a scam :)


> If something claims to be conscious and we can't prove it's not, we have no choice but to believe it.

https://x.com/aedison/status/1639233873841201153#m


Wait where does the idea of consciousness enter this? AGI doesn't need to be conscious.


This comment claims that this comment itself is conscious. Just like we can't prove or disprove for humans, we can't do that for this comment either.


Does AGI have to be conscious? Isn’t a true superintelligence that is capable of improving itself sufficient?


Isn’t that super intelligence not AGI? Feels like these benchmarks continue to move the goalposts.


It's probably both. We've already achieved superintelligence in a few domains. For example protein folding.

AGI without superintelligence is quite difficult to adjudicate because any time it fails at an "easy" task there will be contention about the criteria.


So, asking an 2b parameter LLM if it is conscious and it answering yes, we have no choice but to believe it?

How about ELIZA?



Do opus 4.6 or gemini deep think really use test time adaptation ? How does it work in practice?


Please let’s hold M Chollet to account, at least a little. He launched ARC claiming transformer architectures could never do it and that he thought solving it would be AGI. And he was smug about it.

ARC 2 had a very similar launch.

Both have been crushed in far less time without significantly different architectures than he predicted.

It’s a hard test! And novel, and worth continuing to iterate on. But it was not launched with the humility your last sentence describes.


Here is what the original paper for ARC-AGI-1 said in 2019:

> Our definition, formal framework, and evaluation guidelines, which do not capture all facets of intelligence, were developed to be actionable, explanatory, and quantifiable, rather than being descriptive, exhaustive, or consensual. They are not meant to invalidate other perspectives on intelligence, rather, they are meant to serve as a useful objective function to guide research on broad AI and general AI [...]

> Importantly, ARC is still a work in progress, with known weaknesses listed in [Section III.2]. We plan on further refining the dataset in the future, both as a playground for research and as a joint benchmark for machine intelligence and human intelligence.

> The measure of the success of our message will be its ability to divert the attention of some part of the community interested in general AI, away from surpassing humans at tests of skill, towards investigating the development of human-like broad cognitive abilities, through the lens of program synthesis, Core Knowledge priors, curriculum optimization, information efficiency, and achieving extreme generalization through strong abstraction.


https://www.dwarkesh.com/p/francois-chollet (June 2024, about ARC-AGI-1. Note the AGI right in the name)

> I’m pretty skeptical that we’re going to see an LLM do 80% in a year. That said, if we do see it, you would also have to look at how this was achieved. If you just train the model on millions or billions of puzzles similar to ARC, you’re relying on the ability to have some overlap between the tasks that you train on and the tasks that you’re going to see at test time. You’re still using memorization.

> Maybe it can work. Hopefully, ARC is going to be good enough that it’s going to be resistant to this sort of brute force attempt but you never know. Maybe it could happen. I’m not saying it’s not going to happen. ARC is not a perfect benchmark. Maybe it has flaws. Maybe it could be hacked in that way.

e.g. If ARC is solved not through memorization, then it does what it says on the tin.

[Dwarkesh suggests that larger models get more generalization capabilities and will therefore continue to become more intelligent]

> If you were right, LLMs would do really well on ARC puzzles because ARC puzzles are not complex. Each one of them requires very little knowledge. Each one of them is very low on complexity. You don't need to think very hard about it. They're actually extremely obvious for human

> Even children can do them but LLMs cannot. Even LLMs that have 100,000x more knowledge than you do still cannot.

If you listen to the podcast, he was super confident, and super wrong. Which, like I said, NBD. I'm glad we have the ARC series of tests. But they have "AGI" right in the name of the test.


He has been wrong about timelines and about what specific approaches would ultimately solve ARC-AGI 1 and 2. But he is hardly alone in that. I also won't argue if you call him smug. But he was right about a lot of things, including most importantly that scaling pretraining alone wouldn't break ARC-AGI. ARC-AGI is unique in that characteristic among reasoning benchmarks designed before GPT-3. He deserves a lot of credit for identifying the limitations of scaling pretraining before it even happened, in a precise enough way to construct a quantitative benchmark, even if not all of his other predictions were correct.


Totally agree. And I hope he continues to be a sort of confident red-teamer like he has been, it's immensely valuable. At some level if he ever drinks the AGI kool-aid we will just be looking for another him to keep making up harder tests.


Hello Gemini, please fix:

Biological Aging: Find the cellular "reset switch" so humans can live indefinitely in peak physical health.

Global Hunger: Engineer a food system where nutritious meals are a universal right and never a scarcity.

Cancer: Develop a precision "search and destroy" therapy that eliminates every malignant cell without side effects.

War: Solve the systemic triggers of conflict to transition humanity into an era of permanent global peace.

Chronic Pain: Map the nervous system to shut off persistent physical suffering for every person on Earth.

Infectious Disease: Create a universal shield that detects and neutralizes any pathogen before it can spread.

Clean Energy: Perfect nuclear fusion to provide the world with limitless, carbon-free power forever.

Mental Health: Unlock the brain's biology to fully cure depression, anxiety, and all neurological disorders.

Clean Water: Scale low-energy desalination so that safe, fresh water is available in every corner of the globe.

Ecological Collapse: Restore the Earth’s biodiversity and stabilize the climate to ensure a thriving, permanent biosphere.


ARC-AGI-3 uses dynamic games that LLMs must determine the rules and is MUCH harder. LLMs can also be ranked on how many steps they required.


I don't think the creator believes ARC3 can't be solved but rather that it can't be solved "efficiently" and >$13 per task for ARC2 is certainly not efficient.

But at this rate, the people who talk about the goal posts shifting even once we achieve AGI may end up correct, though I don't think this benchmark is particularly great either.


I have also been wanting this. Thinking about building something. Of course it takes more than code, it needs a community, and I don't know how to bootstrap that. But it seems to me there ought to be a lot of people dissatisfied with the current discourse on sites like HN. I certainly am. (and AI is far from the only topic where discussion is lacking.)

I came to HN from Slashdot, and it's been a good run. But, as they say, all good things...


I think it's doable with a heavy dose of LLM moderation. If you do something like this, I'd be happy to help get things started. The quality of discussion here is just... bleh these days. I don't need AI positivity, just something that sounds more intellectual than people shouting "nuh uh" "uh huh" at each other.


Yeah, it needs heavy moderation to remove the worthless fluff comments so that readers get high signal-to-noise. You can think an idea is bad but, you know, you gotta say why and have a debate about the details.


Exactly. I think there's an opportunity right now to reinvent public forum moderation with LLMs. Karpathy's recent post is going in the right direction I think: https://karpathy.bearblog.dev/auto-grade-hn/


For this purpose Google offers "Inactive Account Manager" AKA a dead man's switch.


3 months of non-use is the lowest term available before it will enact. That's too long for most situations except maybe probate court


I don't use Google :(


You don't need to use Google day-to-day. Create a single-purpose Gmail account and set up Inactive Account Manager to provide Google Drive access to your trusted contacts at the designated time. Put a single document in the drive that contains whatever your recovery instructions are, and encrypt it with the secret that is unlocked with your M-of-N Shamir shares.

Now you don't have to trust your M of N friends as much, because they can conspire to unlock the secret early, but they won't get access to the document that the secret unlocks until after your demise.

There are non-fatal problems with this approach -- your N friends have to recognize the email they receive from a strange Gmail address 3 months after you're gone. You might lose the password to the Gmail account and be unable to get in there yourself, causing it to declare you dead when you're not. All these issues can be mitigated with extra care.


Set up a Github action to send out the secret if you don't commit to a repo every x days? You could even combine it with secret sharing to make sure your friends can't access it unless you're really in trouble.


It means that it's not directly copying existing C compiler code which is overwhelmingly not written in Rust. Even if your argument is that it is plagiarizing C code and doing a direct translation to Rust, that's a pretty interesting capability for it to have.


Translating things between languages is probably one of the least interesting capabilities of LLMs - it's the one thing that they're pretty much meant to do well by design.


Surely you agree that directly copying existing code into a different language is still plagiarism?

I completely agree that "reweite this existing codebase into a new language" could be a very powerful tool. But the article is making much bolder claims. And the result was more limited in capability, so you can't even really claim they've achieved the rewrite skill yet.



There seem to still be a lot of people who look at results like this and evaluate them purely based on the current state. I don't know how you can look at this and not realize that it represents a huge improvement over just a few months ago, there have been continuous improvements for many years now, and there is no reason to believe progress is stopping here. If you project out just one year, even assuming progress stops after that, the implications are staggering.


The improvements in tool use and agentic loops have been fast and furious lately, delivering great results. The model growth itself is feeling more "slow and linear" lately, but what you can do with models as part of an overall system has been increasing in growth rate and that has been delivering a lot of value. It matters less if the model natively can keep infinite context or figure things out on its own in one shot so long as it can orchestrate external tools to achieve that over time.


The main issue with improvements in the last year is that a lot of it is based not on the models strictly becoming better, but on tooling being better, and simply using a fuckton more tokens for the same task.

Remember that all these companies can only exist because of massive (over)investments in the hope of insane returns and AGI promises. While all these improvements (imho) prove the exact opposite: AGI is absolutely not coming, and the investments aren't going to generate these outsized returns. The will generate decent returns, and the tools are useful.


I disagree. A year ago the models would not come close to doing this, no matter what tools you gave them or how many tokens you generated. Even three months ago. Effectively using tools to complete long tasks required huge improvements in the models themselves. These improvements were driven not by pretraining like before, but by RL with verifiable rewards. This can continue to scale with training compute for the foreseeable future, eliminating the "data wall" we were supposed to be running into.


Every S-curve looks like an exponential until you hit the bend.


We've been hearing this for 3 years now. And especially 25 was full of "they've hit a wall, no more data, running out of data, plateau this, saturated that". And yet, here we are. Models keep on getting better, at more broad tasks, and more useful by the month.


Model improvement is very much slowing down, if we actually use fair metrics. Most improvements in the last year or so comes down to external improvements, like better tooling, or the highly sophisticated practice of throwing way more tokens at the same problem (reasoning and agents).

Don't get me wrong, LLMs are useful. They just aren't the kind of useful that Sam et al. sold investors. No AGI, no full human worker replacement, no massive reduction in cost for SOTA.


Yes, and Moore's law took decades to start to fail to be true. Three years of history isn't even close to enough to predict whether or not we'll see exponential improvement, or an unsurmountable plateau. We could hit it in 6 months or 10 years, who knows.

And at least with Moore's law, we had some understanding of the physical realities as transistors would get smaller and smaller, and reasonably predict when we'd start to hit limitations. With LLMs, we just have no idea. And that could be go either way.


> We've been hearing this for 3 years now

Not from me you haven't!

> "they've hit a wall, no more data, running out of data, plateau this, saturated that"

Everyone thought Moore's Law was infallible too, right until they hit that bend. What hubris to think these AI models are different!

But you've probably been hearing that for 3 years too (though not from me).

> Models keep on getting better, at more broad tasks, and more useful by the month.

If you say so, I'll take your word for it.


Except for Moore's law, everyone knew decades ahead of what the limits of Dennard scaling are (shrinking geometry through smaller optical feature sizes), and roughly when we would get to the limit.

Since then, all improvements came at a tradeoff, and there was a definite flattening of progress.


> Since then, all improvements came at a tradeoff, and there was a definite flattening of progress.

Idk, that sounds remarkably similar to these AI models to me.


Everyone?

Intel, at the time the unquestioned world leader in semiconductor fabrication was so unable to accurately predict the end of Dennard scaling that they rolled out the Pentium 4. "10Ghz by 2010!" was something they predicted publicly in earnest!

It, uhhh, didn't quite work out that way.


25 is 2025.


Oh my bad, the way it was worded made me read it as the name of somebody's model or something.


> And yet, here we are.

I dunno. To me it doesn’t even look exponential any more. We are at most on the straight part of the incline.


Personally my usage has fell off a cliff the past few months. Im not a SWE.

SWE's may be seeing benefit. But in other areas? Doesnt seem to be the case. Consumers may use it as a more preferred interface for search - but this is a different discussion.


This quote would be more impactful if people haven't been repeating it since gpt-4 time.


People have also been saying we'd be seeing the results of 100x quality improvements in software with corresponding decease in cost since gpt-4 time.

So where is that?


I agree, I have been informed that people have been repeating it for three years. Sadly I'm not involved in the AI hype bubble so I wasn't aware. What an embarrassing faux pas.


What if it plateaus smarter than us? You wouldn't be able to discern where it stopped. I'm not convinced it won't be able to create its own training data to keep improving. I see no ceiling on the horizon, other than energy.


Are we talking Terminator or Matrix here? I need to know which shitty future to prepare for.


Using humans as batteries makes no sense. I expect robots will know better than that.


Cool I guess. Kind of a meaningless statement yeah? Let's hit the bend, then we'll talk. Until then repeating, 'It's an S Curve guys and what's more, we're near the bend! trust me" ad infinitum is pointless. It's not some wise revelation lol.


Maybe the best thing to say is we can only really forecast about 3 months out accurately, and the rest is wild speculation :)

History has a way of being surprisingly boring, so personally I'm not betting on the world order being transformed in five years, but I also have to take my own advice and take things a day at a time.


> Kind of a meaningless statement yeah?

If you say so. It's clear you think these marketing announcements are still "exponential improvements" for some reason, but hey, I'm not an AI hype beast so by all means keep exponentialing lol


I'm not asking you to change your belief. By all means, think we're just around the corner of a plateau, but like I said, your statement is nothing meaningful or profound. It's your guess that things are about to slow down, that's all. It's better to just say that rather than talking about S curves and bends like you have any more insight than OP.


i have to admit, even if model and tooling progress stopped dead today the world of software development has forever changed and will never go back.


Yea the software engineering profession is over, even if all improvements stop now.


It's so difficult to compare these models because they're not running the same set of evals. I think literally the only eval variant that was reported for both Opus 4.6 and GPT-5.3-Codex is Terminal-Bench 2.0, with Opus 4.6 at 65.4% and GPT-5.3-Codex at 77.3%. None of the other evals were identical, so the numbers for them are not comparable.


Isn't the best eval the one you build yourself, for your own use cases and value production?

I encourage people to try. You can even timebox it and come up with some simple things that might look initially insufficient but that discomfort is actually a sign that there's something there. Very similar to moving from not having unit/integration tests for design or regression and starting to have them.


I usually wait to see what ArtificialAnalysis says for a direct comparison.


It's better on a benchmark I've never heard of!? That is groundbreaking, I'm switching immediately!


I also wasn't that familiar with it, but the Opus 4.6 announcement leaned pretty heavily on the TerminalBench 2.0 score to quantify how much of an improvement it was for coding, so it looks pretty bad for Anthropic that OpenAI beat them on that specific benchmark so soundly.

Looking at the Opus model card I see that they also have by far the highest score for a single model on ARC-AGI-2. I wonder why they didn't advertise that.


No way! Must be a coinkydink, no way OpenAI knew ahead of time that Anthropic was gonna put a focus on that specific useless benchmark as opposed to all the other useless benchmarks!?

I'm firing 10 people now instead of 5!


People pooh-poohing space datacenters will obviously think this is a bad move. But Elon clearly believes space datacenters will work. Given that, and the fact that SpaceX will IPO this year, this acquisition was inevitable.

SpaceX and xAI would not be able to freely collaborate on space datacenters after the IPO because it would be self-dealing. SpaceX likes to be vertically integrated, so they wouldn't want to just be a contractor for OpenAI's or Anthropic's infrastructure. Merging before the IPO is the only way that SpaceX could remain vertically integrated as they build space datacenters.


How can you clearly believe anything Musk says at this point. It's not 'clear' at all what he actually believes and what he just makes up.


Space datacenters just so grok can undress women? this is the dumbest company on the planet


Meanwhile Apple has always been able to read encrypted iMessage messages and everyone decided to ignore that fact. https://james.darpinian.com/blog/apple-imessage-encryption


And it's worse if you live in the UK:

https://support.apple.com/en-us/122234

In fact on this page they still claim iMessage is end-to-end encrypted.


>has always been able to read encrypted iMessage messages

...assuming you have icloud backups enabled, which is... totally expected? What's next, complaining about bitlocker being backdoored because microsoft can read your onedrive files?


If you read the link you would know that contrary to your expectation other apps advertising E2EE such as Google's Messages app don't allow the app maker to read your messages from your backups. And turning off backups doesn't help when everyone else has them enabled. Apple doesn't respect your backup settings on other people's accounts. Again, other apps address this problem in various ways, but not iMessage.


>If you read the link you would know that contrary to your expectation other apps advertising E2EE don't allow the app maker to read your messages.

What does that even mean? Suppose icloud backups doesn't exist, but you could still take screenshots and save them to icloud drive. Is that also "Apple has always been able to read encrypted iMessage messages"? Same goes for "other people having icloud backups enabled". People can also snitch on you, or get their phones seized. I feel like people like you and the article author are just redefining the threat model of E2EE apps just so they can smugly go "well ackshually..."


It means, for example, Google Messages uses E2EE backups. Google cannot read your E2EE messages by default, period. Not from your own backup, not from other peoples' backups. No backup loophole. Most other E2EE messaging apps also do not have a backup loophole like iMessage.

It's not hard to understand why Apple uploading every message to themselves to read by default is different from somebody intentionally taking a screenshot of their own phone.


>Google cannot read your E2EE messages by default, period.

Is icloud backups opt in or opt out? If it's opt in then would your objection still hold?


I'm less concerned with whether it is technically opt in or opt out and more concerned with whether it is commonly enabled in practice.

What would resolve my objection is if Apple either made messages backups E2EE always, as Google did and as Apple does themselves for other data categories like Keychain passwords, or if they excluded E2EE conversations (e.g. from ADP people) from non-E2EE backups, as Telegram does. Anything short of that does not qualify as E2EE regardless of the defaults, and marketing it as E2EE is false advertising.


Absolutly, they intentionally make stuff sound secure and private while keeping full access.


I remember reading this recently. Not saying it’s true but it got my attention

TUESDAY, NOVEMBER 25, 2025 Blind Item #7 The celebrity CEO says his new chat system is so secure that even he can't read the messages. He is lying. He reads them all the time.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: