Hacker News | svantana's comments

That's probably because "Gemini 3.5 Pro" doesn't exist

The silly verbiage can be excused but not the graphs with completely unlabeled data points, IMO.


Yep that's what I mean - looks like AI slop to me.


That's mentioned in the article, but is the lock-in really that big? In some cases, it's as easy as changing the backend of your high-level ML library.


That is like how every ORM promises you can just swap out the storage layer.

In practice it doesn't quite work out that way.


That's what it is on paper. But in practice you trade one set of hardware idiosyncrasies for another and unless you have the right people to deal with that, it's a hassle.


On top of that, when you get locked into Google Cloud, you’re effectively at the mercy of their engineers to optimize and troubleshoot. Do you think Google will help their potential competitors before they help themselves? Highly unlikely considering their actions in the past decade plus.


Given my Fitbit's inability to play nice with my Pixel phone, I have zero faith in Google engineers.

What else would one expect when their core value is hiring generalists over specialists* and their lousy retention record?

*Pay no attention to the specialists they acquihire and pay top dollar... And even they don't stick around.


I think you can only run on Google Cloud, not AWS, bare metal, Azure, etc.


According to that site, there were more tech layoffs in 2022 than in 2024 or 2025. Doesn't that speak against the "AI is taking tech jobs" hypothesis?


Massive, embarrassingly shortsighted overhiring in 2020 and 2021 seems like the more likely culprit.


I agree, I think AI taking jobs is all smoke and mirrors by companies trying to gas up their stock prices


Doesn't work in this case because the 'talk' (github PR comments) is also computer generated. But in person (i.e. at work) it's a good strategy


SWEBench-Verified is probably benchmaxxed at this stage. Claude isn't even the top performer; that honor goes to Doubao [1].

Also, the confidence interval for such a small dataset is about 3 percentage points, so these differences could just be down to chance.

[1] https://www.swebench.com/


Claude 4.5 gets 82% on their own highly customized scaffolding (parallel compute with a scoring function). That beats Doubao.


Grok got to hold the top spot of LMArena-text for all of ~24 hours, good for them [1]. With stylecontrol enabled, that is. Without stylecontrol, Gemini held the fort.

[1] https://lmarena.ai/leaderboard/text


Is it just me or is that link broken because of the cloudflare outage?

Edit: nvm it looks to be up for me again


Grok is heavily censored though


Is it censored... or just biased towards edge-lord MechaHitler nonsense whenever Musk feels like tinkering with the system prompt?


One percentage point is not significant, in either the colloquial or the scientific sense[1].

[1] Binomial formula gives a 95% confidence interval of about ±3.7 percentage points, using p=0.77, N=500
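
For anyone who wants to check the footnote's arithmetic, here is a minimal sketch. It assumes "binomial formula" means the normal-approximation (Wald) interval with z = 1.96 for 95% confidence; the function name `binomial_margin` is made up for illustration.

```python
import math


def binomial_margin(p, n, z=1.96):
    """Half-width of the normal-approximation (Wald) interval for a proportion.

    p: observed success rate, n: number of samples,
    z: normal quantile (1.96 is the assumed value for 95% two-sided).
    """
    return z * math.sqrt(p * (1.0 - p) / n)


# Values from the footnote: p=0.77, N=500
margin = binomial_margin(0.77, 500)
print(f"+/- {margin * 100:.1f} percentage points")  # prints "+/- 3.7 percentage points"
```

With a margin that wide, a gap of one percentage point between two models on a 500-problem benchmark sits well inside the noise.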


It's analogous to how politicians nowadays are constantly saying "let me be clear", it drives me nuts.


Another annoyance: "In my honest opinion...". Does that mean that at other times you are sharing dishonest opinions? Why would you need to declare that this time you're honest?


This has been a pet peeve of mine for years. I call people out when they say this for the abuse of language and for being a time vampire.


Indeed, the galaxy brain move here would be to make a game about game design that itself follows its own principles.

