I found it funny because in the opposite direction, people accused Tesla of naming “autopilot” misleadingly, because it gave them the impression of fully unattended self-driving.
In aviation, autopilot features were until recently (and still for GA pilots) essentially just cruise control: maintain this speed and heading, maintain this climb rate and heading, maintain this bank angle, etc.
I just did a quick once-over on the PR and am pretty shocked by how "simple" it is. For switching the background graphics library, I would have expected this to be some pretty delicate surgery. But the "meat" is swapping out the old abstraction layer for the new one (500–600 LOC) then `s/blade/wgpu/g`. There's a little more to it than that, but not much.
I'm already a Zed user, but to me that's an extremely good indicator of the engineering quality of the project.
I’m not really sure what point you’re trying to make with “as long as you don’t mind dying early and painfully from easily preventable diseases technically you can live in utopia”. Would you mind clarifying your position here?
When chess engines were first developed, they were strictly worse than the best humans. After many years of development, they became helpful to even the best humans even though they were still beatable (1985–1997). Eventually they caught up and surpassed humans but the combination of human and computer was better than either alone (~1997–2007). Since then, humans have been more or less obsoleted in the game of chess.
Five years ago we were at Stage 1 with LLMs with regard to knowledge work. A few years later we hit Stage 2. We are currently somewhere between Stage 2 and Stage 3 for an extremely high percentage of knowledge work. Stage 4 will come, and I would wager it's sooner rather than later.
There's a major difference between chess and scientific research: setting the objectives is itself part of the work.
In chess, there's a clear goal: beat the game according to this set of unambiguous rules.
In science, the goals are much more diffuse, and setting those in the first place is what makes a scientist more or less successful, not so much technical ability. It's a very hierarchical field where permanent researchers direct staff (postdocs, research scientists/engineers), who in turn direct grad students. And it's at the bottom of the pyramid that technical ability is most relevant/rewarded.
Research is very much a social game, and I think replacing it with something run by LLMs (or other automatic process) is much more than a technical challenge.
The evolution was also interesting: first the engines were amazing tactically but pretty bad strategically, so humans could guide them. With the new NN-based engines it was reversed: they were amazing strategically but sucked tactically (the first versions of Leela Chess Zero). Today they have closed the gap and are amazing at both strategy and tactics, and there is nothing humans can contribute anymore - all that is left is to watch and learn.
With a chess engine, you could ask any practitioner in the 90's what it would take to achieve "Stage 4" and they could estimate it quite accurately as a function of FLOPs and memory bandwidth. It's worth keeping in mind just how little we understand about LLM capability scaling. Ask 10 different AI researchers when we will get to Stage 4 for something like programming and you'll get wild guesses or an honest "we don't know".
That is not what happened with chess engines. We didn’t just throw better hardware at it, we found new algorithms, improved the accuracy and performance of our position evaluation functions, discovered more efficient data structures, etc.
People have been downplaying LLMs since the first AI-generated buzzword garbage scientific paper made its way past peer review and into publication. And yet they keep getting better and better to the point where people are quite literally building projects with shockingly little human supervision.
Chess grandmasters are living proof that it’s possible to reach grandmaster level in chess on 20W of compute. We’ve got orders of magnitude of optimizations to discover in LLMs and/or future architectures, both software and hardware, and with the amount of progress we’ve seen basically every month, those ten people will answer ‘we don’t know, but it won’t be too long’. Of course they may be wrong, but the trend line is clear; Moore’s law faced similar issues and they were successively overcome for half a century.
> With a chess engine, you could ask any practitioner in the 90's what it would take to achieve "Stage 4" and they could estimate it quite accurately as a function of FLOPs and memory bandwidth.
And the same practitioners said right after Deep Blue that Go was NEVER gonna happen. Too large. The search space is just not computable. We'll never do it. And yeeeet...
We are at level 2.5 for software development, IMO. There is a clear skill gap between experienced humans and LLMs when it comes to writing maintainable, robust, concise and performant code and balancing those concerns.
The LLMs are very fast but the code they generate is low quality. Their comprehension of the code is usually good but sometimes they have a weightfart and miss some obvious detail and need to be put on the right path again. This makes them good for non-experienced humans who want to write code and for experienced humans who want to save time on easy tasks.
> The LLMs are very fast but the code they generate is low quality.
I think the latest generation of LLMs with Claude Code does not produce low-quality code. It's better than the code pretty much every dev on our team can write, outside of very narrow edge cases.
Does it matter? Microsoft won by default with Teams because it actually turns out no one cares about chat or even has a choice in it: employees use whatever the company picks.
If you're going to say "other than the US" then you've got to say at a minimum "other than the US and China", but really "other than the US and China and Japan and Korea and Taiwan and Thailand and Russia and most of Central Asia".
Only mentioning the US is wildly americentric even by HN standards.
And 20 years ago people were making the exact same kinds of comments and everyone had the same reaction: yeah, MySQL has been putting numbers up like that for a decade.
What’s so hard about adding a feature that effectively makes a single-user device multi-user? Which needs the ability to have plausible deniability for the existence of those other users? Which means that significant amounts of otherwise usable space need to be inaccessibly set aside for those other users on every device—to retain plausible deniability—despite an insignificant fraction of customers using such a feature?
> despite an insignificant fraction of customers using such a feature?
Isn't that the exact same argument against Lockdown mode? The point isn't that the number of users is small it's that it can significantly help that small set of users, something that Apple clearly does care about.
Lockdown mode costs ~nothing for devices that don't have it enabled. GP is pointing out that the straightforward way to implement this feature would not have that same property.
The "fake" user/profile should work like a duress PIN with the addition of deniability. So as soon as you log in to the second profile, all the space becomes free. Just by logging in you would delete the encryption key of the other profile. The metadata showing what is free or not was encrypted in the locked profile. Now gone.
Sorry I explained it poorly and emphasized the wrong thing.
The way it would work is not active destruction of data, just a different view of the data that doesn’t include any metadata that is encrypted in the second profile.
Data would get overwritten only if you actually start using the fallback profile and populating the "free" space because to that profile all the data blocks are simply unreserved and look like random data.
The profiles basically overlap on the device. If you would try to use them concurrently that would be catastrophic but that is intended because you know not to use the fallback profile, but that information is only in your head and doesn’t get left on the device to be discovered by forensic analysis.
Your main profile knows to avoid overwriting the fallback profile’s data but not the other way around.
But also the point is you can actually log in to the duress profile and use it normally, and it wouldn’t look like destruction of evidence, which is what GrapheneOS’s current duress PIN does.
The main point is that logging in to the fake profile does not do anything different from logging in to the main profile. Suppose you image the whole thing and somehow completely bypass the secure enclave (but let's assume you can't actually brute-force the PIN, because that's not feasible), then enter the duress PIN in a controlled environment and watch what writes/reads it does and where: even then you would not be able to tell you are in the fake profile. Nothing gets deleted eagerly; just the act of logging in is destructive to overlapping profiles. The only thing different about the main profile is that it knows which data belongs to the fallback profile and will not allocate anything in those blocks. However, it's possible to set up the device without a fallback profile, so you can't tell whether you are in the fallback profile or just on a device without one set up.
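If I understood the scheme right, it can be sketched as a toy block allocator. Everything here (class names, the fixed block count, the in-memory block maps) is made up for illustration; a real implementation would store each profile's block map encrypted under that profile's key, so the other profile's data is indistinguishable from random free space:

```python
import os

BLOCK_COUNT = 16  # toy disk size


class Profile:
    """One profile's view of shared block storage.

    `owned`  - blocks holding this profile's data (kept in its own
               encrypted metadata).
    `avoid`  - blocks this profile must never reuse. Only the main
               profile carries this map; the fallback profile has none,
               so it looks identical to a profile on a device where no
               fallback was ever set up.
    """

    def __init__(self, name, avoid=None):
        self.name = name
        self.owned = set()
        self.avoid = avoid if avoid is not None else set()

    def free_blocks(self):
        # Anything not owned and not explicitly avoided looks free:
        # another profile's ciphertext is just random bytes to us.
        return [b for b in range(BLOCK_COUNT)
                if b not in self.owned and b not in self.avoid]

    def write(self, disk, data):
        block = self.free_blocks()[0]  # naive first-fit allocation
        disk[block] = data
        self.owned.add(block)
        return block


# The disk starts as (and always resembles) random data.
disk = [os.urandom(4) for _ in range(BLOCK_COUNT)]

fallback = Profile("fallback")
fallback.owned = {0, 1}  # pretend fallback data lives in blocks 0-1

# The main profile knows (via its own encrypted metadata) to skip them.
main = Profile("main", avoid=fallback.owned)
b_main = main.write(disk, b"mail")    # lands in block 2, not 0 or 1

# The fallback profile has no `avoid` map, so actually using it will
# overwrite the main profile's blocks -- the "destructive login".
b_fb = fallback.write(disk, b"junk")  # also picks block 2: main data gone
```

The asymmetry is the whole trick: `main` protects `fallback`'s blocks, but not the other way around, and the knowledge of which situation you are in lives only in the user's head.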
Hopefully I explained it clearly. I haven't seen this idea anywhere else so I would be curious if someone smarter actually tried something like that already.
What you say makes sense, just like the TrueCrypt/VeraCrypt hidden-volume theory. I can't find the head post to my "that's why you image" post, but what concerns me is that differing profiles may have different network fingerprints. You may need to keep Signal and BitLocker on both. Every time my desktop boots, a cloud provider is contacted -- it's not very sanitary?
It's a hard problem to properly set up even on the user end, let alone the developer/engineer side, but thank you.
Maybe one PIN could cause the device to crash. Devices crash all the time. Maybe the storage is corrupted. It might have even been damaged when it was taken.
This could even be a developer feature accidentally left enabled.
It doesn't seem fundamentally different from a PC having multiple logins that are accessed from different passwords. Hasn't this been a solved problem for decades?
You can have a multiuser system but that doesn't solve this particular issue. If they log in to what you claim to be your primary account and see browser history that shows you went to msn.com 3 months ago, they aren't going to believe it's the primary account.
My browser history is cleared every time I close it.
It's actually annoying because every site wants to "remember" the browser information, and so I end up with hundreds of browsers "logged in". Or maybe my account was hacked and that's why there's hundreds of browsers logged in.
Doesn't having standard multi-user functionality automatically create the plausible deniability? If they tried so hard to create an artificial plausible deniability that would be more suspicious than normal functionality that just gets used sometimes.
What needs to be plausibly denied is the existence of a second user account, because you're not going to be able to plausibly deny that the account belongs to you when it resides on the phone found in your pocket.
Never ever use your personal phone for work things, and vice versa. It's bad for you and bad for the company you work for in dozens of ways.
Even when I owned my own company, I had separate phones. There's just too much legal liability and chances for things to go wrong when you do that. I'm surprised any company with more than five employees would even allow it.
What's the risk? On Android, the company can remotely nuke the work profile. The work profile has its own file system and apps. You can turn it off when you don't want work notifications.
iPhone and macOS are basically the same product technically. The reason iPhone is a single user product is UX decisions and business/product philosophy, not technical reasons.
While plausible deniability may be hard to develop, it’s not some particularly arcane thing. The primary reason against it is the political balancing act Apple has to perform (remember San Bernardino and the trouble the US government tried to create for Apple?). Secondary reasons are cost to develop vs. the addressable market, but they did introduce Lockdown mode, so it’s not unprecedented for them to improve security for those particularly sensitive to such issues.
> iPhone and macOS are basically the same product technically
This seems hard to justify. They share a lot of code yes, but many many things are different (meaningfully so, from the perspective of both app developers and users)
You think iPhones aren’t multi-user for technical reasons? You sure it’s not to sell more phones and iPads? Should we ask Tim “buy your mom an iPhone” Cook?