Ultimately, anecdotes and testimonials of a product like this are irrelevant. But the public discourse hasn't caught up with it. People talk about it like it's a new game console or app, giving their positive or negative testimonials, as if this is the correct way to validate the product.
Only rigorous, continual, third party validation that the system is effective and safe would be relevant. It should be evaluated more like a medical treatment.
This becomes especially relevant when the system gets into an intermediate regime where it can go 10,000 miles without a catastrophic incident. At that level of reliability you can find lots of people who claim "it's driven me around for 2 years without any problem, what are you complaining about?"
A 10,000-mile-per-incident fault rate is actually catastrophic. The average driver covers roughly 13,000 miles a year, so that means a serious, life-threatening incident about once a year per driver. That would be a public safety crisis.
We run into the problem again in the 100,000-mile-per-incident range. This is still not safe, yet it's reliable enough that you can find many people who get lucky and live their whole life without seeing the system cause a catastrophic incident. Even so, it's still 2-5x worse than the average driver.
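A quick back-of-the-envelope sketch of that arithmetic (the 13,000 miles/year figure is only a rough stand-in for average annual driving, not a measured value):

```python
# Rough estimate: how often a typical driver would see a catastrophic
# incident at a given miles-per-incident reliability level.
MILES_PER_YEAR = 13_000  # assumed average annual mileage

for miles_per_incident in (10_000, 100_000, 1_000_000):
    years_between = miles_per_incident / MILES_PER_YEAR
    print(f"{miles_per_incident:>9,} mi/incident -> roughly one every "
          f"{years_between:.1f} years of driving")
```

At 10,000 miles per incident that is about one per driver per year; at 100,000 it is about one every eight years, rare enough to generate plenty of lucky anecdotes but still far too frequent at fleet scale.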
> Only rigorous, continual, third party validation that the system is effective and safe would be relevant. It should be evaluated more like a medical treatment.
100% agreed, and I'll take it one step further - level 3 should be outright banned/illegal.
The reason is that it allows blame shifting, exactly as is happening right now. Drivers mentally expect level 4, while legally the company will position the fault, as much as it can get away with, on the driver, which is effectively level 2.
They're building on a false premise that human equivalent performance using cameras is acceptable.
That's the whole point of AI - when you can think really fast, the world is really slow. You simulate things. Even with lifetimes of data, the cars will still fail in visual scenarios where the error bars on ground truth shoot through the roof.
Elon seems to believe his cars will fail in similar ways to humans because they use cameras. False premise. As Waymo scales, human-level performance just isn't good enough, except for humans.
So, I agree with what you're saying, but that doesn't matter.
The legal standing doesn't care what tech is behind it. 1000 monkeys for all it matters. The point is that level 3 is the most dangerous level because neither the public nor the manufacturer properly operates in this space.
It can be misleading to directly compare disengagements to actual catastrophic incidents.
The human collision numbers only count actual incidents, and even then only ones which have been reported to insurance/authorities. They don't include many minor incidents such as hitting a bollard, or curb rash, or bump-and-run incidents in car parks, or even vehicle-on-vehicle incidents when both parties agree to settle privately. And the numbers certainly exclude ALL unacceptably close near-misses. There are no good numbers for any of these, but I'd be shocked if minor incidents weren't an order of magnitude more common, and near misses another order of magnitude again.
Whereas an FSD disengagement could merely represent the driver's (very reasonable) unwillingness to see if the software will avoid the incident itself. Some disengagements don't represent a safety risk at all, such as when the software is being overly cautious, e.g. at a busy crosswalk. Some disengagements for sure were to avoid a bad situation, though many of these would have been non-catastrophic (such as curbing a wheel) and not a collision which would be included in any human driver collision statistics.
As a robotaxi, yes. That's why Tesla's rollout is relatively small/slow, has safety monitors, etc...
FSD, what most people use, is ADAS, even if it performs a lot of the driving tasks in many situations, and the driver needs to always be monitoring it, no exceptions.
The same applies to any ADAS. If it doesn't work in a situation, the driver has to take over.
If there was actually a rate of one life threatening accident per 10,000 miles with FSD that would be so obvious it would be impossible to hide. So I have to conclude the cars are actually much safer than that.
FSD never drives alone. It's always supervised by another driver who is legally responsible for correcting it. More importantly, we have no independently verified data about the self-driving incidents. Quite the opposite: Tesla has repeatedly obscured data or impeded investigations.
I've made this comparison before but student drivers under instructor supervision (with secondary controls) also rarely crash. Are they the best drivers?
I am not a plane pilot but I flew a plane many times while supervised by the pilot. Never took off, never landed, but also never crashed. Am I better than a real pilot or even in any way a competent one?
I'll grant that the marketing oversells the capabilities of the system, but (as I have commented repeatedly in these FSD threads): anyone using it knows their comfort level within a couple of days. I'm utterly unconvinced that any user is actually confused about the capacity of the system just because it's named "Autopilot" or "Full Self Driving"; anyone who claims to be is not telling the truth.
The fact of the technology is that while imperfect, it is absolutely a marvel, and incredibly useful. I will never drive home again after having a beer after work, or when I'm tired after a long day. I can only attribute the angry skepticism in the comments to willful ignorance or lack of in-the-seat experience. I use it every day, and it drives me driveway to parking with only occasional interventions (per week!).
I'll throw in that my wife hates it (as a passenger or driver), but she has a much lower tolerance for any variance from expected human driving behaviour (eg. lane choices, overly cautious behaviour around cars waiting to enter traffic, etc).
> I can only attribute the angry skepticism in the comments to willful ignorance or lack of in-the-seat experience
Next to "the latest version really fixed it, for realsies this time", the "anyone who doesn't like it is ignorant or has irrational hate for Tesla" must be the second most sung hymn among a small but entirely too vocal group of Tesla owners. Nothing brings down a conversation as quickly as someone like you, trying to justify your purchase by insulting everyone who doesn't agree with your sunk-cost-fallacy-driven opinions.
> Nothing brings down a conversation as quickly as someone like you, trying to justify your purchase by insulting everyone who doesn't agree with your sunk-cost-fallacy-driven opinions.
I don't have any sunk cost in FSD. The car, sure, but it's a fine electric car that I got when there weren't many options (especially at a reasonable price).
I felt I was being generous. My inclination is that animosity to Musk's odious politics clouds the rational judgement of many critics (and they mostly have no first-hand experience with FSD for any length of time).
Above I was talking more generally about full autonomy. I agree the combined human + FSD system can be at least as safe as a human driver, perhaps more, if you have a good driver. As a frequent user of FSD, I find its unreliability can be a feature: it constantly reminds me it can't be fully trusted, so I shadow drive and pay full attention. So it's like having a second pair of eyes on the road.
I worry that when it gets to 10,000 mile per incident reliability that it's going to be hard to remind myself I need to pay attention. At which point it becomes a de facto unsupervised system and its reliability falls to that of the autonomous system, rather than the reliability of human + autonomy, an enormous gap.
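A toy way to see how big that gap could be (every number below is a made-up placeholder, not a measurement of FSD or of human drivers):

```python
# Illustrative only: a catastrophic incident occurs when the system fails
# AND the supervising human misses the failure. All rates are hypothetical.
system_faults_per_mile = 1 / 10_000   # assumed autonomy-only fault rate

def effective_miles_per_incident(human_catch_rate: float) -> float:
    # Combined reliability of the human + autonomy system.
    return 1 / (system_faults_per_mile * (1 - human_catch_rate))

print(effective_miles_per_incident(0.99))  # attentive supervisor  -> 1,000,000 mi
print(effective_miles_per_incident(0.20))  # complacent supervisor ->    12,500 mi
```

Under those assumed numbers, the very same system looks dramatically safer than an unassisted human when supervised attentively, and far worse once the supervisor has checked out.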
Of course, I could be wrong. Which is why we need some trusted third party validation of these ideas.
Yeah, I agree with that. There's a potentially dangerous attention gap that could just play into the fundamental weakness of the human brain's ability to pay attention for long periods of time with no interaction. Unfortunately I don't see any possible way to validate this without letting the tech loose. You can't get good data on this without actual driving in real road conditions.
At a certain point you do need to test in real road conditions. However, there is absolutely no need to jump straight from testing in lab conditions to “testing” on unmonitored, untrained end users.
You use professional trained operators with knowledge of the system design and operation, following a designed safety plan to minimize prototype risks. At no point should your test plan increase danger to members of the public. Only when you fix problems faster than that test procedure can find them do you expand scope.
If you follow the standard automotive pattern, you then expand scope to your untrained but informed employees using monitored systems. Then to untrained, informed employees using production systems. Then to informed early-release customers. Then, once you stop being able to find problems regularly at all of those levels, you do a careful monitored release to the general public, verifying the safety properties are maintained. Then you finally have a fully released “safe” product.
It's difficult to do because of how well matched they are to the hardware we have. They were partially designed to solve the mismatch between RNNs and GPUs, and they are way too good at it. If you come up with something truly new, it's quite likely you have to influence hardware makers to help scale your idea. That makes any new idea fundamentally coupled to hardware, and that's the lesson we should be taking from this. Work on the idea as a simultaneous synthesis of hardware and software. But, it also means that fundamental change is measured in decade scales.
I get the impulse to do something new, to be radically different and stand out, especially when everyone is obsessing over it, but we are going to be stuck with transformers for a while.
This is backwards. Algorithms that can be parallelized are inherently superior, independent of the hardware. GPUs were built to take advantage of that superiority and handle all kinds of parallel algorithms well - graphics, scientific simulation, signal processing, some financial calculations, and on and on.
There’s a reason so much engineering effort has gone into speculative execution, pipelining, multicore design etc - parallelism is universally good. Even when “computers” were human calculators, work was divided into independent chunks that could be done simultaneously. The efficiency comes from the math itself, not from the hardware it happens to run on.
RNNs are not parallelizable by nature. Each step depends on the output of the previous one. Transformers removed that sequential bottleneck.
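A minimal sketch of the difference, with arbitrary toy shapes and weights (this illustrates the dependency structure, not a real model):

```python
import numpy as np

T, d = 8, 4                     # sequence length, hidden size (arbitrary)
x = np.random.randn(T, d)       # toy input sequence
W = np.random.randn(d, d) * 0.1

# RNN-style recurrence: step t cannot start until step t-1 has finished.
h = np.zeros(d)
hidden_states = []
for t in range(T):
    h = np.tanh(x[t] + W @ h)   # depends on the previous hidden state
    hidden_states.append(h)

# Attention-style mixing: every position is computed from the whole input
# at once, so all T rows can be processed in parallel.
scores = x @ x.T / np.sqrt(d)                                  # (T, T) pairwise scores
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
out = weights @ x                                              # dense matmuls, no time loop
```

The recurrence has to walk the time dimension one step at a time, while the attention-style mixing is a couple of dense matrix products over the whole sequence, which is exactly the workload GPUs are built for.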
There are large, large gaps of parallel stuff that GPUs can't do fast. Anything sparse (or even just shuffled) is one example. There are lots of architectures that are theoretically superior but aren't popular due to not being GPU friendly.
That’s not a flaw in parallelism. The mathematical reality remains that independent operations scale better than sequential ones. Even if we were stuck with current CPU designs, transformers would have won out over RNNs.
Unless you are pushing back on my comment "all kinds" - if so, I meant "all kinds" in the way someone might say "there are all kinds of animals in the forest", it just means "lots of types".
I was pushing back against "all kinds". The reason is that I've been seeing a number of inherently parallel architectures, but existing GPUs don't like some aspect of them (usually the memory access pattern).
He’s playing the game. You have to say AGI is your goal to get attention. It’s just like the YouTube thumbnail game. You can hate it, but you still have to play if you want people to pay attention.
Sometimes we get confused by the difference between technological and scientific progress. When science makes progress it unlocks new S-curves that progress at an incredible pace until you get into the diminishing returns region. People complain of slowing progress but it was always slow, you just didn’t notice that nothing new was happening during the exponential take off of the S-curve, just furious optimization.
And at the same time I have noticed that people don’t understand the difference between an S-curve and an exponential function. They can look almost identical at certain intervals.
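A quick way to see it, with an arbitrary growth rate and ceiling (both numbers are just for illustration):

```python
import math

K, r = 100.0, 1.0   # arbitrary carrying capacity and growth rate

def logistic(t):     # S-curve: starts near 1, saturates at K
    return K / (1 + (K - 1) * math.exp(-r * t))

def exponential(t):  # same initial value and growth rate, no ceiling
    return math.exp(r * t)

for t in range(8):
    print(t, round(exponential(t), 1), round(logistic(t), 1))
```

The two columns track each other closely for small t, then diverge sharply as the logistic flattens out toward its ceiling.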
As far back as 2017 I copped a lot of flak for suggesting that the coming automation revolution will be great at copying office workers and artists but won't be anything on the order of replacing the whole human race. A lot of the time Moore's law got thrown back in my face. But that's how this works: we unlock something new, we exploit it as far as possible, the shine wears off, and we deal with the aftermath.
That's putting the cart before the horse. Thermodynamics came after the steam engine was made practical. Flight came before aerodynamics. Metallurgy before materials science. Radio before electromagnetic theory took hold. Even LLMs are the result of a lot of tinkering rather than scientific insight. It’s the successful tinkering that creates the puzzle science later formalises.
The "traffic" did not incentivize "publishers" (as they call themselves) to produce any valuable content.. instead, it incentivized them to produce SEO garbage that's not helpful and often deceptive.
I think it's worth being thoughtful about the whole range of folks doing a whole range of stuff out on the Internet. I like this piece about the various incentives that exist:
https://vbuckenham.com/blog/how-to-find-things-online/
I don't know how your comment addresses my comment and the article you mentioned is long. But since you replied, you know what you meant, so could you share your particular argument?
I think connecting this to what V expressed is the value I meant to add here, I think they expressed what they meant with the detail required to express it, and I'm not interested in engaging with the topic on a level that works from summaries alone. Suffice to say that you responded to a comment speaking of how "people are properly incentivized to produce content" by saying '"traffic" did not incentivize "publishers" (as they call themselves) to produce any valuable content' as though traffic didn't have to do with the people out there publishing valuable stuff on the internet today, or as though such people don't exist. V's thing considers the interaction among these people, their incentives, and LLMs.
> as though traffic didn't have to do with the people out there publishing valuable stuff on the internet today, or as though such people don't exist
Even if there are people today who publish valuable stuff on the internet only because of (or thanks to) the expected traffic (either their own ads or selling ad space), I claim that without that traffic there would still be others publishing valuable stuff, and that the traffic incentivized garbage more than the valuable stuff.
Finally, if that's not the case, then in order for the LLM to be useful, it needs valuable training data. So this problem solves itself: a competitive LLM company will need to pay for the data directly, and for me, that model incentivizes valuable stuff more than garbage.
If it’s easy enough to produce creative “content” thanks to AI, and people have enough money and free time, they’ll create without being paid (for themselves, social status, social influence, scientific advancement, etc.)
In the meantime, I think people should focus on attribution, and algorithms to find related work (which may suitably substitute for the former). This will allow us to fund creators and publishers for AI output, maybe by forcing the AI companies, or naturally through patronage (see AI output you like, find who owns the training data that contributed to it, donate to them). Moreover, it will help people discover more interesting things and creators.
I hope they can figure out why these give some people (like myself) headaches and eye strain. I really want to use this, but I can't stand the pain for more than a few minutes.
For normal VR/AR, definitely, since you want to have objects moving in the Z direction. For this usecase it should be enough to show the "flat" virtual screen at the focal distance.
This article captures a lot of the problem. It’s often frustrating how it tries to work around really simple issues with complex workarounds that don’t work at all. I tell it the secret simple thing it’s missing and it gets it. It always makes me think, god help the vibe coders that can’t read code. I actually feel bad for them.
> I tell it the secret simple thing it’s missing and it gets it.
Anthropomorphizing LLMs is not helpful. It doesn't get anything; you just gave it new tokens, ones which are more closely correlated with the correct answer. It also generates responses similar to what a human would say in the same situation.
Note I first wrote "it also mimics what a human would say", then I realized I was anthropomorphizing a statistical algorithm and had to correct myself. It's hard sometimes, but language shapes how we think (which is ironically why LLMs are a thing at all), and using terms which better describe how it really works is important.
Given that LLMs are trained on humans, who don't respond well to being dehumanised, I expect anthropomorphising them to be better than the opposite of that.
Aside from just getting more useful responses back, I think it's just bad for your brain to treat something that acts like a person with disrespect. Becomes "it's just a chatbot", "It's just a dog", "It's just a low level customer support worker".
While I also agree with you on that, there are also prompts that make them not act like a person at all, and prompts can be write-once-use-many which lessens the impact of that.
This is why I tend to lead with the "quality of response" argument rather than the "user's own mind" argument.
I am not talking about getting it to generate useful output; treating it extra politely or threatening it with fines seems to give better results sometimes, so why not. I am talking about the phrase "gets it". It does not get anything.
It's a feature of language to describe things in those terms even if they aren't accurate.
>using terms which better describe how it really works is important
Sometimes, especially if you're doing something where that matters, but abstracting those details away is also useful when trying to communicate clearly in other contexts.
Working as an instructor for a project course for first-year university students, I have run into this a couple of times. The code required for the project is pretty simple, but there are a couple of subtle details that can go wrong. I had one group today with bit shifts and other "advanced" operators everywhere, but the code was not working as expected. I asked them to just `Serial.println()` so they could check what was going on, and they were stumped. LLMs are already great tools, but if you don't know basic troubleshooting/debugging you're in for a bad time when the brick wall arrives.
On the other hand, it shows how much coding is just repetition. You don't need to be a good coder to perform serviceable work, but you won't create anything new and amazing either, if you don't learn to think and reason - but that might for some purposes be fine. (Worrying for the ability of the general population however)
You could ask whether these students would have gotten anything done without generated code. Probably; it's just a momentarily easier alternative to actual understanding. They did, however, realise the problem and decided by themselves to write their own code in a simpler, more repetitive and "stupid" style, but one that they could reason about. So hopefully a good lesson and all well in the end!
Sounds like you found a good problem for the students. Having the experience of failing to get the right answer out of the tool and then succeeding on your wits creates an opportunity to learn that these tools benefit from disciplined usage.
There's a pretty big gap between "make it work" and "make it good".
I've found with LLMs I can usually convince them to get me at least something that mostly works, but each step compounds with excessive amounts of extra code, extraneous comments ("This loop goes through each..."), and redundant functions.
In the short term it feels good to achieve something 'quickly', but there's a lot of debt associated with running a random number generator on your codebase.
In my opinion, the difference between good code and code that simply works (sometimes barely) is that good code will still work (or error out gracefully) when the state and the inputs are not as expected.
Good programs are written by people who anticipate what might go wrong. If the document says 'don't do X', they know a tester is likely to try X, because a user will eventually do it.
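A small hypothetical illustration of the distinction (the function names and details are made up; both versions "work" on the happy path):

```python
# Barely works: fine until someone passes an empty list or a string.
def average_fragile(values):
    return sum(values) / len(values)

# Anticipates the inputs the spec says "shouldn't happen" and fails clearly.
def average_robust(values):
    if not values:
        raise ValueError("average requires at least one value")
    cleaned = [float(v) for v in values]  # fail early on non-numeric input
    return sum(cleaned) / len(cleaned)
```

The second version costs a few extra lines up front, but it turns a confusing crash in production into an immediate, explainable error.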