Rivian software update bricks infotainment system, fix not obvious

latchkey · on Nov 14, 2023

I built a whole remote software update mechanism for a control binary that ran on 25k+ servers across multiple data centers.

Rest assured that after the first time I messed it up (which required ssh into each box individually), I wrote a lot of unit and integration tests to make sure that it never failed to deploy again. One of the integration tests ensured that the app started up and could always go through the internal auto update process. This ran in CI and would fail the build if it didn't pass.

While I fully understand that this is hard to get right 100% of the time, a mess up of this level by a car manufacturer is pretty amazing to me.

foobiekr · on Nov 15, 2023

Rivian is an embedded use case, though, which is not at all like a fleet of servers.

Having worked for companies that produce network devices - including devices that are unreachable for example for 6 months of the year - and on software installation and upgrade, I am baffled how this bricking is possible. For one thing, you generally use some kind of confirmed boot mechanism - you upgrade a standby partition, set an ephemeral boot value that causes device to boot the alternate image, and reboot - only when the image is declared "up" does that get persisted (and then the alternate is upgraded, in order to prevent rollback in the event of a media error). You use watchdogs that are tied to actual forward progress (and not just some demon that the kernel schedules and bangs on the watchdog even if the rest of the system is hung) and if they fail, the WD reboots you. (This is one of the reasons that event driven programming is somewhat preferred - actually processing events from a single dispatch thread makes it easier to reason about the system.)

On top of that, you make sure that the core system is an immutable filesystem so that you can validate the _offline_ alternate image before rebooting (write-and-read-back-uncached) and periodically scrub the alternate image (same).

Like.. this is all embedded 101, stuff people have been widely doing since the mid 1990s and I think I can find examples going back to the 70s. Sometimes you get a little more sophisticated (allow sub-packages or overlays and use a manifest to check the ensemble instead of just a single image), but it's very standard.

dcow · on Nov 15, 2023

Assuming Rivian does know embedded 101, my guess is that the infotainment system is running Android and the watchdog reported all green once the system services all came online and that it doesn't actually check whether the application layer is really working because, as you know, that would require the watchdog to run a full regression suite before giving the okay, which isn’t practical. Since the update swapped the system to an internal dev cert, they cant push an immediate update to change the boot args because the management plane daemon won’t connect to the C&C server, or it can but the blob they push wouldn’t pass signature validation, or the TEE won’t unlock the device keys because the roots changed. Whatever the case, someone has to go blow a fuse and re-flash the thing, or at least rewrite the boot args via serial. Just a guess.

If it is the most likely “management plane TLS certs” issue, I bet the watchdog won’t confirm the new boot args until the command dispatch daemon gets a pong from the C&C server moving forward (:

ikiris · on Nov 15, 2023

That sounds out of scope for the MVP. We can worry about redundancies later after we ship.

roland35 · on Nov 15, 2023

Hey now, preventing SEVs doesn't lead to impact. If we all collectively let this become a raging dumpster fire we can all heroically fix it and greatly exceed expectations for the half.

bluGill · on Nov 15, 2023

You too have noticed most great employee rewards go to someone who if they had done their job well wouldn't have been notived

ikiris · on Nov 15, 2023

This guy FANGs

KingMachiavelli · on Nov 15, 2023

Did you just use standard Yocto or similar tools to build such images? Are there standard daemons for managing hardware watchdogs (besides systemd since that's too simple as you say)? I think there's a lot of niche knowledge in the embedded space and many programmers are used to cloud systems and at most target. The most embedded experience most programmers have is likely iOS/Android development where all of the actual embedded concerns are handled for you. Even Google (soft)bricked a bunch of phones with the latest Android 14 update [1].

IMO there's not a lot of regular OSS for building embedded systems that comes with A/B partitioning, watchdogs, secure and verified boot - it's all custom at every org and tailored for individual products.

[1] https://arstechnica.com/gadgets/2023/11/android-14-patches-r...

MarkSweep · on Nov 15, 2023

I quit my job before I got to deploy this, but RAUC looked like it would handle this for Yocto:

https://github.com/rauc/rauc https://github.com/rauc/meta-rauc

For microcontrollers, Memfault had a good article:

https://interrupt.memfault.com/blog/device-firmware-update-c...

neuralRiot · on Nov 15, 2023

> including devices that are unreachable for example for 6 months of the year

That made me think, imagine NASA bricking up the voyager with a SW update.

aaronbeekay · on Nov 14, 2023

As somebody currently working at an automaker on software systems, the amazing thing to me is that a mess up of this level doesn’t happen weekly. It’s rough out here.

jacquesm · on Nov 15, 2023

Thank you. At least you're honest about it, the other day someone was trying real hard to convince me that software developers at automakers are made of magic fairy dust.

kalleboo · on Nov 15, 2023

I'm amazed anyone would argue that after the Toyota firmware analysis.

jacquesm · on Nov 15, 2023

Check out the thread a couple of days ago:

https://news.ycombinator.com/item?id=38244149

bozhark · on Nov 14, 2023

What's the priority then, telemetry data? Why is it rough out there?

jacquesm · on Nov 15, 2023

Relatively crappy pay, complex toolchains, long build times, layer upon layer of (really bad) legacy code, badly specified (if they're specified) protocols between subsystems, subsystems that are completely opaque (no source code provided), homegrown OS's or older RTOS's, subset-of-C to keep it safe(r), tricky debugging environments and if you're really unlucky anemic hardware.

I hope I didn't miss anything but I wouldn't be surprised if I did.

ahartmetz · on Nov 15, 2023

Yeah, I think you missed something. The "software architects, heavy enterprisey tooling, and minions" approach to development where some of the architects could be good developers, but they don't develop, and the minions are often not that good and also not given any autonomy, so they are in a state of learned helplessness and just do what they're told without much thinking or initiative. It results in over-abstracted, over-complicated, slow, unreliable, and sometimes just stupid code.

jacquesm · on Nov 15, 2023

Fair enough, yes. That's hopefully not all of them though but I don't doubt that many of the older companies work like that.

ahartmetz · on Nov 15, 2023

Most car companies are, in fact, quite old. Their big suppliers (who are often even worse at software, if you can believe that) are also quite old.

reactordev · on Nov 14, 2023

Probably due to fires, failures, and fatigue.

firtoz · on Nov 15, 2023

Games have AAA, autos have FFF

foobiekr · on Nov 14, 2023

do you guys not have confirmed boot and swizzling to fallback images?

AlotOfReading · on Nov 15, 2023

Automotive varies widely between "basically modern Linux systems with proper updates" and the most janky, home-grown update systems imaginable, sometimes even within the same components and teams.

foobiekr · on Nov 15, 2023

Yah, I know from friends at ford and vw that there's still vxworks and qnx, but even there, good grief, a-b with confirmed boot is about as basic as you can get.

I confess I've seen incredible sloppiness about when a confirmation is done (too early, including in the initial init stages which is way too soon) and watchdogs (spawn off a process that has a while loop stroking the wd - just absolutely pointless).

cozzyd · on Nov 15, 2023

I've seen kicking and petting the watchdog, but this is my first time seeing stroking

LoganDark · on Nov 15, 2023

Sometimes the watchdog needs to have fun too, you know.

WWLink · on Nov 15, 2023

I've heard all of the above, often "stroking". I never used those because I like systems where you have a random challenge code to respond to. Then the software has to not be acting as wonky to react correctly.

ahartmetz · on Nov 15, 2023

From experience, QNX is actually very nice. I wouldn't say "still using QNX" like it's some crap that nobody would want.

mips_r4300i · on Nov 15, 2023

Indeed, a good RTOS from 10-20 years ago works just as good now as it did back then. The only things that change are dev environments and the driver support.

cjbprime · on Nov 15, 2023

> This ran in CI and would fail the build if it didn't pass.

I don't mean to be pedantic, but since we're talking about what should happen instead, this is insufficient. It works until the day you realize you made some kind of manual change to your CI infra, or that CI has some non-standard configuration that makes it work for you but not some significant fraction of the fleet.

People should do what you described in CI, but as well as that, you need phased rollout, where e.g. the build can only be rolled out to the next percentage point of randomly selected users in a specific segment (e.g. each hardware revision and country as independent segments) after meeting a ratio of successful check-ins, in the field, from the new build by production customers in that segment. That's the actual metric for proceeding with the rollout: actual customers are successfully checking in from the new version of the software.

Except, that's actually not sufficient either. What if the new build is good, but it contains an update to the updater which bricks the updater? Now you're getting successful check-ins from the new version in the field, but none of those customers will ever successfully auto-update again. So, test the new updater's ability to go forwards successfully, too.

quailfarmer · on Nov 15, 2023

A good way to handle the who-updates-the-updater issue is to use a triple partition updater. A updates B, and then B updates C, then C updates A. If anything about the new version prevents it from properly updating its neighbor, that neighbor won't be able to close the loop, and you'll fall back to A. This simplifies the FSBL, because it just boots the three partitions in a loop, no failure detection required. You don't need to triplicate the full application either, just the minimum system needed to perform an update, and then have the "application" in it's own partition to be called by the updater.

latchkey · on Nov 15, 2023

> It works until the day you realize you made some kind of manual change to your CI infra, or that CI has some non-standard configuration that makes it work for you but not some significant fraction of the fleet.

Nah, my CI process was solid. This was proven in the field over the course of years.

> I don't mean to be pedantic... you need phased rollout

You don't need to be pedantic, but better to ask the question rather than assume that was all that I did. =) You have to realize that what I built, worked flawlessly. It wasn't easy either, took a lot of trial and error.

I did have a CIDR based rollout. I could specify down to the individual box that it would run a specific version. Or I could write "latest" to always keep certain boxes running on the latest build. This was another part of my testing, but ended up not being fully necessary because I had enough automated testing in CI that "latest" always worked.

> but it contains an update to the updater which bricks the updater?

This happened, so I wrote a lot of test code to make sure that would never happen again. My CI would catch that since I was E2E testing that it could actually run the upgrade process.

Once I implemented all of this, I never had a single failure and would routinely, several times a day, deploy to the entire cluster, over the course of a couple years.

It was all eventually consistent as I could also control the "check for update" frequency as well.

cjbprime · on Nov 15, 2023

I think there's a minor confusion here, where you think the purpose of my response involves doubting whether your system was successful. I understand it was successful. My response is to the sense in which your comment can be interpreted as advice to other people on what they should build.

I think the fact that you were able to survive with CI-only doesn't mean that we should encourage others to skip implementing a phased rollout based on verified customer successes, including testing of their new updaters before the first time they accidentally brick all the updaters, rather than afterwards. That's what I was hoping to help avoid, through my comment.

jacquesm · on Nov 15, 2023

And you need to verify the vehicle is not in motion.

psychlops · on Nov 14, 2023

Having worked on 25K machines, I can assure you that it never deployed to every single machine and failed to do so in interesting ways all the time.

latchkey · on Nov 15, 2023

It always deployed. It was eventually consistent. Any failure would automatically be resolved after a period of time.

psychlops · on Nov 15, 2023

Interesting. At any point in time, I had errors from hardware, software and networking. Even the racks would be getting overwhelmed at certain times. Simply being able to ssh into every host wasn't guaranteed. I'm not sure how you did it.

kuchenbecker · on Nov 15, 2023

+1 to this, we have a 0.1% hardware failure rate every time we do a rolling restart (40-50k nodes). Some just never come back, in the best case, but actively misbehave in the worst. If the node is unresponsive we remove it from the cluster and fix it async.

latchkey · on Nov 16, 2023

If the daemon was running, it would ping a central server on a schedule and report its status, the response from the server was if there was a new version available (with the binary in the response), or not. This combined ping/update service really cut down on the overall traffic, and failures.

If the machine had crashed, it would start up, start my daemon, and that daemon would start the ping/update process all over again.

A large portion of the machines were iPXE booted... so, just reboot was one option and it would all start from scratch again.

Yes, some of the boxes had flaky power supplies or would fail an ssd, and that would cause a technician to go out and manually fix things.

I found it was critical to think of everything as eventually consistent because my hardware was boxes with 12 GPUs and they were flaky and would crash the whole box randomly. I got used to boxes rebooting hundreds of times. My process would also auto-tune the GPU for stability too, changing clock/power settings until the individual cards would become stable and stop the crashing.

The only time I had problems was when the daemon was dead. I had a dashboard where I could see which machines hadn't reported their status. It was easy to pick those off by hand.

postalrat · on Nov 15, 2023

As a frontend web developer I'm constantly deploying software to many thousands of machines. And you know what? It's pretty damn simple.

drdaeman · on Nov 15, 2023

I used to wear your shoes in IE6/7 ages (no longer, I gave up during the "framework of the week" race and went all-backend), and it wasn't simple at all. Browser compatibility with all their rendering nuances, individual system oddities and all sort of fragile stuff.

And fortunately, no one bats an eye at a slightly broken site, but everyone hates even a slightly broken vehicle.

jrumbut · on Nov 15, 2023

It's simple because we tolerate certain limitations in the web platform.

If you had a hard requirement that a page load could never take more than 100ms, regardless of network conditions, you'd have quite a challenge on your hands.

onion2k · on Nov 15, 2023

The laws of physics are definitely very challenging. If you've got a solution please write a blog post.

jrumbut · on Nov 15, 2023

No blog post required! You just install the whole app on a dedicated piece of hardware on site.

But then deployment becomes more challenging ;)

postalrat · on Nov 15, 2023

Those deployments are called PWAs.

onion2k · on Nov 15, 2023

I'm not really a frontend dev any more but I was for a long time. I can assure you that the only reason you think your code works is because no one tells you it's broken. If you use an error logging or telemetry service (Sentry, Rollbar, New Relic, etc) you will be aware that errors happen in frontend code all the time. It's just that most of the time bugs don't crash the app, and the user doesn't know what to expect so they see a broken feature and think it's meant to be like that.

uw_rob · on Nov 15, 2023

I don't think it's fair to consider the updaters for either Chrome or the OS to be simple.

donmcronald · on Nov 14, 2023

> While I fully understand that this is hard to get right 100% of the time, a mess up of this level by a car manufacturer is pretty amazing to me.

I feel like it's going to happen to someone that makes network devices eventually. I'm always scared to update my (several hundred) UniFi devices. Their update process isn't foolproof and they push auto-updates via the UI pretty hard.

Several years ago they caused some people's devices to disconnect from the management controller when they enabled 'https' communication. Prior to that, if you were pointing devices at 'https://example.com:8080...' they would ignore the 'https' part and do an 'http' request to port '8080'. Then they pushed their 'https' update which expected an 'https' connection and didn't fall back to the old behavior for anyone that was mistakenly using 'https' in their URL initially. Some people on their forums complained about having to manually SSH to every device to fix the issue.

It was caused by an end-user mistake, but they knew it was a potential issue. AFAIK, their attitude on it hasn't changed and a lot and at the time their response was that they knew it would break some people, but that it wouldn't be that many (lol).

IMO, the issue with those systems is that basic communication back to the update / config server is part of the total package which is too complex (ie: a full Debian install). I'd rather see something like Mender (mender.io) where the core communications / updates come from a hardened system with watchdog, recovery, rollback logic.

Think of how crazy it is to have something like pfSense doing package based updates rather than slice based updates. At least with boot environments they could add some watchdog and rollback type logic, but it'll still be part of the total system instead of something like a hardened slice based setup where the most critical logic is isolated from everything else and treated like a princess.

Do you have any insight on package vs slice based systems for updates? Did you isolate update logic from the rest of the system or am I out of touch with that opinion?

vGPU · on Nov 15, 2023

Reminds me of my (far less critical) update process for home assistant. Every time something breaks. Currently my hvac automations are going haywire.

akira2501 · on Nov 15, 2023

When possible, I used a fail back mechanism. If the update failed to fully come up, then the watchdog timer would catch it, the bootloader would notice the incomplete boot, and attempt to boot from the previous known working image in that case.

code_runner · on Nov 14, 2023

out of morbid curiosity.... how long did it take to ssh into and fix all of those servers? I imagine even automating a fix (if possible) would still take a good amount of time.

latchkey · on Nov 14, 2023

gnu parallel and sshpass is your friend.

The way I built my app was that I could install it cleanly via a curl | bash.

So, I just had a simple shell script that iterated through the list of IP addresses (from the DHCP leases), ran curl | bash and that cleaned up the mess pretty quickly.

jdechko · on Nov 15, 2023

As a non-developer, the whole situation with a bad software update to the Voyager spacecraft really puts things into perspective as far as how bad remote updates can be.

It’s also a testament to the way that the system was designed that they were able to get it back online.

sixtram · on Nov 15, 2023

you ssh-d into 25K servers one by one? I mean, manually?

latchkey · on Nov 15, 2023

https://news.ycombinator.com/item?id=38270986

ugh123 · on Nov 15, 2023

Please tell me you scripted that ssh into across your 25k servers!

latchkey · on Nov 15, 2023

https://news.ycombinator.com/item?id=38270986

One thing my little control process did on the box was to always set the password to be the same... user/1.

None of these boxes needed inbound connections, so it wasn't a big deal to do that.

gravitronic · on Nov 14, 2023

I used to work for a company that built satellite receivers that would be installed in all sorts of weird remote environments in order to pull radio or tv from satellite and rebroadcast locally.

If we pushed a broken update it might mean someone from the radio company would have to make a trip to go pull the device and send it to us physically.

Our upgrader did not run as root, but one time we had to move a file as root.. so I had to figure out a way to exploit our machine reliably from a local user, gain root, and move the file out of the way. We'd then deploy this over the satellite head end and N remote units would receive and run the upgrade autonomously. Fun stuff.

Turns out we had a separate process running that listened on a local socket and would run any command it received as root. Nobody remembered building or releasing it but it made my work quick.

singleshot_ · on Nov 14, 2023

The person who built and released this might not have ever worked for your company, which might be why no one remembers building or releasing it.

gravitronic · on Nov 15, 2023

No no, I figured that out afterwards, in a past development iteration someone added it on purpose and then forgot all about it - "oh yeah we needed that to <solve some mundane problem>".

So... worse than subterfuge? That being said it only listened on the local socket, so it's slightly less bad, and I don't want to get into the myriad of correct ways that original problem could have been solved, but lets just say that company doesn't exist anymore.

cjbprime · on Nov 15, 2023

I admire your restraint in writing this comment. :)

ThePowerOfFuet · on Nov 15, 2023

This is one of the very finest comments I have ever seen on HN (or anywhere else, for that matter).

nomel · on Nov 14, 2023

> Turns out we had a separate process running that listened on a local socket and would run any command it received as root. Nobody remembered building or releasing it but it made my work quick.

No offense, but what a shit show. It makes me assume no source control, and a really good chance that state actors made their way into your network/product. This almost happened at a communication startup I know, with three letter agencies helping resolve it. State actors really like infiltrating communication stuffs.

gravitronic · on Nov 14, 2023

oh, yeah, this place was a total shit show. BUT we were ISO9001 certified!! So we had source control (CVS) and a Process (with a capital P) to follow. In this case that code was added in a previous development iteration because someone needed to run something as root when a user pressed a certain button on the LCD panel in front and this was the decoupled solution they wrote intentionally. Somehow I feel like that makes it worse than if it was a malicious three letter agency lol.

nomel · on Nov 14, 2023

Package this up and send it to https://thedailywtf.com

It's beautiful.

qmarchi · on Nov 14, 2023

It's crazy to me that this is possible in the first place. Standard practice is to have a fleet of test vehicles that are effectively production except in an early release group.

Or, you know, having an A/B boot partition scheme with a watchdog. Things that have been around for decades at this point.

Disclaimer: Former Googler, Worked closely with Automotive.

michaelt · on Nov 14, 2023

To me it's all-too-understandable how this is possible.

Maybe they've got a test fleet, but it accepts code signed with the test build key.

Maybe they've got a watchdog timer, but it doesn't get configured until later in the boot process.

Maybe they've got A/B boot partitions, but trouble counting their boot attempts - maybe they don't have any writable storage that early in the boot process.

I wouldn't be surprised if, as a newer company, they'd made a 'Minimum Viable Product' secure boot setup & release procedure, and the auto-fallback and fat-finger-protection were waiting to get to the top of the backlog.

qmarchi · on Nov 14, 2023

So, using Polestar as a reference as it's both a vehicle that I've worked on, and one that I personally drive.

> Maybe they've got a test fleet, but it accepts code signed with the test build key.

Polestar solves this by only delivering signed updates to their vehicles. The vehicle headunit will refuse to flash a partition that isn't signed by the private key held by Polestar. Pulls double duty to prevent someone from flashing a malicious update, as well as corruption detection.

> Maybe they've got a watchdog timer, but it doesn't get configured until later in the boot process.

Based on what the Rivian reports are showing (Speedometer, cameras, safety systems are working), they likely are running their infotainment as a "virtual machine" within their systems. Again, something that Polestar does.

Implementation of a watchdog with a "sub-system" like this is relatively braindead simple.

> Maybe they've got A/B boot partitions, but trouble counting their boot attempts - maybe they don't have any writable storage that early in the boot process.

Generally, A/B partitioning is part of the bootloader, the first program that executes after the reset (on many modern processors) pin is released. This also leads to reboot counters and such being stored as part of the NVRAM that is available at boot.

Opinion: Maybe I'm biased, but maybe if you can't develop something yourself, there's reason for you to get an off the shelf option that handles a lot of these things.

Disclaimer: Former Googler, Worked closely with Automotive.

Gud · on Nov 14, 2023

To be honest, I don't think Polestar set a very high bar for software quality. I am currently renting a Polestar 2 from Hertz, and sometimes the HUD doesn't work(it's 50/50 if it will turn on). That means, I don't see speed, battery charge, etc, while driving. Infotainment system is working though.

qmarchi · on Nov 14, 2023

You're definitely correct in the fact that Polestar isn't the highest in software quality, but it was the examples of what they did right that I wanted to focus on.

They're garbage when it comes to their mobile app and some of the controls on the infotainment system.

> sometimes the HUD doesn't work(it's 50/50 if it will turn on).

You should definitely reach out to Hertz and ask for a car swap. Sounds like there's a bad connection between the display and the IHU. Both screens are operated by the same system, so it's unlikely to be a software failure.

BlueTemplar · on Nov 14, 2023

Sounds ridiculous. How is that even road legal ?!?

FireBeyond · on Nov 14, 2023

Teslas occasionally need to reboot / hard reset their software too, when driving no less, and during that period all that information, and most of the controls, are unavailable (like windshield wipers, etc.)

Terr_ · on Nov 14, 2023

> Teslas occasionally need to reboot / hard reset their software too, when driving no less

Move over, Microsoft, I think I've found an update policy I hate even more...

yreg · on Nov 15, 2023

It's not related to update policy, it happens when the infotainment crashes.

fransje26 · on Nov 15, 2023

You never had Windows force-reboot your system during an unrequested background update? Happened twice to me.

Terr_ · on Nov 16, 2023

I've lost work to it because it killed my running VirtualBox sessions.

fransje26 · on Nov 17, 2023

I feel you. I've had my machine blocked for 5 hours as it was pushing an update down my throat. Luckily it gave me a 2 minute warning before the initial reboot so that I could close everything down.

As that was during a client emergency, I had a very happy client that day.

yreg · on Nov 15, 2023

>and most of the controls, are unavailable

All the controls related to driving are available. You can use turn signals, wipers, change lights, honk, shift gears, etc.

But you are correct that you don't see your speed during the reboot.

ClassyJacket · on Nov 15, 2023

That's not true. You cannot shift gears in the Model X, S, or 3 without the touchscreen - the only way to change into or out of park, drive, or reverse is to swipe on the touchscreen. Only the Model Y has a stalk and it is being removed in the next version.

Also even on older Teslas lights are controlled exclusively via touchscreen.

mrsirduke · on Nov 15, 2023

There are captive buttons to change gears/direction in the X (2023) where the hazards are, as outlined in the docs: https://www.tesla.com/ownersmanual/modelx/en_us/GUID-E9B387D...

yreg · on Nov 15, 2023

I don't know what to tell you, you are wrong. I drive a Tesla, I have specifically checked all of what I mentioned when I had this (in my opinion highly problematic) issue happen to me. Have you actually tried it?

Even the new stalkless cars have gear selectors.

brewdad · on Nov 15, 2023

As someone who lives in a place where it rains almost every day for 9 months of the year, this reinforces my decision to never buy a Tesla. Does this kill the headlights too?

xenadu02 · on Nov 15, 2023

FWIW in two years of ownership we have never seen the touchscreen reboot while driving.

yreg · on Nov 15, 2023

I had it happen two times this year, but the original comment is wrong, you can use the wipers (and also the headlights) while the touch screen is rebooting.

serf · on Nov 15, 2023

some ex-Boeing engineer probably came up with that 'fix'.[0]

[0]: https://www.seattletimes.com/business/boeing-aerospace/faa-o...

iJohnDoe · on Nov 15, 2023

There are some comments from Tesla developers in previous posts about what a shitshow the code is that runs Tesla cars and what the OTA process is.

refulgentis · on Nov 14, 2023

Opinion: I'm a little confused as to how you're confused as to how test might not match prod sometimes.

Observation: "[if you write buggy software], there's reason for you to get an off the shelf option"

Question: Are you saying if they used Android Automotive this could never have happened?

Reference: similar event for Android, last week: https://linustechtips.com/topic/1538248-pixel-phones-using-m...

Disclaimer: Former Googler, did not Work closely with Automotive, Worked closely with Android.

qmarchi · on Nov 14, 2023

Answer: To clarify, more that a company should stick to their core competencies unless there's a drastic need or opening in the market that could be filled (and to build a new competency).

In this particular case, there's nothing particularly unique that Rivian is doing with their Infotainment system that couldn't already be handled by an incumbent in the space, (Android Automotive, QNX, etc.) especially given how modular the systems themselves are.

As State Farm says, "We know a thing or two because we've seen a thing or two".

hermitdev · on Nov 14, 2023

> As State Farm says, "We know a thing or two because we've seen a thing or two".

That's Farmers, delivered by J.K. Simmons, not State Farm. State Farm is Jake, in his khakis.

bo1024 · on Nov 14, 2023

Rivian thinks of itself as a software company. The first thing you sign when you go to buy a vehicle is a software copyright notice IIRC. The first thing in the owner's manual is a notice that the software copyrights and intellectual property belong to Rivian, etc etc.

jacquesm · on Nov 15, 2023

I'm not in the market but I would laugh all the way to the door if that happened.

paledot · on Nov 16, 2023

> The vehicle headunit will refuse to flash a partition that isn't signed by the private key held by Polestar. Pulls double duty to prevent someone from flashing a malicious update, as well as corruption detection.

And of course preventing people from modifying and controlling hardware that they own, having paid 6 figures for (in the case of the Polestar 3 anyway). But that's table stakes for embedded systems in this day and age. Security for me, not for thee.

LoganDark · on Nov 15, 2023

> Maybe they've got A/B boot partitions, but trouble counting their boot attempts - maybe they don't have any writable storage that early in the boot process.

You do not report a successful boot until and unless the entire system loads up successfully. You will definitely have writable storage by then.

psunavy03 · on Nov 14, 2023

Exhibit A of why a Minimum Viable Product still needs a proper Definition of Done which includes quality standards.

worik · on Nov 14, 2023

What amazes me is that any grown up person thinks it is a good idea to update vehicles as if they were telephones

Owners should have to bring the vehicle into a shop to have changes made, and they should be very rare.

This lazy, control freakery of the worst kind

Something very bad is going on happen and people will die before we realize that it is a stupid dangerous practice

qmarchi · on Nov 14, 2023

I understand the sentiment, but think about the alternatives.

There are a few different kinds of updates that can be applied, each with their own protective layers.

Infotainment updates, like what happened to Rivian aren't that dangerous. You lose "convienience features" like maps, air con, etc, but generally nothing that could kill you or someone else.

Then there's system updates, which is where danger noodle things happen. Automotive manufacturers are significantly more risk averse to updating these components, and generally, if _anything_ within the system looks wonky, it's an immediate revert.

If I, as a Polestar owner, wanted to get an update for my vehicle, the nearest service center is 1.5h away. If I lived in Montana (United States), it would be realistically impossible for me to update my car. Thus, if we want to enable competition within the markets, we shouldn't have regulations that force a new manufacturer to have a global network just to add CarPlay to a screen.

mulmen · on Nov 14, 2023

Dad has a 1966 Oldsmobile with air conditioning. In the last 57 years General Motors has never found a need to update the switch. It still works flawlessly.

It’s stupid that we invented a way to not only remotely break an on/off switch but also a culture that rolls the dice on that until the inevitable happens.

neoromantique · on Nov 14, 2023

>Infotainment updates, like what happened to Rivian aren't that dangerous. You lose "convienience features" like maps, air con, etc, but generally nothing that could kill you or someone else.

Also speedometer, which is hardly a convenience feature.

fyrn_ · on Nov 14, 2023

The dashboard panel is working, at least according to Twitter updates. "Only" affects the infotainment console.

neoromantique · on Nov 15, 2023

The reddit thread in official subreddit suggests otherwise, with people suggesting using 'speedometer apps' on their phones as a workaround.

I don't know, I have car from 1991 so I'm just observing the trainwreck.

matthews2 · on Nov 15, 2023

My car from 1997 with an analogue speedometer still has issues, because the wheel speed sensors have corroded and don't work when wet...

neoromantique · on Nov 15, 2023

Doesn't take a drive to dealer to fix that, and even then you have engine RPM and engine sound to guesstimate the speed.

brewdad · on Nov 15, 2023

Losing air con in Phoenix in July might not kill you but you'd wish you were dead.

LastMuel · on Nov 14, 2023

On the other hand, we update irreplaceable spacecraft billions of miles away with new software.

It should be fine to push software updates out, as long as the correct safety and fallback procedures are in place. It simply has to be designed to handle failure and procedures need to be in place to mitigate risks.

It sounds like that wasn't the case here. Also, why wouldn't you have a small initial release pool when you have such a large potential for disruption?

brewdad · on Nov 15, 2023

NASA's philosophy is the polar opposite of "move fast and break things".

worik · on Nov 15, 2023

If Ford or Nissan want to invest as much in ms as NASA does in Voyager....

bradleyjg · on Nov 14, 2023

The art of shipping software—like on a disk, where once it’s out the door, it’s out the door and you may never get another shot—is dead or dying. Even in some embedded areas of the industry now.

fargle · on Nov 15, 2023

> What amazes me is that any grown up person thinks it is a good idea to update vehicles as if they were telephones

What amazes me is that any grown up person thinks it is a good idea to update telephones as if they were software and not phones.

Or rather that it is a good idea to have phones that need updates? Either way, we're all one 1/2 assed push update to a fridge, vacuum, washing machine, phone or car away from a really annoying day.

worik · on Nov 15, 2023

Toucè

vore · on Nov 14, 2023

As the update only affects infotainment and not critical systems, it seems like a reasonable tradeoff to me. Just because a car can fail in ways that kill people doesn't mean all parts of a car are equally critical.

windexh8er · on Nov 14, 2023

This isn't true. If you look at the release notes for any of Rivian's updates they all include vehicle related firmware changes. This is not simply infotainment.

Beyond that "infotainment" includes driver critical information - like the speedometer which, for many affected, means there's no working driver screen.

Daneel_ · on Nov 14, 2023

The article states that the speedometer is unaffected:

"Speedometer, charging, backup cameras, locks, lights, wipers, and turn signals are all still functional with the 2023.42 error."

spaceywilly · on Nov 15, 2023

Yeah... I worked on an embedded project with literally 2 engineers, and we had an A/B partitioning scheme, and a recovery partition (we fully qualified the recovery image and it was flashed to the units on day 1, it was guaranteed to boot and it would just sit and wait for the user to initiate a firmware load). The app on the device would reset a U-boot variable once it was successfully loaded, so U-boot could check the number of failed boot attempts. If it was >= 5 reboot attempts without booting successfully, it would go into the recovery partition.

There's really no excuse from Rivian on this, this is shoddy

LargeTomato · on Nov 16, 2023

I interviewed at Rivian. They told me about how they needed to grant users access to things like keys, AC, ignition, etc. So they built a hierarchical, recursive group checking IAM system.

That just felt like a massive product to build and maintain for what really could have been backed by AWS iam. GCP IAM if they really really needed hierarchy. I guess I'm not surprised at this outage.

DannyBee · on Nov 14, 2023

Rivian does have a test fleet, and they test it for weeks before releasing. This particular issue is because they apparently distributed the firmware signed with the wrong cert.

Not a bug in the software itself.

That is independent of testing the software, but still a distribution issue.

mytailorisrich · on Nov 14, 2023

My 2c based on your comment:

* "signed with the wrong cert" should mean the software package is rejected before it it is installed.

* software upgrades are tricky and there should be at least 2 versions available so that fallback to the previous is possible and automatic in case of issues.

DannyBee · on Nov 14, 2023

The software package probably is signed right but contains multiple signed binaries of which one is signed wrong.

Or is multiple signed packages and one is wrong.

Or the test cars accept prod and test certs.

Or some combo of the above.

There are lots of ways this could have broken that doesn't amount to rivian not being able to write software

mytailorisrich · on Nov 14, 2023

I am not criticising Rivian, not least because I don't know the details.

That being said, "signed wrong", including all your hypotheses, that results in a bricked unit is definitely a serious oversight in general.

This also might highlight why production tests should be run on exactly production units. No tweaks allowed.

DannyBee · on Nov 15, 2023

It is great to live in an ideal world, and in fact, in most software, you can do what you are suggesting quite cheaply. But once you get past the sort of "quip on hacker-news" level of thinking about this, or trivial and cheap production testing scenarios, people have to make real tradeoffs because it's never that simple.

Talking about those is much more interesting than just asserting that everything should be a certain way, without any consideration for real world constraints, like cost of units, etc.

mytailorisrich · on Nov 15, 2023

I am not sure why you are being so defensive.

It would be interesting if you could develop about what you mean by real world constraints and how cost of units affect what I wrote, which I did from real world experience.

DannyBee · on Nov 17, 2023

I'm not defensive, I just find your extreme position remarkably silly.

You included nothing about how costs affect anything. You simply assert that you should always test prod on pristine production units.

There are plenty of times outside of software where production units cost millions or you can only produce them so quickly, or both, and where your extreme take would result in remarkable cost or a competitor eating your lunch.

Which is precisely why its not done, and in the real world tradeoffs are made between what really needs 100% assurance and not. Spending money or losing customers for 5 9s of reliability through testing when two are needed is not a best practice, and is often explicitly called out as such.

In the case of rivian, maintaining a significant fleet of expensive, pristine, exact customer spec (ie not debuggable) cars just to try to get 100% prod ota success assurance is unlikely to provide value vs getting 98% assurance and not doing that (by rough calculation, it stands at 98% after this incident).

mytailorisrich · on Nov 17, 2023

My position is neither silly nor extreme. It's the way it is usually done and other comments here have been along the same lines.

In fact you are trying to spin what I wrote to an extreme to make your point.

By the way, it is not about 100% success assurance but assurance that failure does not brick the unit. This is an assurance that should be, and can be, close to 100%, indeed a good number of 9s because, obviously you cannot brick 2 cars out of 100 for every software upgrade!

jandrese · on Nov 14, 2023

Yeah, but how did the vehicle not just reject the wrong cert and refuse to flash the update?

mkipper · on Nov 14, 2023

I've never worked in automotive but it's pretty easy to imagine how this might play out in a car, where a single update might bundle updates for several programmable devices.

It's easy to imagine a central SoC receiving the update, verifying its signature against a local key and then reprogramming some MCU over an internal interface. But then after resetting the MCU, you realize that the image you just flashed isn't compatible with the boot security keys burned into that MCU. It's not uncommon for a device performing the OTA update to not have access to the "source of truth" keys / certificates used to verify the updated image at boot time.

Not that this is a great excuse. If you add OTA updates to a product that has this design, you should really be confident in your recovery solution.

DannyBee · on Nov 14, 2023

The firmware is probably not just a signed package but signed binaries in the package as well. One is probably signed with the wrong cert.

This would not cause the updater to fail unless it verified the certs of all the binaries in the package, which most don't

mlyle · on Nov 14, 2023

The code went through early release tests successfully; the problem came with how it was more broadly released.

They should have had further staging of the rollout (randomizing when it is offered to users).

whalesalad · on Nov 14, 2023

A/B partitions tends to solve that. You will only switch to the new partition when the update is 100% verified installed. If it doesn't complete in an atomic manner, your device will just boot into the previous healthy partition.

AlotOfReading · on Nov 14, 2023

A/B gets complicated in the real world. BL1 may not support A/B for example, so to implement A/B bootloaders you may need a shim that can read/write NVM to handle that. Your HSM may not have slots for multiple keys to have different signatures, so upgrading one may trample the other if your update code doesn't check that.

Lots of ways to screw this up, especially in automotive where you're likely to be dealing with TI and their (in)secure boot.

I've solved this problem god only knows how many times now and I've rarely found an automotive board that doesn't introduce fun, new edge cases. OTA can't exceed x kilobytes of memory, the processor isn't fast enough to verify signatures and write the image in < x seconds, can't write the image to flash unless the signature is verified, but the image doesn't fit in RAM, the server delivering the update is 3+ networks away from the device receiving the update, etc.

Dalewyn · on Nov 14, 2023

With all due respect, that all sounds like Programmer Induced Problems(tm).

Cars are a long solved problem, being around for over a century. Telephony and computing hardware and infrastructure today are in the realm of the ludicrously good compared to even just a few decades ago, even if we consider bottom of the barrel worst case scenarios. If software somehow can't work a solved problem using ludicrously good hardware, the programmers (and their managers) are the problem.

whalesalad · on Nov 14, 2023

I agree here. A partition is just a partition. It's taking one disk and abstracting it into two. This is not hard.

The rovers on Mars, the Voyager, these problems have been solved for a long time. The compute in a Tesla these days can probably run Crysis.

You can do this OTA to a Raspberry Pi running Nerves via remote SSH and it works really well. The Nerves runtime utilizes A/B partitions for OTA updates.

mypalmike · on Nov 14, 2023

I once worked at a startup that sold a very expensive "enterprise" network appliance that didn't have partitioning. I had worked on network appliances before which did have this capability, so I asked the VP of engineering about it. He said it just wasn't a priority given all the other work that needed to be done. I wouldn't be surprised if Rivian had the same startup mentality that might lead to such a situation.

AlotOfReading · on Nov 14, 2023

Let's use two of the examples I gave above. How would you go about modifying the silicon to support new features in the boot rom, and how do you get around the pigeonhole principle when that silicon vendor doesn't ship enough bits of OTP memory to store multiple keys? M-x butterfly doesn't quite get there, but maybe there's another Emacs command I could use?

whalesalad · on Nov 14, 2023

You choose different silicon? You get a different vendor? You build your own boards and chips?

For a vehicle that costs $100k+ it shouldn't be hard to double or triple the budget for onboard compute considering it is vital to the operation of the entire vehicle.

AlotOfReading · on Nov 14, 2023

Note that I never claimed the automotive world has functional people processes. Functional people processes make a lot of technical issues much easier, but they're usually off the table in traditional manufacturing. The security team insists on x requirements for security reasons. Hardware team insists on this chip because it's the only one that makes the budget work.

A shockingly large part of my job is telling both that one team won't be getting what they want and to work it out among themselves. Rinse and repeat between dozens of boards because the relevant teams don't talk to each other and none of them read the "design requirements to ensure we don't have to tell you no" doc either. One time they didn't even tell us there was a board until the end of December, when delivery was scheduled for Feb.

whalesalad · on Nov 14, 2023

I live in Detroit so what you’re saying does not surprise me in the least bit. I have some friends in automotive at some of the big 3, and suppliers, and I’ve heard some really terrible stories.

mlyle · on Nov 14, 2023

If the comment I replied to originally contained a mention of A/B partitions, I missed it.

MichaelZuo · on Nov 14, 2023

The 'early release tests' weren't testing an identical copy of the actual update?

DannyBee · on Nov 14, 2023

It's probably closer to:

The test vehicles accept test/prod signed versions

Regular vehicles only accept prod signed versions

They are otherwise identical.

The test vehicles were sent test signed versions

The prod vehicles were sent the exact same update, signed with test.

This would not be uncommon since the test vehicles probably occasionally run test releases for debugging.

Further, the update is probably multiple signed pieces, and the only part accidentally signed with test was likely infotainment software.

Or something like this.

It's hard to believe they wouldn't test sending badly signed updates, so i have to imagine it's a particularly weird badly signed update.

In other words, i would not assume they are idiots.

AlotOfReading · on Nov 14, 2023

It'd be pretty silly to implement an OTA scheme that didn't check signatures before installing updates. That would mean any random attacker could soft-brick the module by sending an invalid image, which a development image should be to a production vehicle.

You could get this situation if the application code accepted signatures the bootloader does not though. I can imagine that accidentally occurring.

DannyBee · on Nov 15, 2023

Sure, but it could also be signed binaries inside the OTA, a variant of what you are talking about.

IE OTA is a signed package, inside package are also signed binaries. OTA itself is properly signed, a single binary (infotainment) is signed with wrong key.

While most OTA verifiers will verify the OTA signature (which this would pass), most don't verify the individual inside-package binary signatures at install time, only at runtime.

AlotOfReading · on Nov 15, 2023

That's not how I've ever implemented OTA, but I'll grant that it's possible if it was designed by someone with no idea what they were doing. Certainly not a good OTA process though, for this among many other reasons.

mlyle · on Nov 16, 2023

It makes sense for each component to check signatures of its code to prevent various kinds of attacks -- e.g. someone coming and reflashing just infotainment or motor controllers with something malicious.

So, OTA update comes in, containing a bundle of software for different subsystems. It's sent to different subsystems. Then, those subsystems check integrity at startup, but one subsystem's bootloader isn't happy because the firmware looks to be invalid.

You can only prevent this if the OTA knows how to do equivalent verification for every subsystem in the car that checks integrity. (And, of course, even if you do this, there's other ways you can go wrong that aren't specific to integrity checks).

DannyBee · on Nov 17, 2023

Uh, this is a perfectly normal ota process when dealing with multiple embedded systems.

It seems silly to off the cuff assert that it is a bad process or only done by those who have no idea what they are doing.

intern4tional · on Nov 15, 2023

Likely not their code.

OTA is generally developed by tier 1, so this is probably a bug in the tier 1's code. (Samsung, Panasonic, Sony, etc are common tier 1s in this space.)

hef19898 · on Nov 14, 2023

I am still not sure why I would update software on car, a piece of hardware that, IMHO, shoupd be able to run air gapped 24/7. Exceptions: recurring bugs, GPS maps and security updates. All of which can be done either during service (preferred, if they brick it, they are liable) or by plugging in something. OTA updates just seem completely pointless...

Edit: Also, why the heck isn't the entertainment system completely air gapped from the software running the car?

enragedcacti · on Nov 14, 2023

Rivian consistently ships a lot of new features and improvements, you can see the changelogs here [1]. I think you can pretty fairly critique a lot of them with: They are just solving a problem that they created by making it too techy, or they are shipping stuff they should have completed before releasing the product. I do think its hard to argue that the updates aren't adding anything of value though. There's convenience stuff like pet mode or bird's eye camera view that were added after release, but there are also things like new driving modes (soft sand and snow) or improved DC charging curves and smarter battery conditioning that legitimately improve the quality of the product as a vehicle.

> Edit: Also, why the heck isn't the entertainment system completely air gapped from the software running the car?

As for this, the entertainment system can control basically every feature of the car and is often the primary or only way to accomplish certain things. Even in much much dumber cars the infotainment is still part of the CAN bus and is able to interact with the rest of the vehicle.

https://rivian.software/category/public/

hef19898 · on Nov 14, 2023

Funny, our 2020 MY Jaguar controls car functions from the digital screen in front of the driver, the middle console screens only control AC, entertainment, phone, navigation and other non-car related stuff. No idea how the architecture looks behind all that so. But seriously, even if on the same bus, just don't the media player, radio and connected phone access to the systems actually running the car from engine to brakes. And please, please, finish developing the embedded software running on car before shipping said car. Then it can be air gapped, if not it requires OTA and internet access, raising all kinds of security issues...

CamperBob2 · on Nov 14, 2023

And please, please, finish developing the embedded software running on car before shipping said car.

Sorry, but things no longer work that way, and never will again. This is a good thing, as long as processes are improved to avoid situations like this one.

lotsoweiners · on Nov 15, 2023

> This is a good thing, as long as processes are improved to avoid situations like this one.

How do you figure. I can’t think of a single thing in my vehicle that could be improved by software. When I buy a car I’d prefer it was done.

LoganDark · on Nov 15, 2023

It's supposedly a good thing because more features can be added over time, but if they were features worth adding in the first place, the car should have shipped with them already.

hef19898 · on Nov 15, 2023

That is the big difference between hardware and software engineering: Once hardware is shipped, there is nothing you can do besides repaurs and retrofits requiring a workshop. Software can updated and changed today anytime.

My biggest issue with modern cars, and it seems this is spreading to other embedded systems than cars, is treating those as software: connect them to the web and run OTA updates everytime you need to fix a bug. That requires some form of inzernet cinnection, which requires regular security updates, which require OTA update capability. All that because, bluntly, software devs cannot be bothered with just finishing software running on non-connected hardware that just works, not can they be bothered following hardware development in case hardware is the main component, as in cars. And no, no car manufacturer is a software company, nor phone maker like Apple.

The proper way of fixing software bigs in automotive used to be, again thank you Tesla for breaking something that just worked, to recall the affected cars to a workshop to conduct the software update. Honestly, the only thing on a car that should be updated by the owner is maps for the GPS unit.

jacquesm · on Nov 15, 2023

Things work like that just fine: buy an old car and call it a day.

The less software in my car the better.

jacquesm · on Nov 15, 2023

You may not have a say in the matter. Most cars are on some kind of IoT private network with their own cell modems and if the manufacturer decides to push an update there isn't much you can do about it short of ripping out the cell modem. Which may well have unpredictable consequences.

refulgentis · on Nov 14, 2023

Rollouts don't solve problems, they limit who they effect.

mlyle · on Nov 14, 2023

Is not reducing the effective cost of a bad update by 10x or more worthwhile?

Sure, but if you are rolling out to 1% of users per hour, you detect the problem in a couple of hours and much fewer than 2% of users will have applied the update. This is a relatively small support problem.

While if you roll out to everyone at once, you'll detect the problem sooner (within an hour) but have 10x as many affected.

xyst · on Nov 14, 2023

When a car company is losing money on every car sale. C level execs going to cut corners

dewski · on Nov 14, 2023

This is a bad take.

xyst · on Nov 14, 2023

Rivian layoffs earlier this year [2] combined with reports of $33K loss per sale [1]. Rivian is hemorrhaging money right now.

RVN IPO’d at $150/share. Now it’s trading at $16/share.

All of these indicators of poor leadership to me. No sustainability. Burning cash. Poor company outlook. Poor products.

[1] https://tfltruck.com/2023/10/rivian-financial-results-losses...

[2] https://www.theverge.com/2023/2/1/23581642/rivian-layoff-ev-...

OneLeggedCat · on Nov 14, 2023

This is an inadequate comment.

worik · on Nov 14, 2023

cs702 · on Nov 14, 2023

It's easy to underestimate how hard and expensive it is to build, deploy, and remotely upgrade software that runs reliably on a fleet of diverse cars (different models, different years, slightly different components from batch to batch, etc.). It makes updating a mobile phone OS look trivial in comparison.

So far, only Tesla seems to be able to update car software remotely, regularly and reliably. I'm certain it's neither easy nor cheap.

All things considered, physical buttons and dials are probably easier and cheaper, because they don't require software updates!

VyseofArcadia · on Nov 14, 2023

Forget updates entirely. My car is one of the few places I expect to get software that works the first time.

If you absolutely must have updates, then at least not OTA updates. Have them done at the dealership or service center so any issues can be dealt with immediately.

Come on, is this engineering or hacking? This is a car, not a CRUD app. Get. It. Right.

dagmx · on Nov 15, 2023

That’s how things used to be and it resulted in lots of long standing bugs because the update rates were low, and so manufacturers didn’t push updates. Many people don’t live near dealers or service centers or can afford the continued cost (it’s not free usually unless it’s a recall)

OTA is better for consumer when done properly. Other manufacturers manage it fine, and one bad example shouldn’t be what we base things on. It’s what we should learn from and improve on.

jacquesm · on Nov 15, 2023

But average quality was a bit higher because nobody thought 'oh, we'll fix that next week in the OTA update'.

dagmx · on Nov 15, 2023

I don’t think so. This is the same thought people apply to cartridge based games for example, before software updates.

But the reality is that the same types of bugs would likely exist. Nothing about the engineering aspects has really changed other than more features. All that would happen is future revisions would have it fixed and early buyers would be stuck.

jacquesm · on Nov 16, 2023

The types of bugs were similar but the bug count was entirely different. Typically on a 16K ROM you'd have two or three known bugs after the product was in the market for a while and maybe years later you'd find another one by re-reading the code and realizing that if you tweak things just so you can get the product to misbehave.

But you're not looking at near endless lists of known issues and 'wontfix' wasn't a thing back then.

dalyons · on Nov 14, 2023

eh i guess i disagree. We had that (& still do for some cars) for decades, and it universally resulted in terrible software that you were stuck with for the life of the car. Hard to update == hard to iterate == bad software.

ClumsyPilot · on Nov 15, 2023

bad software is the one that kills people. Ugly software that works is fine

dalyons · on Nov 15, 2023

in cars it is often is ugly AND doesnt work.

w0m · on Nov 14, 2023

random new features via OTA updates was one of the deciding factors when i bought my car ... :)

I also mostly WFH so... yea. lol.

matrss · on Nov 14, 2023

> All things considered, physical buttons and dials are probably easier and cheaper, because they don't require software updates!

I am pretty sure there is a market for a dumb modern car, but no one is building it. I am thinking of an electric car without anything "smart" in it. Modern safety features can stay, if they work completely self contained and without requiring an external connection ever over the lifespan of the car.

iso8859-1 · on Nov 15, 2023

I wonder if it is somehow possible to use an open source battery management system to build a car like this. See https://foxbms.org/

jacquesm · on Nov 15, 2023

Regulatory pressure may well get you to do stuff you wouldn't want to do.

NotYourLawyer · on Nov 14, 2023

I’d buy that car today.

wannacboatmovie · on Nov 14, 2023

This isn't a bunch of Windows PCs home-built from a hodgepodge of components.

They designed, built, and shipped all the hardware. There is ABSOLUTELY NO excuse for not having a database of the exact hardware configs by serial number. They have the ability to test every single shipped configuration.

If they don't, they have already failed as a car company.

AlotOfReading · on Nov 14, 2023

I guarantee they have a database with the hardware configs. It's required by NHTSA to do recalls and notices. They'll undoubtedly be using that to inform the right people to come in.

The update servers almost certainly don't talk to that system though.

wil421 · on Nov 14, 2023

> So far, only Tesla seems to be able to update car software remotely, regularly and reliably. I'm certain it's neither easy nor cheap.

My Jeep Grand Cherokee has OTA for over 5+ years. BMW has been doing it since 2018.

I’m almost positive a family member had it with GMC on star back in the late 2000s.

willio58 · on Nov 14, 2023

I don't think the Jeep or BMW infotainment systems are nearly as fleshed out or complex as Rivian's, especially not Tesla's. Maybe I'm wrong!

phpisthebest · on Nov 14, 2023

Well then we need to ask why is their infotainment systems so complex? and does it need to be?

I want my infotainment systems go connect to Android Auto. That is is.

Make it do that, and only that.

This drive to make EV's as complex as possible is one of the reason i am not planning on buying one

EV's are suppose to be SIMPLER than ICE. Make me a Simple Car with simple controls, and just replace the ICE with a battery and Electric Motor, give me an app for my Phone that can do the Charging Trip Calculators and interface with other systems.

I do not want a compplex SaaS app on wheels

bdamm · on Nov 14, 2023

Well, good news, Bollinger has made your product!

LoganDark · on Nov 15, 2023

No they haven't. Bollinger only makes commercial trucks.

cobalt · on Nov 14, 2023

android auto is just a different brand of infotainment system

worik · on Nov 14, 2023

And if it fails it is inconvenient

wil421 · on Nov 14, 2023

That’s a huge plus for me. CarPlay or nothing. BMW is becoming closer to a Tesla like screen. GM is supposed to drop CarPlay in favor of whatever they are doing on their EVs.

I don’t want my ICE/EV to become a SaaS app where I’m paying $500 a year to use my own car.

vel0city · on Nov 14, 2023

> GM is supposed to drop CarPlay in favor of whatever they are doing on their EVs

IIRC its not just their EVs its all cars.

fma · on Nov 15, 2023

Shouldn't make such a bold IIRC statement correcting someone without first doing a quick Google to verify.

https://www.caranddriver.com/news/a43488135/gm-apple-carplay...

Specifically it's for the new Ultium EVs. Bolts, Lyriqs, Hummers will still have it.

vel0city · on Nov 15, 2023

If I had known it for a fact I wouldn't have prefaced it with "IIRC".

Guess I didn't recall correctly. Thanks for pointing that out.

jacquesm · on Nov 15, 2023

BMW in particular is an interesting case, a (late) friend of mine drove just about every model the day after it came out (BMW fan) and they spent more time in the shop for software issues than they did driving. To the point that he'd get attached to some of the loaners, it really was that bad.

duped · on Nov 14, 2023

Updating software is orthogonal to the complexity of the software application being updated, unless you have horribly designed your architecture. I know, because I've made that mistake.

bri3d · on Nov 14, 2023

> All things considered, physical buttons and dials are probably easier and cheaper, because they don't require software updates!

Almost all automotive control modules have firmware, whether that firmware is parsing touchscreen inputs or a rotary encoder.

NotYourLawyer · on Nov 14, 2023

Well sure, but the rotary encoder can’t get moved to a different menu tree by a software update, and I can use it without taking my eyes off the road. I know which I prefer.

xyst · on Nov 14, 2023

Tech junk shouldn’t go in cars, period. Cars shouldn’t be as pervasive and prevalent in society (at least in USA). Yet here we are. Car manufacturers have spent an insane amount of money over decades to get to this point (buying legislators, forcing highway infra, subsidies, profit driven strategy over sustainability)

treesknees · on Nov 15, 2023

Decades? Try almost a century. For better or worse, our cities and various economies were built around the automobile.

It's still a free market - these companies could choose not to put tech into their product. But look at the backlash against GM when they announced they wouldn't support Apple Car Play or Android Auto. Consumers want it.

FireBeyond · on Nov 14, 2023

> So far, only Tesla seems to be able to update car software remotely, regularly and reliably. I'm certain it's neither easy nor cheap.

Tesla, whose computer systems quite regularly need to be hard rebooted while the car is driving? That Tesla?

code_runner · on Nov 14, 2023

I had to do this once or twice (its very very infrequent in my experience) and one time it was genuinely terrifying, as I had lost blinkers etc where a few interstates all intersect and merge etc.

I still do love the car though.... but a very sketchy moment that I shouldn't have brought on myself while driving in that situation.

xienze · on Nov 14, 2023

> I had to do this once or twice (its very very infrequent in my experience)

This is something that’s _never_ supposed to happen.

> but a very sketchy moment that I shouldn't have brought on myself while driving in that situation.

How on earth can you rationalize a Tesla performing an update/hard reset while driving as _your_ fault? It should never be allowed to happen!

pcchristie · on Nov 15, 2023

Tesla doesn't do that, you're misunderstanding. You can do a hart reboot on the computer by yourself, by holding down both steering will scroll wheels for 10 seconds. Fixes any glitches with the screen.

fma · on Nov 15, 2023

So the commenter is driving through intersections without blinkers and supposed to do that? I'm not sure if this is a joke or not.

I need to teach my grandma how to hard reboot a car without crashing?

vmladenov · on Nov 15, 2023

A lot of silliness has been tossed around about Teslas in this thread so let's be clear about how they work. A Tesla vehicle have two main independent systems:

- The "car" that operates the motors, blinkers, shifting, Autopilot, etc. It also supplies standard readouts via OBD that you can monitor independently e.g. with a phone app. As far as I'm aware, it is impossible to turn this machine off except when the car is stationary, in park, and the brake is fully pressed (or by disconnecting the 400V battery leads under the rear seat).

- The infotainment computer is a small Linux machine that displays the map, speedometer, music, plays the sound of the blinkers through the speakers, and random other stuff. This system can be reset at any time by holding the two scroll wheels.

There are valid things to criticize about this setup e.g. by not having a standard fallback display of speedo, but you can still read speed from OBD at all times. And Tesla has broken things in OTA updates before, but their rollouts are heavily staged. The last major revert I can remember was FSD beta 10.3 in 2021 back when it had reached a few 10s of testers.

code_runner · on Nov 15, 2023

this is correct