Lol. I'm an embedded engineer, and this week I'm getting my coworkers excited to upgrade part of our system to use bleeding-edge technology from the mid-'90s!
I just went and checked: I switched our main product over from AVR megas to ARM Cortex in 2013. The only downside is that the ARMs don't have sleep current as good as the AVRs do. And they don't have internal EEPROM.
Other than that they are faster, cheaper, and have more RAM.
PICs and assembly are honestly pretty great, if you know how to use them that way. I've never actually used C on a PIC because they're so resource constrained, but they certainly have their uses.
Low-end Cortex-M cores are pretty cheap these days. More modern designs and processes use silicon more efficiently, so you get more capability for the same raw material cost. Also, you can do a surprising amount with an 8-bit MCU. Lots of very capable 3D printers still run on AVR-based controllers. Oftentimes you're limited by peripherals or memory size more than raw CPU speed.
As a baseline, it's relatively easy to make an autonomous aircraft that follows GPS waypoints using an atmega328 and have CPU cycles to spare, even using software floating point math everywhere. The major open source drone firmwares, though, recently dropped support for Cortex-M3 flight controllers that are already orders of magnitude more powerful than that, and are encouraging folks to migrate to Cortex-M7 based controllers. That's crazy! Software tends to expand to fill any available volume over time...
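For a sense of scale, here's a rough sketch of the core waypoint math (the function name and the flat-earth approximation are mine, not from any particular firmware). On an atmega328 every float op here goes through avr-libc's software floating point routines, and it still fits comfortably inside a 10-50 Hz navigation loop:

    #include <math.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979
    #endif
    #define DEG2RAD(d) ((d) * (float)M_PI / 180.0f)

    /* Bearing from the current GPS fix to the next waypoint,
     * flat-earth approximation: good enough over waypoint-scale
     * distances, and all software floats on an AVR. */
    float bearing_to_waypoint_deg(float lat, float lon,
                                  float wp_lat, float wp_lon)
    {
        /* Scale the longitude difference by cos(latitude) so
         * east-west distances stay roughly right off the equator. */
        float dlat = wp_lat - lat;
        float dlon = (wp_lon - lon) * cosf(DEG2RAD(lat));

        /* atan2f(east, north) = angle from north, clockwise. */
        float bearing = atan2f(dlon, dlat) * 180.0f / (float)M_PI;
        if (bearing < 0.0f)
            bearing += 360.0f;  /* normalize to 0..360 */
        return bearing;
    }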
Really... I wonder what the spare cycles are doing, whether it's truly waste and bad coding or whether it's running some kind of sophisticated predictive algorithm we're unaware of. I do try to assume the engineers did something smart; maybe there's a good reason...
I really hope the reason isn't "so we could run the mainline Linux kernel" or "because we needed an RTOS" or something similarly bloated.
There's tons of human value in doing things good enough rather than perfectly.
Why implement quicksort when bubble sort will give you the result you need just as "instantly" and be easier to verify?
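For the array sizes you actually see on a small MCU, the dumb version really is instant. A minimal sketch:

    /* Bubble sort: O(n^2), but trivially verifiable, no recursion,
     * no extra memory. For the n < 100 arrays typical in embedded
     * code, it's "instant" either way. */
    static void bubble_sort(int *a, int n)
    {
        for (int i = 0; i < n - 1; i++)
            for (int j = 0; j < n - 1 - i; j++)
                if (a[j] > a[j + 1]) {
                    int tmp = a[j];
                    a[j] = a[j + 1];
                    a[j + 1] = tmp;
                }
    }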
Why bother procedurally iterating line by line through 1GB of text in Python, when you can call f.read().split("\n") inside a list comprehension and probably have it run just as fast, because you have a 64-bit 4GHz superscalered mmxzomg Inteliapple CPU and 32GB of RAM?
Oftentimes the bloat is an intentional tradeoff to make something else better: more accurate code, more readable code, more configurable code, etc. Other times it really is just bloated, more terrible code; even that has some value, if it meant someone was able to contribute a useful feature they wouldn't otherwise have been able to, due to a lack of skill or domain-specific knowledge.
Ah, I would have thought it was more like "feedback loop" going to "feedback loop with DSP" going to "feedback loop with DSP with wireless" or similar.
I was thinking more specifically about the flight feedback example compared to what you're describing, but I'm definitely all for using cheap, powerful ARM chips in place of obsolete stuff. I'm just hoping that most of that spare computing power was used for something cool and not just spinning the wheels.
Often spare cycles can be spent sleeping the CPU for longer intervals to save power. Faster processors often do more work per watt, too, which can save a bit of power.
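On a Cortex-M part this is usually just a matter of finishing the loop's work early and sleeping until the next tick. A sketch using the standard CMSIS __WFI() intrinsic (the tick flag and task names here are hypothetical):

    #include <stdint.h>
    #include "cmsis_compiler.h"  /* or your vendor device header */

    volatile uint32_t tick_pending;       /* set by a timer ISR each period */

    extern void run_control_loop(void);   /* hypothetical app work */

    void control_task(void)
    {
        for (;;) {
            while (!tick_pending)
                __WFI();          /* sleep; any interrupt wakes us */
            tick_pending = 0;
            run_control_loop();
        }
    }

The faster the CPU finishes run_control_loop(), the larger the fraction of each period it spends asleep in __WFI().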
More advanced DSP takes more cycles. E.g., moving from fixed low-pass filters to motor-RPM-following notch filters gives lower phase delay, but requires getting telemetry from the motors and running multiple harmonic notch filters on each channel (sketch below). Supporting a wider range of protocols and peripherals takes more flash space. And eventually, when it gets complicated enough that you can't enforce deadlines everywhere, yes, you need an RTOS.
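To make that concrete, here's a hedged sketch of one such notch (coefficients per the standard RBJ audio-EQ cookbook; the struct and function names are mine, not from any particular firmware). Each gyro axis runs one of these per tracked harmonic, and they all get re-tuned whenever new RPM telemetry arrives:

    #include <math.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979
    #endif

    typedef struct {
        float b0, b1, b2, a1, a2;   /* normalized coefficients */
        float x1, x2, y1, y2;       /* direct form I state */
    } biquad_t;

    /* Re-tune the notch to follow the motor; called on each new
     * RPM telemetry sample. fs = gyro rate, q = notch sharpness. */
    static void notch_tune(biquad_t *f, float rpm, float harmonic,
                           float fs, float q)
    {
        float f0 = rpm / 60.0f * harmonic;        /* center freq, Hz */
        float w0 = 2.0f * (float)M_PI * f0 / fs;
        float alpha = sinf(w0) / (2.0f * q);
        float a0 = 1.0f + alpha;

        f->b0 = 1.0f / a0;
        f->b1 = -2.0f * cosf(w0) / a0;
        f->b2 = 1.0f / a0;
        f->a1 = f->b1;              /* same -2cos(w0)/a0 term */
        f->a2 = (1.0f - alpha) / a0;
    }

    /* One step per gyro sample, per axis, per tracked harmonic. */
    static float notch_apply(biquad_t *f, float x)
    {
        float y = f->b0 * x + f->b1 * f->x1 + f->b2 * f->x2
                - f->a1 * f->y1 - f->a2 * f->y2;
        f->x2 = f->x1; f->x1 = x;
        f->y2 = f->y1; f->y1 = y;
        return y;
    }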
At least for drones, the move has been driven as much by flash space as by compute. And the performance really has improved! Moving from fixed filters to RPM-following notch filters, for example, or supporting a wider range of peripherals and protocols. None of this comes for free.
I'm not disagreeing with you, but I don't think it's usually that simple for anything useful of non-trivial complexity.
Most of the time, more power also brings more complexity. Cortex-M7 MCUs, for example, typically can't reach their max CPU throughput without turning on the instruction and data caches or using Tightly Coupled Memory (TCM). Data caching opens up a whole can of worms when interacting with memory-mapped peripherals. Some MCUs with data caches have peripherals glued on that are fundamentally incompatible with caching, forcing indirect tricks like DMA transfers just to talk to them. TCM partitions your available memory space, leading to arbitrarily complex application-specific linker scripts.
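The classic example of that can of worms is DMA plus D-cache. A sketch using the real CMSIS-Core cache maintenance calls (the dma_* HAL functions are hypothetical stand-ins):

    #include <stdint.h>
    #include "stm32f7xx.h"  /* any device header that pulls in CMSIS core_cm7.h */

    /* Buffers must be 32-byte aligned and sized: cache maintenance
     * operates on whole cache lines. */
    #define BUF_LEN 64
    static uint8_t rx_buf[BUF_LEN] __attribute__((aligned(32)));
    static uint8_t tx_buf[BUF_LEN] __attribute__((aligned(32)));

    extern void dma_start_tx(const uint8_t *buf, uint32_t len); /* hypothetical */
    extern void dma_wait_rx(uint8_t *buf, uint32_t len);        /* hypothetical */

    void transfer(void)
    {
        /* Before DMA reads tx_buf: flush our dirty cache lines to
         * RAM, or the DMA engine sees stale memory contents. */
        SCB_CleanDCache_by_Addr((uint32_t *)tx_buf, BUF_LEN);
        dma_start_tx(tx_buf, BUF_LEN);

        /* After DMA writes rx_buf: discard our cached copy, or the
         * CPU reads stale cached data instead of what DMA wrote. */
        dma_wait_rx(rx_buf, BUF_LEN);
        SCB_InvalidateDCache_by_Addr((uint32_t *)rx_buf, BUF_LEN);
    }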
Newer chips are sometimes less capable on other axes, too. It's easier to use a 5V chip to interact with a 5V circuit than a "better" MCU that is only 3.3V-tolerant and requires a filtered 1.8V power rail for its internal circuitry. More complex CPUs generally have less predictable, and potentially slower, interrupt timing too.
Spinning new hardware also isn't cheap the way spinning new software is, both in dollars and in externalities. A popular community-maintained firmware dropping support for an old but popular hardware generation that's become burdensome to maintain due to incremental feature bloat may be perfectly justified by the volunteer developers, but it does effectively turn that old hardware into e-waste. A company producing widgets with embedded components might want to defer the risk and schedule hit of switching to a new MCU architecture for as long as possible, to balance overall company goals against limited engineering resources. Justifying engineer-months of work "because it's better and everyone will like working on it more!" is a lot tougher to prioritize than "because we only have enough stock left for 6 months and then we're scraping eBay".
PICs have extremely low-latency interrupts compared to ARM, and their assembly language is really simple. 8-bit MCUs can be really great at some things where ARM isn't just overkill; it can almost be a problem.