Generally solid advice (although it's more like "how to pick a microcontroller for a design you're trying to keep as cheap as possible"), and one area where I quibble is the use of "generator" tools to generate I/O code. The STCube application is easy for beginners to use, but its output can easily take twice as much flash as equivalent hand-written code. That isn't a problem if your app is small and the flash is large, but it puts you at a disadvantage if you are competing with people who know how to write that code.
My philosophy has been to develop a fairly deep understanding of the peripheral, write a 'driver' for it along with documentation that will remind me of its quirks when I come back to it, and then put that in a library. The benefit is 'easy code' (just use the library) that is smaller and more efficient than generated code, plus a quick spin-up document if I need to dive in and do more complex work with the peripheral.
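A minimal sketch of what such a hand-written driver might look like, assuming a hypothetical UART-like peripheral with a status register (SR), a data register (DR) and a made-up TX-empty bit. The register layout, bit position and all names here are illustrative, not taken from any real STM32 reference manual. Passing the register block in as a pointer, rather than hard-coding its address, also lets the same driver run against a fake register block in RAM on a host machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register layout; the real one comes from the reference manual. */
typedef struct {
    volatile uint32_t SR;  /* status register */
    volatile uint32_t DR;  /* data register   */
} uart_regs_t;

/* Hypothetical bit: set by hardware when DR can accept another byte. */
#define UART_SR_TXE (1u << 7)

typedef struct {
    uart_regs_t *regs;
} uart_t;

static void uart_init(uart_t *u, uart_regs_t *regs)
{
    assert(u != NULL && regs != NULL);
    u->regs = regs;
}

/* Quirk note (the kind of thing the driver docs should record): on this
 * hypothetical peripheral, writing DR while TXE is clear would clobber the
 * pending byte, so always poll TXE first. */
static void uart_write_byte(uart_t *u, uint8_t byte)
{
    while ((u->regs->SR & UART_SR_TXE) == 0u) {
        /* busy-wait until the transmit register is empty */
    }
    u->regs->DR = byte;
}

/* Blocking write of a buffer; returns the number of bytes written. */
static size_t uart_write(uart_t *u, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; ++i) {
        uart_write_byte(u, data[i]);
    }
    return len;
}
```

On target, the pointer would point at the peripheral's real base address; in a host test, it points at a plain struct with TXE pre-set.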
That is a great point about the downsides of generated code that I didn't mention (but should have).
I don't think it uses twice as much flash per se, but it definitely does use 'more' than hand-rolling your own code.
But ST-Cube has also gotten MUCH better over the last few years and no longer generates a lot of unnecessary code that still ends up compiled in. The version I first used in 2012ish was pretty weak.
I try to approach this from the direction of what gets me up and running fastest (and what has the lowest barrier to entry for my team, including communication), and then I refactor out the guts afterwards if needed.
Practically speaking, my workflow is that I have a set of heavily unit-tested, mocked-out middleware and device drivers that are portable to any ARM MCU, so all I need are the low-level drivers that interact with the peripherals.
I can generate those with ST-Cube and hook them into my middleware/drivers through interfaces (C++) or at link time (C).
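A minimal sketch of the interface approach in C, assuming a hypothetical `serial_if_t` struct of function pointers as the C counterpart of an abstract C++ interface. The portable middleware (`log_line`, hypothetical) only sees the interface: on the host it runs against a recording mock, and on target a small adapter forwards to whatever function the generator emitted. `generated_uart_tx` is a placeholder stub, not a real ST-Cube or HAL function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The "interface" the middleware depends on. Hypothetical names. */
typedef struct {
    void *ctx;                                                /* backend state */
    int (*write)(void *ctx, const uint8_t *data, size_t len); /* 0 = success   */
} serial_if_t;

/* Portable middleware: knows nothing about ST, Cube, or registers. */
static int log_line(const serial_if_t *io, const char *text)
{
    assert(io != NULL && io->write != NULL && text != NULL);
    int rc = io->write(io->ctx, (const uint8_t *)text, strlen(text));
    if (rc != 0) {
        return rc;
    }
    return io->write(io->ctx, (const uint8_t *)"\r\n", 2);
}

/* Host-side mock used by the unit tests: records everything written. */
typedef struct {
    uint8_t buf[64];
    size_t used;
} mock_serial_t;

static int mock_write(void *ctx, const uint8_t *data, size_t len)
{
    mock_serial_t *m = ctx;
    if (m->used + len > sizeof m->buf) {
        return -1; /* simulate a full buffer */
    }
    memcpy(m->buf + m->used, data, len);
    m->used += len;
    return 0;
}

/* Placeholder for the generated low-level call (hypothetical stub). */
static int generated_uart_tx(const uint8_t *data, size_t len)
{
    (void)data;
    (void)len;
    return 0;
}

/* Adapter: binds the generated call to the middleware interface. */
static int generated_write(void *ctx, const uint8_t *data, size_t len)
{
    (void)ctx;
    return generated_uart_tx(data, len);
}

static const serial_if_t uart_from_generator = { NULL, generated_write };
```

With the linking approach, the middleware would instead call a plain function declared in a header, and the build would link in exactly one implementation (generated or hand-rolled) of that function.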
From here, I run my automated integration tests to make sure everything works on my hardware, and once that's done, I evaluate my flash usage.
If needed, I gut the generated code in favour of something hand-rolled, while still being able to verify with the same tests that everything is 'correct'.
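One way to make that swap safe is a conformance check that runs unchanged against every backend. The sketch below assumes a hypothetical function-pointer interface and two fake backends standing in for "generated" and "hand-rolled" drivers; both write into an in-memory "wire" buffer. On real hardware, the wire side would be something like a loopback or an external capture, which is an assumption here, not part of the original workflow description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    void *ctx;
    int (*write)(void *ctx, const uint8_t *data, size_t len); /* 0 = success */
} serial_if_t;

/* Fake "wire" capturing what a backend would put on the bus. */
typedef struct {
    uint8_t bytes[32];
    size_t n;
} wire_t;

/* Hypothetical backend A: stands in for the generated driver. */
static int generated_write(void *ctx, const uint8_t *data, size_t len)
{
    wire_t *w = ctx;
    if (w->n + len > sizeof w->bytes) return -1;
    memcpy(w->bytes + w->n, data, len);
    w->n += len;
    return 0;
}

/* Hypothetical backend B: stands in for a hand-rolled, byte-at-a-time driver. */
static int handrolled_write(void *ctx, const uint8_t *data, size_t len)
{
    wire_t *w = ctx;
    if (w->n + len > sizeof w->bytes) return -1;
    for (size_t i = 0; i < len; ++i) {
        w->bytes[w->n++] = data[i];
    }
    return 0;
}

/* Same check for any backend; returns 0 on pass, nonzero identifies the failure. */
static int serial_conformance(const serial_if_t *io, const wire_t *w)
{
    static const uint8_t probe[] = { 0x55, 0x00, 0xFF };
    assert(io != NULL && w != NULL);
    size_t before = w->n;
    if (io->write(io->ctx, probe, sizeof probe) != 0) return 1;
    if (w->n - before != sizeof probe) return 2;
    if (memcmp(w->bytes + before, probe, sizeof probe) != 0) return 3;
    return 0;
}
```

If the hand-rolled backend passes the same check the generated one did, the middleware above it should not be able to tell the difference, and the flash savings come without re-validating everything by hand.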
Also, for context, I almost never use anything with less than 32 kB of flash, so I rarely need to hyper-optimize.