"My M1 Mac's GPU still loses in several benchmarks against my 7-year-old 1060."
The GPU in the Apple Silicon M1 is the fastest integrated graphics available in the mainstream computing market [1]. That is the competition, not a standalone 120W GPU. Apple is purportedly now scaling their GPU designs up into a much larger heat and power envelope (which, contrary to some of the comments here, clearly isn't going to end up in laptops beyond an external TB4 enclosure), and it might just change things a bit.
Scaling a GPU is easier than scaling a CPU, by design: graphics workloads are embarrassingly parallel, so you scale by stamping out more identical cores rather than chasing single-thread gains. And Apple's GPU has nothing to do with ARM.
And to your original point, yes, Apple largely chooses which workloads run on Apple Silicon, and how. By controlling the APIs along with the silicon, Apple abstracts the hardware to a degree that gives them enormous flexibility. The Accelerate and CoreML APIs are abstract vehicles that might be backed by one matrix engine or a thousand, by neural engines, or by an array of GPUs. Apple has built a world where they have more hardware flexibility than anyone. And while close to no one is doing model training on Apple hardware right now, Apple has laid the foundation so that a competitive piece of hardware could change that overnight.
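To make that concrete, here's a minimal sketch in Swift using Accelerate's vDSP (the matrices are just illustrative). Note what's absent from the call: no device selection, no queue, no hint about execution units. You ask for a matrix multiply; where it actually runs, whether on NEON, the undocumented AMX blocks, or whatever future silicon adds, is entirely Apple's call:

    import Accelerate

    // 2x2 * 2x2 single-precision matrix multiply through vDSP.
    // The API exposes the *operation*, not the hardware: Apple is
    // free to route this through any execution unit it likes.
    let a: [Float] = [1, 2,
                      3, 4]
    let b: [Float] = [5, 6,
                      7, 8]
    var c = [Float](repeating: 0, count: 4)

    // vDSP_mmul(A, strideA, B, strideB, C, strideC, M, N, P)
    vDSP_mmul(a, 1, b, 1, &c, 1, 2, 2, 2)
    print(c) // [19.0, 22.0, 43.0, 50.0]

CoreML is the same story one level up: the only knob a developer gets is MLModelConfiguration's computeUnits (.cpuOnly, .cpuAndGPU, .all), so whether a model lands on the GPU or the Neural Engine is Apple's decision, not yours.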
[1] The SoCs in the PS5 and Xbox Series X have more powerful integrated graphics, but the GPU portion of those dies alone uses an order of magnitude more power, and more die area, than the entire M1 SoC. In another comment you mentioned that Zen 2 integrated graphics come close. They aren't in the same ballpark: they deliver a quarter of the performance or worse. Discussions like this unfortunately lean too heavily on the tired "n years old / n nm process" trope, yet the fact remains that there are zero competitive integrated graphics on the market. None. Apple isn't a GPU company, yet here we are.