> Which is easily seen to be equivalent to perfect scaling.
Why? If the whole can be less than the sum of the parts, it can also be greater than the sum of the parts. Maybe two threads can do double the work in 150% of the time (150% of the total CPU time, i.e. 75% of the wall-clock time on two cores, instead of the 200% you'd expect). But if that were possible, "at most double the work" would make for a funny definition of "perfect".
> If the whole can be less than the sum of the parts, it can also be greater than the sum of the parts
It can't be. That's not possible with CPU cores. It can be less but it can't be more.
Here's a proof: you can always timeshare two threads on a single core. Suppose two threads on two cores could do 2x the work in 0.75x the wall-clock time, i.e. 1.5x the total CPU time. Interleave those same two threads on one core and that core does 2x the work in 1.5x the time, which works out to 1x the work in 0.75x the time: faster than the single-threaded program you started with, so that program wasn't the best single-core version after all. Thus we have the concept of perfect scaling, where double the cores can do at most double the work; you can't do better than that, because whatever technique you used to achieve it could still be applied back on the single core.
(I'm sure other proofs exist, but the above should be sufficient to show why you can't beat perfect linear scaling.)
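The same argument, spelled out in symbols (a sketch in my own notation; W, T, and t are mine, not from the thread):

```latex
\documentclass{article}
\begin{document}
% The timesharing argument in symbols (my notation, not the parent's).
Let the best single-core program do work $W$ in time $T$, i.e.\ at rate
$W/T$. Suppose two cores were superlinear: the two threads together finish
$2W$ of work in wall-clock time $t < T$ (in the example above, $t = 0.75T$).
Interleave those same two threads on one core, each running at half speed:
the core completes $2W$ of work in time $2t$, a single-core rate of
\[
  \frac{2W}{2t} = \frac{W}{t} > \frac{W}{T},
\]
contradicting the assumption that $W/T$ was the best single-core rate. The
same division works for $p$ cores: any rate above $pW/T$ timeshares down to
a single-core rate above $W/T$.
\end{document}
```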
The reverse is not true: falling short of perfect scaling is easy. Just add a mutex and your perfect scaling is ruined, because some of the CPUs have to wait. The more CPUs you add, the more CPU time is wasted, because the mutex forces one part of the computation to run on a single core at a time.
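To see the mutex effect concretely, here's a minimal Go sketch (my example, not from the thread; the 8ms/2ms workload split and the worker counts are made up). Each goroutine does some independent work, then a mutex-protected chunk. The independent part scales with cores; the mutex part runs one goroutine at a time:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// spin busy-waits for roughly d so the "work" actually occupies a core
// (sleeping would let the scheduler hide the contention).
func spin(d time.Duration) {
	for end := time.Now().Add(d); time.Now().Before(end); {
	}
}

func main() {
	// Made-up workload numbers, purely for illustration.
	const parallelPart = 8 * time.Millisecond // independent work per worker
	const serialPart = 2 * time.Millisecond   // mutex-protected work per worker

	for _, workers := range []int{1, 2, 4, runtime.NumCPU()} {
		var mu sync.Mutex
		var wg sync.WaitGroup
		start := time.Now()
		for i := 0; i < workers; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				spin(parallelPart) // this part scales with cores
				mu.Lock()
				spin(serialPart) // this part runs one worker at a time
				mu.Unlock()
			}()
		}
		wg.Wait()
		fmt.Printf("%2d workers, %2d units of work: %v\n",
			workers, workers, time.Since(start))
	}
}
```

With enough cores the wall time tends toward parallelPart + workers*serialPart: each doubling of workers doubles the work done but adds a serialized slice that no extra core can absorb, which is Amdahl's law in miniature.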