They don’t send unencrypted full-res files; they send a low-res “visual representation” and can only decode it if they get > x “hits”. Assuming it works as described, I do think it’s better than just having the full keys as they do now. And why else would they go to all this trouble? They can already scan images on their servers if that’s what they want.
Low-res is better, I suppose, but if it's enough for a human to tell whether it's CSAM or not, it's probably high-res enough to be a significant invasion of privacy in case of a mistake.
Also, the > x "hits" part is a good feature, assuming the database only contains CSAM. Otherwise it's useless (not to mention totally unauditable).
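For anyone wondering how a "can only decode after > x hits" rule can be enforced by math rather than by policy: as I understand it, Apple's technical summary describes it as threshold secret sharing. Below is a toy sketch of the general idea (Shamir's scheme), not Apple's actual implementation. The names `make_shares`, `recover`, and `PRIME` are made up for illustration, and the idea that each matching image reveals one share of a decryption key is a simplification.

```python
import secrets

# Toy field modulus (a Mersenne prime); purely illustrative.
PRIME = 2**127 - 1


def make_shares(secret, threshold, count):
    """Split `secret` into `count` shares; any `threshold` of them recover it."""
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, count + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover(shares):
    """Lagrange-interpolate the polynomial at x = 0 from the given shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    key = 123456789
    shares = make_shares(key, threshold=3, count=5)  # imagine one share per "hit"
    print(recover(shares[:3]) == key)  # at the threshold: key is recovered
    print(recover(shares[:2]) == key)  # below it: interpolation gives garbage
```

With fewer shares than the threshold, the interpolated value is unrelated to the key, which is the property the "> x hits" claim relies on. It says nothing about what is in the database the hits are computed against, which is the part nobody outside can audit.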
My guess is that they're doing it on device because they've spent several years marketing the idea that "everything is done on-device", so implementing CSAM scanning server-side would go against that. Maybe they thought this would somehow look better to the average consumer who thinks "on-device" is automatically better?