Velero can be configured to run on a schedule, but the scheduled command was apparently not the exact same command they were using to perform the manual tests - the scheduled job was missing part of the command, basically.
Sp they were manually doing a backup and then testing that backup, rather than testing their automated backups? If thats what they were doing that just makes me wonder... why?
It's possible that the backup restore/test process was still 'automated' to some degree, like a manually-run pipeline or playbook etc.
Obviously there were mistakes made in process design and tools deployment but I don't think this is 'baffling incompetence' more than it is a couple of small mistakes that compounded each other to have a pretty spectacular impact.
I am not sure about the pricing structure of their provider, but maybe it is a Hotel California situation? Data egress price of the provider was such that they did not want to pay to retrieve the full backup "just for a test"? Instead, repeat the backup command, but swap it out for a local location.