
Distillation would be the ideal way (especially because it also has efficiency gains), but as far as I know, distillation for LLMs is still largely unproven.
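For context, the core of distillation is training the student against the teacher's temperature-softened output distribution rather than hard labels. A minimal numpy sketch of that loss term (function names and the temperature value are illustrative, not from any particular library):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on softened distributions.

    The T*T factor is the standard rescaling so gradients stay
    comparable across temperatures (per Hinton et al.'s formulation).
    """
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q)))) * T * T
```

When the student's logits match the teacher's, the loss is zero; in practice this term is usually mixed with a standard cross-entropy loss on ground-truth labels.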

Honestly though, even if you just finetune it, which you will want anyway for any serious commercial application, it's essentially impossible to determine the origin.



Randomly perturbing the weights and then finetuning would probably make it impossible to trace. If someone had access to the finetuning dataset and you hadn't added noise, they could check whether your finetuning loss curves intersect theirs, since both runs would start from the same checkpoint.
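The perturbation step itself is trivial: add small i.i.d. Gaussian noise to every weight tensor before finetuning, so the training trajectory no longer starts exactly at the released checkpoint. A rough sketch, assuming weights are held as a dict of numpy arrays and with an illustrative noise scale:

```python
import numpy as np

def perturb_weights(weights, sigma=0.01, seed=0):
    """Return a copy of the weights with Gaussian noise of stddev
    `sigma` added to every tensor. `sigma` should be small relative
    to the weight magnitudes so the model still works before finetuning.
    """
    rng = np.random.default_rng(seed)
    return {name: w + rng.normal(0.0, sigma, size=w.shape)
            for name, w in weights.items()}

# Example: a toy "checkpoint" with one weight matrix.
ckpt = {"layer0.w": np.zeros((2, 3))}
noisy = perturb_weights(ckpt, sigma=0.1, seed=1)
```

Choosing `sigma` is the real judgment call: too small and the starting point is still recoverable, too large and you destroy the pretrained capabilities you were trying to keep.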

I guess in practice it'll still look suspicious if you have an identical model architecture and similar performance.



