
> However, there is an unspoken claim that the gradient update doesn't carry enough information about the user data to reconstruct any of it server-side.

This was a concern for me as well, but the 'Privacy' section of the post addresses it. In short, the algorithm is adapted so that the influence of any single user on the model is limited, and noise is added. I'm not knowledgeable enough about differential privacy to know whether that covers all possible privacy attacks, but it looks like a good start.
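
For concreteness, my reading of that section is that it boils down to clipping each user's update and then adding noise to the aggregate. A minimal sketch of that idea (the function and parameter names here are mine, not the post's):

    import numpy as np

    def dp_federated_average(user_updates, clip_norm=1.0, noise_multiplier=1.1, rng=None):
        # Clip each user's update to an L2 norm of at most clip_norm, so no
        # single user can move the average by more than clip_norm / n.
        rng = rng or np.random.default_rng()
        clipped = []
        for update in user_updates:
            norm = np.linalg.norm(update)
            clipped.append(update * min(1.0, clip_norm / (norm + 1e-12)))
        mean = np.mean(clipped, axis=0)
        # Add Gaussian noise scaled to the per-user sensitivity of the mean.
        sigma = noise_multiplier * clip_norm / len(user_updates)
        return mean + rng.normal(0.0, sigma, size=mean.shape)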

Personally, I'm now more worried about adversaries trying to mess up the model. How many clients need to submit fake updates for the training process to never converge? If it's 50% that's probably fine, but I'm afraid a much smaller number of users could already derail the process.
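
To make that worry concrete: with plain, unweighted averaging of raw updates, even a single malicious client can drag the aggregate arbitrarily far by scaling its update, so the relevant threshold isn't really 50%. A toy illustration (this assumes no clipping or other defenses):

    import numpy as np

    honest = [np.array([0.1, 0.1]) for _ in range(99)]  # 99 well-behaved clients
    attacker = [np.array([1000.0, -1000.0])]            # 1 malicious client with a scaled-up update

    print(np.mean(honest, axis=0))             # ~[0.1, 0.1]
    print(np.mean(honest + attacker, axis=0))  # ~[10.1, -9.9], dominated by the attacker

The per-user clipping from the privacy mechanism presumably bounds this particular trick, but whether a coordinated minority can still quietly bias the model is exactly what the poisoning-attack literature studies.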




VERY interesting questions! Unfortunately I can't answer them, since I don't know enough about the topic.

To make the literature search easier: your second concern is called a "poisoning attack" and is one of the problems that "adversarial machine learning" deals with.


> In short, the algorithm is adapted such that the influence of a single user on the model is limited, and noise is added.

But in any case (added noise or not), the user-provided weight updates improve the model in a particular direction. So I suppose that, by this fact alone, they inevitably leak information about the user. For example, assume we are training on cat and dog images. Run a test with 1000 validation images of cats and see how many the network gets right. Then apply the user-provided updates and run the same test again. The difference tells us something about the user's images. This doesn't necessarily work in every case, but statistically it could paint a picture.
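
As a rough sketch of the probe you're describing (the apply_update and evaluate helpers are hypothetical, and this assumes the server can apply one user's update in isolation):

    import copy

    def accuracy_delta(model, user_update, cat_probe_set, apply_update, evaluate):
        # Evaluate on e.g. 1000 cat validation images before and after applying
        # a single user's update; the shift is a signal about that user's data.
        acc_before = evaluate(model, cat_probe_set)
        updated = apply_update(copy.deepcopy(model), user_update)
        acc_after = evaluate(updated, cat_probe_set)
        return acc_after - acc_before

The per-user clipping and added noise mentioned upthread are meant to bound exactly this kind of per-user signal, so the question is how much of it survives the noise.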

(Of course, happy to be proved wrong)



