Federated Learning (florian.github.io)
89 points by epsilon-greedy on Aug 26, 2018 | 12 comments


TL;DR :

Compute the gradient of the error on the user's device and ship that to a server-side centralized model to update its weights.
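For concreteness, here is a minimal sketch of one round of that loop for a plain linear model (numpy, all names hypothetical; the real systems use Federated Averaging, where clients run several local steps and ship weight deltas rather than a single gradient):

    import numpy as np

    def client_update(weights, local_X, local_y):
        """Runs on the user's device: compute the gradient of a squared
        error on the local data and return only the gradient + count."""
        residual = local_X @ weights - local_y
        grad = local_X.T @ residual / len(local_y)
        return grad, len(local_y)

    def server_round(weights, client_results, lr=0.1):
        """Runs server-side: average the gradients, weighted by how many
        examples each client had, and apply one update to the model."""
        total = sum(n for _, n in client_results)
        avg_grad = sum(g * (n / total) for g, n in client_results)
        return weights - lr * avg_grad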

It's a very cool idea with a lot of interesting applications (among which: learning statistical patterns of user behavior at scale without "spying" on individual users).

However, there is an unspoken claim that the gradient update doesn't carry enough information about the user data to reconstruct any of it server-side.

I'm still waiting to see a formal proof of that, and my gut says that if the server sees enough gradient updates from a given user, it's likely possible to rebuild the original data.


> However, there is an unspoken claim that the gradient update doesn't carry enough information about the user data to reconstruct any of it server-side.

This was a concern for me as well, but the 'Privacy' section of the post addresses this. In short, the algorithm is adapted such that the influence of a single user on the model is limited, and noise is added. I'm not knowledgeable enough on differential privacy to know if that covers all possible privacy attacks, but it looks like a good start.
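For readers skimming, the clip-and-noise step amounts to something like this (a rough numpy sketch of the general idea, not the exact mechanism from the post or papers):

    import numpy as np

    def privatize_update(update, clip_norm=1.0, noise_std=0.01, rng=np.random):
        """Bound one user's influence by clipping the update's L2 norm,
        then add Gaussian noise so no single user's data stands out."""
        norm = np.linalg.norm(update)
        clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
        return clipped + rng.normal(0.0, noise_std, size=update.shape)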

Personally, I'm now more worried about adversaries trying to mess up the model. How many clients need to submit fake updates for the training process to never converge? If it's 50%, that's probably fine, but I'm afraid a much smaller number of users could already derail the process.
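To make that worry concrete: with plain averaging and no clipping, far fewer than 50% of clients can derail a round, because a single attacker can just scale up its fake update. A toy numpy illustration (numbers entirely made up):

    import numpy as np

    honest = [np.array([1.0, 1.0]) for _ in range(99)]  # 99 honest clients
    attacker = [np.array([-500.0, -500.0])]             # 1 attacker, scaled up
    average = np.mean(honest + attacker, axis=0)

    print(average)  # ~[-4.01, -4.01]: one client flipped the direction of the update

Clipping each update (as in the privacy mechanism above) limits this, and there is work on robust aggregation, but it seems like a real concern.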


VERY interesting questions! Unfortunately I can't answer them, since I don't know enough about the topic.

To make the literature search easier: your second concern is called a "poisoning attack" and is one of the problems "adversarial machine learning" is concerned with.


> In short, the algorithm is adapted such that the influence of a single user on the model is limited, and noise is added.

But in any case (added noise or not), the user-provided weight updates improve the model in a certain way. So I suppose that, based on this fact, they inevitably leak information about the user. For example, assume we are training on cat and dog images. Run a test with 1000 validation images of cats and see how many the network gets right. Then add the user-provided updates and see how many the network gets right. The difference tells us something about the user's images. This doesn't necessarily work in every case, but statistically it could paint a picture.

(Of course, happy to be proved wrong)
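A sketch of that differencing idea (hypothetical helper names; assumes you can evaluate the global model before and after folding in one user's update):

    def leakage_probe(model_before, model_after, cat_images, dog_images, accuracy):
        """Compare validation accuracy on cats vs. dogs before and after a
        single user's contribution. A bigger jump on cats than on dogs
        hints that the user's local data contained cats."""
        cat_gain = accuracy(model_after, cat_images) - accuracy(model_before, cat_images)
        dog_gain = accuracy(model_after, dog_images) - accuracy(model_before, dog_images)
        return cat_gain - dog_gain  # positive suggests "cat person", statistically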


Though not exactly the same problem, current machine learning research in NLP on removing unwanted bias is having trouble removing demographics from learnt representations.

Vector representations of users' textual data encode demographic data even when they are explicitly trained not to, and even when the task they are trained for seems tangential to the demographic data.

See this paper https://arxiv.org/abs/1808.06640 by Yanai Elazar and Yoav Goldberg.

"Adversarial Removal of Demographic Attributes from Text Data"

Recent advances in Representation Learning and Adversarial Training seem to succeed in removing unwanted features from the learned representation. We show that demographic information of authors is encoded in -- and can be recovered from -- the intermediate representations learned by text-based neural classifiers. The implication is that decisions of classifiers trained on textual data are not agnostic to -- and likely condition on -- demographic attributes. When attempting to remove such demographic information using adversarial training, we find that while the adversarial component achieves chance-level development-set accuracy during training, a post-hoc classifier, trained on the encoded sentences from the first part, still manages to reach substantially higher classification accuracies on the same data. This behavior is consistent across several tasks, demographic properties and datasets. We explore several techniques to improve the effectiveness of the adversarial component. Our main conclusion is a cautionary one: do not rely on the adversarial training to achieve invariant representation to sensitive features.
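For readers unfamiliar with the setup, "adversarial removal" usually means a gradient-reversal layer between the encoder and a demographic classifier. A minimal PyTorch-style sketch (dimensions and heads made up, this is not the paper's code):

    import torch
    import torch.nn as nn

    class GradReverse(torch.autograd.Function):
        """Identity on the forward pass; negates gradients on the backward
        pass, so the encoder is trained to *hurt* the demographic adversary."""
        @staticmethod
        def forward(ctx, x):
            return x.view_as(x)
        @staticmethod
        def backward(ctx, grad_output):
            return -grad_output

    encoder = nn.Sequential(nn.Linear(300, 128), nn.ReLU())
    task_head = nn.Linear(128, 2)   # the main task, e.g. sentiment
    adversary = nn.Linear(128, 2)   # tries to recover the demographic attribute

    def training_losses(x, task_y, demo_y):
        h = encoder(x)
        task_loss = nn.functional.cross_entropy(task_head(h), task_y)
        adv_loss = nn.functional.cross_entropy(adversary(GradReverse.apply(h)), demo_y)
        # The paper's point: even when adv_loss reaches chance level, a fresh
        # classifier trained on h afterwards can still recover demo_y.
        return task_loss + adv_loss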


Yeah, and perhaps there's not enough information to rebuild the original data, but still enough information to tell something about the user. For example, my gut feeling says that if a user uses the word "cat" a lot, then that would be reflected in the weights, and it would be possible to tell the difference between a cat-person and a dog-person.
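That intuition is easy to check for models with a word-embedding layer: the gradient of the embedding matrix is nonzero only in the rows of words the user actually typed, so an unprotected update effectively lists the user's vocabulary. A tiny PyTorch illustration (toy vocabulary, made-up data):

    import torch
    import torch.nn as nn

    vocab = ["the", "cat", "dog", "sat"]
    emb = nn.Embedding(len(vocab), 8)

    # The user's local batch mentions "cat" but never "dog".
    tokens = torch.tensor([vocab.index("the"), vocab.index("cat"), vocab.index("sat")])
    emb(tokens).sum().backward()

    # Rows with a nonzero gradient reveal exactly which words were used.
    print((emb.weight.grad.abs().sum(dim=1) > 0).tolist())  # [True, True, False, True]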


If you wanted to take user privacy super seriously, I wonder how feasible it would be to set up a Tor-style network to allow users to pass their updates back to your server?

That way, you don't know who each update came from, and if you isolated each update, you wouldn't even know if two updates came from the same user.

I haven't taken much time to consider the ramifications, but my gut says this would open you up to malicious users who could pass in malicious updates in an attempt to train unwanted behaviour into your model. I believe this would be an issue with any federated learning approach, though, unless you only use trusted users.

It also does away with the nice efficiency gain you get from weighting each user's update by how many examples that user had.

Of course, this is only useful when reconstructed data doesn't contain enough metadata to ID users anyway.


In practice, end users will probably not set up a Tor(-style) network themselves. That means you'll need to do it for them, so they still have to trust you. At that point, just not storing their IP addresses is probably easier.

There are still advantages to the Tor-style solution (you wouldn't have the ability to track users without resorting to backdooring), but the slowdown and extra complexity are probably not worth it in most situations.


What do people think of OpenMined in relation to federated learning?

https://www.openmined.org/


Cool idea! I see the iOS keyboard already does this to an extent. I often type words from my native language while using the English keyboard. I see that over time it learns these and predicts a bit better when I need to type them again. Of course, if it collected a larger corpus from many such users, allowing us to tag this as, say, a corpus of Kannada words, I can see a cool application of this kind of learning.


Has anybody tried applying this to a blockchain by having the PoW be improvements to models in previous blocks or something? This seems like such an obvious idea (especially with all the hype around blockchain nowadays) that either this exists or I'm missing something crucial about the technique.


Definitely has been done before.



