If you wanted to take user privacy really seriously, I wonder how feasible it would be to set up a Tor-style network to let users pass their updates back to your server?
That way, you don't know who each update came from, and if you isolated each update, you wouldn't even know if two updates came from the same user.
I haven't taken much time to consider the ramifications, but my gut says this would open you up to malicious users who could submit poisoned updates in an attempt to train unwanted behaviour into your model. That said, I believe this would be an issue with any federated learning approach unless you only accept updates from trusted users.
It also does away with the nice efficiency gain you get from weighting each user's update by how many examples they trained on when averaging (a rough sketch of that weighting is below).
Of course, this is only useful when reconstructed data doesn't contain enough metadata to ID users anyway.
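To make the weighted-averaging point concrete, here's a minimal sketch of the aggregation step in federated averaging (FedAvg); the function name and the (weights, example_count) layout are illustrative assumptions, not any particular framework's API:

    # Minimal sketch of the aggregation step in federated averaging (FedAvg).
    # The data layout (a list of (weights, example_count) pairs) is an
    # illustrative assumption, not a specific framework's API.
    def federated_average(client_updates):
        """client_updates: list of (weights, num_examples) pairs,
        where weights is a flat list of floats."""
        total_examples = sum(n for _, n in client_updates)
        averaged = [0.0] * len(client_updates[0][0])
        for weights, n in client_updates:
            for i, w in enumerate(weights):
                averaged[i] += w * (n / total_examples)
        return averaged

    # A client with 300 examples pulls the average toward its update three
    # times as strongly as one with 100 examples.
    updates = [([0.2, -0.1], 300), ([0.8, 0.4], 100)]
    print(federated_average(updates))  # roughly [0.35, 0.025]

(Presumably the problem is that isolated, anonymous updates can't be linked or verified well enough to trust those per-user example counts.)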
In practice, end-users will probably not set up a Tor(-style) network themselves. That means you'll need to do it for them, so they still have to trust you. At that point, simply not storing their IP address is probably easier.
There are still advantages to the Tor-style solution (you wouldn't have the ability to track users without resorting to backdooring), but the slowdown and extra complexity are probably not worth it in most situations.