Why are you collecting personal information (IPs) of your readers though?

xena · on Dec 19, 2021

Really, I have the nginx logs going to disk so that prometheus-nginx can scrape them for referer patterns (a given referer crossing a certain threshold is how I know my article got posted somewhere) and status code rates, and overall to make sure that the core functions of the site (such as the RSS feed) are working like I expect. I could probably change it to write the logs to a Unix FIFO instead of a regular file, but I actually do end up going back to look at the logs for things like people attempting to exploit my code, so I can harden my site appropriately.
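
For anyone curious what the referer-threshold idea might look like, here is a rough Python sketch. It is not what prometheus-nginx actually does; it assumes log lines in nginx's default "combined" format, and the names `referer_counts` and `hot_referers` and the threshold value are made up for illustration.

```python
import re
from collections import Counter

# Matches the tail of a line in nginx's "combined" log format:
#   "request" status bytes "referer" "user-agent"
# Assumption: the log format has not been customized.
LINE_RE = re.compile(
    r'"[^"]*" (?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "[^"]*"$'
)

def referer_counts(lines):
    """Count how many requests arrived from each referer."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line.strip())
        # nginx writes "-" when the client sent no Referer header.
        if match and match.group("referer") not in ("", "-"):
            counts[match.group("referer")] += 1
    return counts

def hot_referers(lines, threshold):
    """Return the referers seen at least `threshold` times (hypothetical cutoff)."""
    return {ref: n for ref, n in referer_counts(lines).items() if n >= threshold}
```

Note that nothing in this sketch reads the client IP field at all; only the referer is counted.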

Beldin · on Dec 19, 2021

Thanks! That's an informative reply.

I was wondering whether not logging could be an option. That's now quite clear.

xena · on Dec 20, 2021

I just checked the logs and did some reverse IP lookups at random, and it turns out nearly all the IP addresses I have are Cloudflare IPs. So I don't even have IP addresses logged the way I thought I did! Yay me!

jcims · on Dec 19, 2021

Not OP, but IPs are not always considered personal information. If you never establish the identity of the consumer directly, it's not clear that the effort required to convert that address to an identity meets the bar of 'reasonably capable'.

The point of my original comment was that there is gray area here and people dismissing it outright as obviously bogus are not thinking very critically about it. I think this is a good example of that.

dawnbreez · on Dec 19, 2021

It's kind of hard not to log IPs when running a service over the internet. You kind of have to have their IP address in order to know where to send the info they want.

Further, logging an IP address has also been necessary for security--to detect DoS and DDoS attacks, for instance, as both involve many repeated connections from the same IP (though you can offload that to a service now).

smsm42 · on Dec 20, 2021

Why is it hard? The servers need to know the IP address, for sure, but they do not need to permanently record it. In fact, un-aggregated IP data is probably not that useful for statistics either; it can be aggregated, e.g. daily, and then discarded.

You seem to be confusing temporarily holding information in the database (no matter whether in-memory or on persistent media) for operational purposes (be it networking or threat detection) with permanently storing the same information long after it has ceased to be useful for those purposes.
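
The aggregate-daily-then-discard idea above could be sketched roughly like this in Python. This is only an illustration under assumed inputs: the `(timestamp, ip)` event shape and the function name `daily_unique_visitors` are hypothetical, and the sketch takes no position on what counts as "storing" in a legal sense.

```python
from collections import defaultdict
from datetime import datetime

def daily_unique_visitors(events):
    """Reduce (iso_timestamp, ip) pairs to a per-day count of distinct IPs.

    The raw addresses live only in the local `seen` sets; the returned
    dictionary holds nothing but dates and counts.
    """
    seen = defaultdict(set)
    for timestamp, ip in events:
        # Assumption: timestamps are ISO 8601 strings such as "2021-12-19T10:00:00".
        day = datetime.fromisoformat(timestamp).date().isoformat()
        seen[day].add(ip)
    return {day: len(ips) for day, ips in seen.items()}
```

Once the function returns, only the per-day totals remain, so the aggregate can be kept while the raw log that fed it is deleted.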

dawnbreez · on Dec 22, 2021

As I understand it, from a legal standpoint, storing it for operational purposes is equivalent to storing it for any other purpose.

Also, even if you only store the information for 24 hours--how many users are on Hacker News in a given hour? How many of them click through the top links? Certainly enough that some sites have crashed under the load, which makes me think that the number is higher than 50K. Congrats, you're now storing 50K IPs, despite having an automated system to delete them in a day.

(I don't even wanna think about the headache it'd cause if you got dragged to court over this. I don't know how many judges will understand that you delete the list every 24 hours to avoid being liable for storing it, not to get rid of evidence.)