So it’s the opposite of the normal behavior, meaning the real client IP is guaranteed to be the last in the list. I should probably apply this logic only when the app is hosted on Heroku, and use the standard Express approach otherwise.
> meaning the real client IP is guaranteed to be the last in the list.
The last entry in the list is simply the IP address of whatever is making the request to Heroku. In my example, that's Cloudflare, being a proxy. Heroku simply appends the originating IP address (coming from a proxy) to the header which is what you would expect.
The Stack Overflow answer is addressing X-Forwarded-For spoofing, something you don't care about for geoip lookup. Someone could prefix 8.8.8.8 to the header before making their request, thus "X-Forwarded-For: 8.8.8.8, <Real IP>, <Cloudflare IP>", and it's inconsequential that your service will return results for 8.8.8.8 instead of <Real IP>.
The SO answer is wrong that this is Heroku-specific behavior. Heroku is simply appending the originating IP address to the header.
Obviously, you can nip this in the bud by pushing a route to production that logs req.headers and seeing for yourself.
If we extend the scenario from hombre_fatal's example to include IP spoofing like in that Wikipedia article you linked, you end up with:
IP: <Heroku's load balancer addr>
X-Forwarded-For: <Spoofed addr>, <Real user addr>, <Cloudflare addr>
So you either need to know the number of trusted proxies that sit between you and the user, or you need to know the IP addresses of those trusted proxies, in order to determine which parts of X-Forwarded-For to trust.
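To make those two options concrete, here is a rough sketch in Python (the idea is language-neutral). The function names are made up for illustration, and the addresses are placeholders: 203.0.113.7 stands in for the real user and 198.51.100.10 for the Cloudflare node, neither of which is an actual Cloudflare range. It assumes each trusted proxy appends exactly one entry to the header.

```python
import ipaddress
from typing import List, Optional, Sequence


def parse_xff(header: str) -> List[str]:
    """Split an X-Forwarded-For value into its comma-separated entries."""
    return [part.strip() for part in header.split(",") if part.strip()]


def client_by_proxy_count(header: str, trusted_proxies: int) -> Optional[str]:
    """Option 1: you know how many trusted proxies sit in front of you.

    Each trusted proxy appends one entry (the peer it saw), so with N
    trusted proxies the client is the Nth entry counting from the right.
    """
    entries = parse_xff(header)
    if trusted_proxies < 1 or len(entries) < trusted_proxies:
        return None
    return entries[-trusted_proxies]


def client_by_trusted_networks(header: str, trusted: Sequence[str]) -> Optional[str]:
    """Option 2: you know the trusted proxies' addresses.

    Walk right to left and return the first entry that is not inside a
    trusted network. Anything further left is client-supplied.
    """
    networks = [ipaddress.ip_network(cidr) for cidr in trusted]
    for entry in reversed(parse_xff(header)):
        try:
            addr = ipaddress.ip_address(entry)
        except ValueError:
            return None  # malformed entry: give up instead of guessing
        if not any(addr in net for net in networks):
            return entry
    return None


# <Spoofed addr>, <Real user addr>, <Cloudflare addr> (placeholder values)
header = "8.8.8.8, 203.0.113.7, 198.51.100.10"
print(client_by_proxy_count(header, 2))
print(client_by_trusted_networks(header, ["198.51.100.0/24"]))
```

Both calls skip the spoofed 8.8.8.8 prefix because they only ever read entries that a trusted proxy wrote. In a real deployment you would also check that the socket's peer address (the Heroku load balancer here) is itself trusted before reading the header at all.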
The problem with geoip is that the free services will never be as good as the paid services, and the paid ones aren't all that accurate either.
For just about every geoip use case, there is a better solution. Namely, almost every modern phone and desktop is capable of providing its location, and that is more accurate than any geoip database.
The main issue is that people can make their device lie about its location, so if you're using geoip for security (say you're a streaming service) then that's about the only valid use case, and that only exists because studios still want to live in a world where borders matter.
There are a lot of good applications for geo-ip tech. Banks probably want to flag if someone is trying to log in to a US-based account from China or using a known VPS/VPN provider as a proxy.
They don't need to be perfect and it's certainly better than nothing.
> For just about every geoip use case, there is a better solution. Namely, almost every modern phone and desktop is capable of providing its location, and that is more accurate than any geoip database.
Unless I'm mistaken, it can be used commercially as well, but only under the `Creative Commons Attribution-ShareAlike 4.0 International License`, which, for an API, might require everyone using the API to also use the data they obtain in accordance with that same license, but IANAL.
By third party I mean the API. You can easily embed the database using a library in every programming language I’ve ever seen. There is no reason for another API, and there is no reason for another risk factor by trusting that this new API is correct.
They also have a form on their site to submit corrections. Maxmind's db is so prevalent now, I suspect (indirect) users submit all the info they need nowadays.
> Oddly enough, that patent apparently expired today?
Not anymore, maybe they are reading HN as well :-)
2023-09-15 - Adjusted expiration
> 2019-12-09 - Application status is Expired - Fee Related
2005-09-20 - Publication of US6947978B2
2005-09-20 - Application granted
2002-07-04 - Publication of US20020087666A1
2000-12-29 - Assigned to GOVERNMENT OF THE UNITED STATES, AS REPRESENTED BY DIR. NAT. SECURITY AGENCY, THE NSA GENERAL COUNSEL (IP&T)
2000-12-29 - Priority to US09/752,898
2000-12-29 - Application filed by National Security Agency
Country-level data can be obtained from the 5 RIRs freely and via a standardized CSV-like document updated every day. I wrote an article about that, the first part shows how to get and parse that: https://www.ecalamia.com/blog/make-your-own-geoip-api/
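For anyone curious what parsing that looks like, here is a minimal sketch, not the code from the article. It assumes the pipe-separated "delegated" statistics layout the RIRs publish (`registry|cc|type|start|value|date|status`), where `value` is an address count for IPv4 records. The sample records are made up: they use documentation ranges and user-assigned country codes, and the function names are hypothetical.

```python
import bisect
import ipaddress
from typing import List, Optional, Tuple

# Made-up records in the RIR "delegated" layout (documentation ranges only).
SAMPLE = """\
# comment lines start with a hash
2|example|20240101|3|19830101|20240101|+0000
example|*|ipv4|*|3|summary
example|AA|ipv4|192.0.2.0|256|20240101|allocated
example|XA|ipv4|198.51.100.0|256|20240101|assigned
example|XB|ipv4|203.0.113.0|256|20240101|allocated
"""

Range = Tuple[int, int, str]  # (first address, last address, country code)


def parse_ipv4_records(text: str) -> List[Range]:
    """Return sorted (first, last, cc) tuples for the IPv4 records."""
    ranges: List[Range] = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("|")
        # Skip the version line, summary lines, and non-IPv4 records.
        if len(fields) < 7 or fields[2] != "ipv4" or fields[3] == "*":
            continue
        first = int(ipaddress.IPv4Address(fields[3]))
        count = int(fields[4])  # for IPv4, "value" is a number of addresses
        ranges.append((first, first + count - 1, fields[1]))
    ranges.sort()
    return ranges


def lookup_country(ranges: List[Range], ip: str) -> Optional[str]:
    """Binary-search the sorted ranges for the one containing ip."""
    value = int(ipaddress.IPv4Address(ip))
    starts = [r[0] for r in ranges]
    i = bisect.bisect_right(starts, value) - 1
    if i >= 0 and ranges[i][0] <= value <= ranges[i][1]:
        return ranges[i][2]
    return None


ranges = parse_ipv4_records(SAMPLE)
print(lookup_country(ranges, "198.51.100.42"))
```

A real file has far more records, so you would build `starts` once rather than on every lookup. IPv6 records use a prefix length in the `value` field and would need their own handling.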
I started IPinfo.io ~6 years ago (and launched it on HN: https://news.ycombinator.com/item?id=7239333). We now serve 20 billion geolocation API requests a month, and roll our own geolocation data (we used to rely on the MaxMind data, but have been busy working on improvements to that, and then on our own complete data, along with other data sets like IP usage type, company, and carrier).
So what does this have to do with OP's submission? You wrote a whole article about courtesy "guerrilla marketing" on Stack Overflow. At least comment on their work before your advertising.
GeoIP is pretty accurate at the state/country level for most users, but you will run into precision issues at the city level.
A bigger problem seems to be that many forget to continuously sync their IP DB with their provider. Your targeting is only as good as your IP -> Geo map.
My team built a tool for testing GeoIP implementations here: https://www.geoscreenshot.com to get around the issue of testing if it works.
That depends on your use case. Huge numbers of people (e.g. people at work) use VPNs, and their 'geolocation' could be wildly different than their actual location. If you're an IBM employee (what...quarter of a million people) on the VPN, you look like you're in New York someplace. At my current employer (80k), most of us look like we're in Minneapolis, even though I'm half a continent away. If you're, say, targeting ads based on city/state level GeoIP, that's a lot of misdirected ads.
VPN users are an exception. I would think it would be best to use a proximate node to reduce latency.
For corporations, there is another form of targeting (account-based targeting) that relies on IP ranges. I believe DemandBase covers this specific use case.
If you're interested in a service with a free tier and more specific granularity than the MaxMind free database, Geocodio has a pretty nice service. They also have a bunch of different enrichment options that you can tack on if you need things like congressional districts or school districts. It's a really nice service.
I always feel a need to state this to folks who are not aware of geolocation and ip addresses: Geolocation based on IP is very unreliable and should be used only for soft-analytics at best.
Example: It is not fit for security postures (in theory). One can dump all the CURRENT v4 routes being advertised out of China and block them via blackholes/firewalls/etc. However immediately after that a rogue operator could hijack a non-China affiliated prefix, use it for badness, and then release the hijacked prefix.
Most Geolocation services that are static (point in time) will not detect the above scenario. BGP-based monitoring services will, but that's a step up $$$ wise.
There's a tradeoff. No, geolocation isn't perfect, and it's often oversold, but it can be useful in a security context. Simple (admittedly reductionist) example: say I have no admin-types in China, and don't expect to. It's pretty simple operationally to grab the 'China range', block port 22/tcp (or whatever the hackers are after today), reduce my risk surface area by a billion IPs or so, get that noise out of the logs (maybe collect statistics on the rule for trending/anomalies), and then have more bandwidth to spot the edge cases where a hijacked block is coming after me. Far from a 100% solution, but maybe a 90% solution. Another tool in the toolbag. Your risk model may vary.
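As a rough illustration of the "grab the range, block port 22" idea, here is a sketch that turns start-plus-count allocations into CIDR blocks and checks a connection against them. The helper names and the sample allocation (a documentation range) are made up, and in practice you would feed the CIDR list to your firewall rather than check it in application code.

```python
import ipaddress
from typing import FrozenSet, Iterable, List


def to_cidrs(start: str, count: int) -> List[ipaddress.IPv4Network]:
    """Convert a (start address, address count) allocation into CIDR blocks."""
    first = ipaddress.IPv4Address(start)
    last = first + (count - 1)
    return list(ipaddress.summarize_address_range(first, last))


def should_drop(
    src_ip: str,
    dst_port: int,
    blocked: Iterable[ipaddress.IPv4Network],
    ports: FrozenSet[int] = frozenset({22}),
) -> bool:
    """Drop only when the port is protected and the source is in a blocked range."""
    if dst_port not in ports:
        return False
    addr = ipaddress.ip_address(src_ip)
    return any(addr in net for net in blocked)


# Made-up allocation: 192 addresses starting at 203.0.113.0.
blocked = to_cidrs("203.0.113.0", 192)
print([str(net) for net in blocked])
print(should_drop("203.0.113.130", 22, blocked))
```

The count-to-CIDR step matters because allocations are not always a power of two in size, so one allocation can expand to several blocks.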
Be careful with that logic: While prefixes are supposed to be regionally-allocated, they can and DO move around occasionally. You could find yourself in a situation where you are referencing outdated geolocation information, resulting in a potential violation of GDPR.
For example, if your app were on Heroku behind Cloudflare, the request would look like this:
IP: <Heroku's load balancer addr>
X-Forwarded-For: <Real user addr>, <Cloudflare addr>
Your code, as written, will be geolocating the Cloudflare node.