Show HN: Tiny, fast, and free API to geolocate IP addresses (github.com/risk3sixty-labs)
144 points by whatl3y on Dec 9, 2019 | 45 comments


    const realClientIpAddress = (req.headers['x-forwarded-for'] || req.ip || "").split(',')
    const ip = realClientIpAddress[realClientIpAddress.length - 1]
X-Forwarded-For is appended to by every proxy the request passes through. You want the first IP address, not the last one.

For example, if your app is on Heroku behind Cloudflare, the request will look like this:

IP: <Heroku's load balancer addr>

X-Forwarded-For: <Real user addr>, <Cloudflare addr>

Your code, as written, will be geolocating the Cloudflare node.
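A minimal sketch of the fix, in the same JavaScript as the snippet above. The function name `clientIpForGeoLookup` is hypothetical, and the addresses are documentation placeholders standing in for the real user, Cloudflare, and the Heroku router.

```javascript
// Best-effort client IP for a geoip lookup (a sketch, not the repo's code).
// The left-most X-Forwarded-For entry is the earliest hop recorded. The client
// can forge it, which is tolerable for geoip but not for access control.
function clientIpForGeoLookup(xForwardedFor, socketAddress) {
  const first = (xForwardedFor || '').split(',')[0].trim();
  // No header (or an empty one): the socket peer is the best we have.
  return first || socketAddress || '';
}

// Heroku behind Cloudflare: XFF is "<real user>, <Cloudflare>".
console.log(clientIpForGeoLookup('203.0.113.7, 198.51.100.20', '10.0.0.5'));
// prints 203.0.113.7
```

In an Express handler you would call it as `clientIpForGeoLookup(req.headers['x-forwarded-for'], req.socket.remoteAddress)`.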


This is old but I believe still applies to Heroku: https://stackoverflow.com/a/18517550

So it’s the opposite of the normal behavior, meaning the real client IP is guaranteed to be the last in the list. I should probably add a condition that uses this logic only when the app is hosted on Heroku, and fall back to the standard Express way otherwise.


> meaning the real client IP is guaranteed to be the last in the list.

The last entry in the list is simply the IP address of whatever is making the request to Heroku. In my example, that's Cloudflare, since it is acting as a proxy. Heroku simply appends the address of the connecting peer (here, a proxy) to the header, which is what you would expect.

The Stack Overflow answer addresses X-Forwarded-For spoofing, something you don't care about for a geoip lookup. Someone could prepend 8.8.8.8 to the header before making their request, producing "X-Forwarded-For: 8.8.8.8, <Real IP>, <Cloudflare IP>", and it's inconsequential that your service would then return results for 8.8.8.8 instead of <Real IP>.

The SO answer is wrong that this is Heroku-specific behavior. Heroku is simply appending the originating IP address to the header.

Of course, you can push a route to production that logs req.headers and see for yourself, to nip this in the bud.
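A sketch of such a route using only Node's built-in `http` module, so it does not depend on how the app is wired up. The names `debugHeadersHandler` and `createDebugServer` are hypothetical.

```javascript
const http = require('http');

// Hypothetical debug endpoint: reports the TCP peer and the headers exactly as
// the app receives them, so you can see how your proxies fill X-Forwarded-For.
// Take it down afterwards; it exposes details about your infrastructure.
function debugHeadersHandler(req, res) {
  const info = {
    remoteAddress: req.socket.remoteAddress,
    xForwardedFor: req.headers['x-forwarded-for'] || null,
    headers: req.headers,
  };
  console.log(JSON.stringify(info)); // shows up in the platform's log stream
  res.writeHead(200, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify(info, null, 2));
}

// Returns an unstarted server; call .listen(port) yourself.
function createDebugServer() {
  return http.createServer(debugHeadersHandler);
}
```

Start it with `createDebugServer().listen(process.env.PORT || 3000)` (Heroku passes the port in `PORT`), hit it once through the full proxy chain, and compare `remoteAddress` with the entries in `xForwardedFor`. In Express the equivalent is a one-line route that calls `res.json(req.headers)`.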


If we extend the scenario from hombre_fatal's example to include IP spoofing like in that Wikipedia article you linked, you end up with:

IP: <Heroku's load balancer addr>

X-Forwarded-For: <Spoofed addr>, <Real user addr>, <Cloudflare addr>

So you either need to know the number of trusted proxies between you and the user, or the IP addresses of those trusted proxies, in order to determine which parts of X-Forwarded-For to trust.
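Both options can be sketched in a few lines. This is an illustration, not any library's API: the function names are hypothetical, and real deployments usually match trusted proxies by CIDR range rather than by exact address. Express's numeric `trust proxy` setting works on the same hop-count idea as the first function.

```javascript
// Both strategies walk X-Forwarded-For from the right: entries on the right
// were appended by proxies closer to you, anything further left may be forged.

// Strategy 1: you know how many proxies sit in front of the app
// (e.g. 2 for Cloudflare -> Heroku router). The socket peer counts as one.
function clientIpByHopCount(xForwardedFor, socketAddress, trustedHops) {
  const chain = parseChain(xForwardedFor, socketAddress);
  // The last element is the socket peer; step left once per trusted proxy.
  const index = Math.max(0, chain.length - 1 - trustedHops);
  return chain[index];
}

// Strategy 2: you know the proxies' addresses. Skip trusted addresses from the
// right; the first untrusted one is taken as the client.
function clientIpByTrustedSet(xForwardedFor, socketAddress, trustedProxies) {
  const chain = parseChain(xForwardedFor, socketAddress);
  for (let i = chain.length - 1; i >= 0; i--) {
    if (!trustedProxies.has(chain[i])) return chain[i];
  }
  return chain[0]; // every hop was trusted; fall back to the left-most entry
}

// Builds [xff entries..., socketAddress], earliest hop first.
function parseChain(xForwardedFor, socketAddress) {
  const entries = (xForwardedFor || '')
    .split(',')
    .map((s) => s.trim())
    .filter(Boolean);
  return [...entries, socketAddress];
}
```

With the spoofed chain above (placeholders: 8.8.8.8 spoofed, 203.0.113.7 real user, 198.51.100.20 Cloudflare, 10.0.0.5 Heroku router), `clientIpByHopCount('8.8.8.8, 203.0.113.7, 198.51.100.20', '10.0.0.5', 2)` skips the router and Cloudflare and returns the real user's address.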


I've been running https://freegeoip.app for some time now, if anyone is interested in a free hosted solution.


What does it cost you to keep this up and running?


The problem with geoip is that the free services will never be as good as the paid services, and the paid ones aren't all that accurate either.

For just about every geoip use case, there is a better solution. Namely, almost every modern phone and desktop is capable of providing its location, and is more accurate than any geoip database.

The main issue is that people can make their device lie about its location, so if you're using geoip for security (say you're a streaming service), that's about the only valid use case, and it only exists because studios still want to live in a world where borders matter.


There are a lot of good applications for geo-IP tech. Banks probably want to flag when someone tries to log in to a US-based account from China, or through a known VPS/VPN provider acting as a proxy.

They don't need to be perfect, and it's certainly better than nothing.
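A hypothetical sketch of the bank example: geoip as one signal that flags a login for review rather than blocking it. The `lookup` shape (`country`, `isHostingOrVpn`) is an assumption, not any particular provider's response format.

```javascript
// Returns a list of risk flags for a login attempt; an empty list means
// nothing unusual was detected by this (deliberately simple) check.
// `lookup` is an assumed shape: { country: 'CN', isHostingOrVpn: false }.
function loginRiskFlags(accountCountry, lookup) {
  const flags = [];
  if (!lookup || !lookup.country) {
    flags.push('unknown-location');
  } else if (lookup.country !== accountCountry) {
    flags.push(`country-mismatch:${lookup.country}`);
  }
  if (lookup && lookup.isHostingOrVpn) flags.push('hosting-or-vpn');
  return flags;
}
```

For a US account, `loginRiskFlags('US', { country: 'CN', isHostingOrVpn: false })` returns `['country-mismatch:CN']`, which might trigger a step-up challenge instead of a hard denial, since the lookup itself can be wrong.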


> For just about every geoip use case, there is a better solution. Namely, almost every modern phone and desktop is capable of providing its location, and is more accurate than any geoip database.

You need user permission to get that.
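One way to combine the two: ask the device first (in a browser, `navigator.geolocation.getCurrentPosition` triggers the permission prompt) and fall back to an IP lookup if the user declines or the API is missing. A sketch under assumptions: the geolocation object is passed in so it can be mocked, and `ipFallback` is a hypothetical async function wrapping whatever geoip service you use.

```javascript
// Resolves with device coordinates when permission is granted, otherwise with
// whatever the (hypothetical) ipFallback() resolves to.
function locateUser(geolocation, ipFallback, timeoutMs = 5000) {
  return new Promise((resolve) => {
    if (!geolocation) {
      resolve(ipFallback()); // no Geolocation API available
      return;
    }
    geolocation.getCurrentPosition(
      (pos) => resolve({
        source: 'device',
        lat: pos.coords.latitude,
        lon: pos.coords.longitude,
        accuracyMeters: pos.coords.accuracy,
      }),
      // Permission denied, position unavailable, or timeout.
      () => resolve(ipFallback()),
      { timeout: timeoutMs, maximumAge: 60000 },
    );
  });
}
```

In a page this would be called as `locateUser(navigator.geolocation, () => fetch('/geoip').then((r) => r.json()))`, where `/geoip` is a placeholder endpoint. The `timeout` option is not meant to cover time the user spends on the permission prompt, so the promise can stay pending until they answer.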


There are easier ways to use MaxMind data without injecting a third party.


FYI, they also provide a free non-commercial version (GeoLite2) for tagging IP location: https://dev.maxmind.com/geoip/geoip2/geolite2/


Unless I'm mistaken, it can be used for commercial purposes as well, but only under the `Creative Commons Attribution-ShareAlike 4.0 International License`, which, for an API, might require everyone using the API to also use the data obtained in accordance with that same license, but IANAL.


By third party, do you mean this GitHub repo? AFAICT this app also just uses the MaxMind data and serves the info in that DB over the web.


By third party I mean the API. You can easily embed the database using a library in every programming language I’ve ever seen. There is no reason for another API, and no reason to take on another risk by trusting that this new API is correct.


I see that the underlying IP to Geo data is consolidated by MaxMind. Where does MaxMind get this data? I wish this data was open sourced.


That’s why MaxMind charges for more accurate data. It requires actual effort...


Yes, which is the whole point of my question: how is this data compiled?

Just because it requires effort does not mean it can't be done in an open source way.


1) Agreements with ISPs [most accurate]

2) IP Spidering via traceroutes / RIPE/etc data.

3) Agreements with third parties that have IP/Address mapping due to data supplied from users. [least accurate]

That'd be my guess anyway.


They also have a form on their site to submit corrections. MaxMind's DB is so prevalent now that I suspect (indirect) users submit all the info they need these days.


They are likely using various information, like addresses from RIPE etc.

But there's also an NSA patent on this topic, "Method for geolocating logical network addresses" (filed in 2000).

https://patents.google.com/patent/US6947978B2/en?oq=6%2c947%...


Oddly enough, that patent apparently expired today?


> Oddly enough, that patent apparently expired today?

Not anymore, maybe they are reading HN as well :-)

    2023-09-15 - Adjusted expiration
  > 2019-12-09 - Application status is Expired - Fee Related
    2005-09-20 - Publication of US6947978B2
    2005-09-20 - Application granted
    2002-07-04 - Publication of US20020087666A1
    2000-12-29 - Assigned to GOVERNMENT OF THE UNITED STATES, AS REPRESENTED BY DIR. NAT. SECURITY AGENCY, THE NSA GENERAL COUNSEL (IP&T)
    2000-12-29 - Priority to US09/752,898
    2000-12-29 - Application filed by National Security Agency


Can't edit, but the last item is always just its status at the current day. My bad.


Interesting, I thought government agencies couldn't get patents and trademarks because technically we own everything they do.


Country-level data can be obtained from the 5 RIRs freely and via a standardized CSV-like document updated every day. I wrote an article about that, the first part shows how to get and parse that: https://www.ecalamia.com/blog/make-your-own-geoip-api/
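A minimal sketch of the parsing step, assuming the pipe-separated record layout the RIRs use in their delegated stats files (`registry|cc|type|start|value|date|status`, where `value` is the address count for ipv4 records). It handles IPv4 only, and the function names are hypothetical.

```javascript
// Converts dotted-quad IPv4 to an unsigned integer (arithmetic, not bit
// shifts, to avoid JavaScript's signed 32-bit results).
function ipv4ToInt(ip) {
  const parts = ip.split('.').map(Number);
  if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p) || p < 0 || p > 255)) {
    throw new Error(`not an IPv4 address: ${ip}`);
  }
  return ((parts[0] * 256 + parts[1]) * 256 + parts[2]) * 256 + parts[3];
}

// Builds a sorted [{ from, to, cc }] table from one or more delegated files.
function buildCountryTable(delegatedText) {
  const ranges = [];
  for (const raw of delegatedText.split('\n')) {
    const line = raw.trim();
    if (!line || line.startsWith('#')) continue;
    const [, cc, type, start, value, , status] = line.split('|');
    // Skips the version header, summary lines, non-IPv4 records, and
    // space that is not allocated or assigned to a country.
    if (type !== 'ipv4' || !cc || cc === '*' || start === '*') continue;
    if (status !== 'allocated' && status !== 'assigned') continue;
    const from = ipv4ToInt(start);
    ranges.push({ from, to: from + Number(value) - 1, cc });
  }
  ranges.sort((a, b) => a.from - b.from);
  return ranges;
}

// Binary search over non-overlapping ranges; returns a country code or null.
function lookupCountry(table, ip) {
  const n = ipv4ToInt(ip);
  let lo = 0;
  let hi = table.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (n < table[mid].from) hi = mid - 1;
    else if (n > table[mid].to) lo = mid + 1;
    else return table[mid].cc;
  }
  return null;
}
```

Concatenating the five RIRs' files and passing the result to `buildCountryTable` gives an in-process country lookup with no third-party API, refreshed as often as you re-download the files.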


We made a similar app called geoip - https://git.cloudron.io/cloudron/geoip . It also uses MaxMind's DB. It supports JSON and JSONP as well. You can try it at https://geolocation.cloudron.io/json . Please don't use this as a 'service', install your own :)

BTW, do you use the GeoLite or GeoLite2 DB? The former is getting deprecated next month.


I started IPinfo.io ~6 years ago (and launched it on HN: https://news.ycombinator.com/item?id=7239333). We now serve 20 billion geolocation API requests a month and roll our own geolocation data (we used to rely on the MaxMind data, but have been busy working on improvements to it, and then on our own complete data, along with other data sets like IP usage type, company, and carrier).


So what does this have to do with OP's submission? You wrote a whole article about courtesy "guerrilla marketing" on Stack Overflow. At least comment on their work before your advertising.


GeoIP is pretty accurate at the state/country level for most users, but you will run into precision issues at the city level.

A bigger problem seems to be that many forget to continuously sync their IP DB with their provider. Your targeting is only as good as your IP -> Geo map.

My team built a tool for testing GeoIP implementations here: https://www.geoscreenshot.com to get around the issue of testing if it works.
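On the syncing point above, a cheap safeguard is to alert when the local database is older than your update cadence. A sketch under assumptions: MaxMind's `.mmdb` metadata records a build timestamp (`build_epoch`, seconds since the Unix epoch), and the 30-day default here is an arbitrary placeholder, not a recommendation from any vendor.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// True when the database was built more than maxAgeDays before `now`.
// buildEpochSeconds: the DB's build time in seconds since the Unix epoch.
function isGeoDbStale(buildEpochSeconds, maxAgeDays = 30, now = Date.now()) {
  const ageMs = now - buildEpochSeconds * 1000;
  return ageMs > maxAgeDays * DAY_MS;
}
```

Running this at startup (and logging loudly or refusing to serve when it returns `true`) turns a silently rotting IP-to-geo map into a visible operational alert.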


That depends on your use case. Huge numbers of people (e.g. people at work) use VPNs, and their 'geolocation' could be wildly different than their actual location. If you're an IBM employee (what...quarter of a million people) on the VPN, you look like you're in New York someplace. At my current employer (80k), most of us look like we're in Minneapolis, even though I'm half a continent away. If you're, say, targeting ads based on city/state level GeoIP, that's a lot of misdirected ads.


VPN users are an exception. I would think it would be best to use a proximate node to reduce latency.

For corporations, there is another form of targeting (account-based targeting) that relies on IP ranges. I believe DemandBase covers this specific use case.


If you're interested in a service with a free tier and more specific granularity than the MaxMind free database, Geocodio has a pretty nice service. They also have a bunch of different enrichment options that you can tack on if you need things like congressional districts or school districts. It's a really nice service.

https://www.geocod.io/

(not affiliated, just a fan)


geolocation != geocoding. The first converts IP addresses, the second converts postal addresses.


Oops! Thanks.


A similar service is Am I Mullvad’s API. Not sure if they use MaxMind, however.

https://am.i.mullvad.net/api


This is pretty awesome. Might have to use this for various purposes


You are a hero


I always feel a need to state this to folks who are not aware of geolocation and ip addresses: Geolocation based on IP is very unreliable and should be used only for soft-analytics at best.

Example: It is not fit for security postures (in theory). One can dump all the CURRENT v4 routes being advertised out of China and block them via blackholes/firewalls/etc. However immediately after that a rogue operator could hijack a non-China affiliated prefix, use it for badness, and then release the hijacked prefix.

Most Geolocation services that are static (point in time) will not detect the above scenario. BGP-based monitoring services will, but that's a step up $$$ wise.


Yes, but...

There's a tradeoff. No, geolocation isn't perfect, and it's often oversold, but it can be useful in a security context. Simple (admittedly reductionist) example: say I have no admin-types in China, and don't expect to. It's pretty simple operationally to grab the 'China range', block port 22/tcp (or whatever the hackers are after today), reduce my risk surface area by a billion IPs or so, get that noise out of the logs (maybe collect statistics on the rule for trending/anomalies), and then have more bandwidth to spot the edge cases where a hijacked block is coming after me. Far from a 100% solution, but maybe a 90% solution. Another tool in the toolbag. Your risk model may vary.
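The matching logic behind "grab the range, block 22/tcp" is just CIDR membership. In practice you would load the ranges into the firewall itself (ipset, nftables, and so on); this application-level sketch only shows the idea. IPv4 only, the function names are hypothetical, and the ranges in the example are documentation placeholders, not any country's allocations.

```javascript
function ipv4ToInt(ip) {
  const parts = ip.split('.').map(Number);
  if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p) || p < 0 || p > 255)) {
    throw new Error(`not an IPv4 address: ${ip}`);
  }
  return ((parts[0] * 256 + parts[1]) * 256 + parts[2]) * 256 + parts[3];
}

// "a.b.c.d/n" -> { start, end } as unsigned integers.
function parseCidr(cidr) {
  const [base, bitsText] = cidr.split('/');
  const bits = Number(bitsText);
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) {
    throw new Error(`bad prefix length: ${cidr}`);
  }
  const size = 2 ** (32 - bits);
  const n = ipv4ToInt(base);
  const start = n - (n % size); // align to the block boundary
  return { start, end: start + size - 1 };
}

// Drop only when the port is on the blocked list AND the source is in a range.
function shouldDrop(ip, port, blockedPorts, blocks) {
  if (!blockedPorts.has(port)) return false;
  const n = ipv4ToInt(ip);
  return blocks.some((b) => n >= b.start && n <= b.end);
}
```

For example, with `blocks = ['198.51.100.0/24', '203.0.113.128/25'].map(parseCidr)`, `shouldDrop('198.51.100.9', 22, new Set([22]), blocks)` is `true` while the same source on port 443 is let through. As the parent comment notes, a prefix hijacked from outside these ranges sails past this check.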


What would you recommend for legally mandated country-of-origin filters?


I've found it useful to see what country someone is in and know if you should be taking GDPR measures for the user.
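A sketch of that check that fails closed: when the lookup returns no country, it applies the measures anyway, which softens the stale-data risk raised in this thread. The country list is illustrative (EEA members plus the UK, using the ISO codes geoip databases report); verify it against current law before relying on it.

```javascript
// Illustrative only: EEA members plus GB. Check against current law.
const GDPR_COUNTRIES = new Set([
  'AT', 'BE', 'BG', 'HR', 'CY', 'CZ', 'DK', 'EE', 'FI', 'FR', 'DE', 'GR', 'HU',
  'IE', 'IT', 'LV', 'LT', 'LU', 'MT', 'NL', 'PL', 'PT', 'RO', 'SK', 'SI', 'ES',
  'SE', 'IS', 'LI', 'NO', 'GB',
]);

// Fail closed: a missing or empty country code means "apply the measures".
function shouldApplyGdprMeasures(countryCode) {
  if (!countryCode) return true;
  return GDPR_COUNTRIES.has(countryCode.toUpperCase());
}
```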


Maybe you should treat all your users' privacy and data with proper respect, regardless of what country you think they're in.


What if your idea of proper is different than that country's laws?


This is the correct approach.


Be careful with that logic: While prefixes are supposed to be regionally-allocated, they can and DO move around occasionally. You could find yourself in a situation where you are referencing outdated geolocation information, resulting in a potential violation of GDPR.



