
A binning method that is fast and does not suffer from the hierarchical issue you mention: quadtiles.

Looks like there is an implementation in Go [0]. I haven't tried the kind of geofencing you mention in memory, but I've had success with PostGIS.

[0] https://github.com/volkerp/goquadtree

Edit: that repo seems to be very old. Quadtiles have been implemented successfully in a variety of languages, so you should be able to find something more suitable.
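The core idea fits in a few lines: a tile address is just the interleaved bits of the Web Mercator tile x/y coordinates, so a coarser tile's key is a prefix of every finer key it contains. A minimal Bing-style quadkey sketch in Python (illustrative, not taken from the repo above):

```python
import math

def quadkey(lat: float, lon: float, zoom: int) -> str:
    """Encode a WGS84 point as a quadkey string at the given zoom level."""
    n = 1 << zoom  # number of tiles per axis at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    digits = []
    for z in range(zoom, 0, -1):
        mask = 1 << (z - 1)
        # Interleave one bit of x and one bit of y into a base-4 digit.
        digits.append(str((1 if x & mask else 0) + (2 if y & mask else 0)))
    return "".join(digits)

# Points in the same tile share the same key, and coarser zooms are
# prefixes of finer ones, so lookups become simple prefix matches.
key = quadkey(51.5007, -0.1246, 16)
```

Binning a point set then reduces to grouping by the key at a chosen zoom, and neighbourhood queries to prefix scans, with no tree to rebalance.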


Oh okay, thank you. Will surely check it out.


I'm also considering the PostGIS route, to be honest, but a very good alternative I have found is in-memory geo querying with Redis geospatial, since this is for more real-time notifications: https://redis.io/docs/latest/develop/interact/search-and-que... What do you think about this?


> Meteor is essentially dead today

Care to explain what you mean by "dead"? Just today v3.2 came out, and the company, the community, and their paid-for hosting service seem pretty alive to me.


If you are based in the UK, there is https://xploria.co.uk. Click/tap on any location on the map and you get a lot of information about the place, including demographics (generations, percentages of families, singles, etc.), schools within travel time, and so on. Disclosure: I built this web app.


This is very cool; I haven't seen anything like it before. I've spent about half an hour noseying at places I've lived or considered living. Fascinating. The view of changing house prices is something I haven't seen presented in this way either.

Thanks for building it and thanks for sharing.


Thanks for the kind words! The prototype was launched in 2013; you'd think the world would have caught up since :D. The platform underneath allows far more advanced functionality, so stick around, there will be many more usable (and useful) features in the near term.


Is this based on census data?


Yes, although not yet updated to the 2021 census (2022 in Scotland).

The main page has references to all data sources.


> May I ask what a non-technical founder should look for in a technical co-founder?

Breadth of expertise, the ability to learn quickly, and an open mind when it comes to technology. Someone who loves working with clients and solving their problems more than they love the tech.

The B2B and PostGIS part caught my eye. I'm technical, but I lead both the business and the technical sides of my spatial analytics company. I would be happy to share at least some of my technical knowledge. From infrastructure and DevOps, through the database and data layers, all the way to the frontend: I've been doing this for years, and there could be a thing or two that may be useful to you.

On the other hand, I can always do better when it comes to B2B sales, so I am looking to learn too. Maybe we can have a chat sometime.

Would you like to connect?


> Am I supposed to use... excel to do that?

This is gratuitous. You have a clear bias, granted, because it seems your domain is so specific that only a procedural language will do. But it seems you are unfamiliar with modern SQL tools. Some obvious ones that come to mind: Metabase[0] for visualisation, or Apache MADlib[1] for in-database statistics and machine learning.

[0] https://github.com/metabase/metabase [1] http://madlib.apache.org/
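To make the point concrete: even without MADlib, higher moments such as kurtosis reduce to plain AVG aggregates once expanded. A minimal sketch using Python's bundled sqlite3 (the one-column table `obs` is made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (x REAL)")
conn.executemany("INSERT INTO obs VALUES (?)", [(v,) for v in [1, 2, 3, 4, 5]])

# Kurtosis = E[(x - mu)^4] / (E[(x - mu)^2])^2, written with plain AVGs.
(kurtosis,) = conn.execute("""
    WITH stats AS (SELECT AVG(x) AS m FROM obs),
         dev AS (SELECT (x - m) * (x - m) AS d2,
                        (x - m) * (x - m) * (x - m) * (x - m) AS d4
                 FROM obs, stats)
    SELECT AVG(d4) / (AVG(d2) * AVG(d2)) FROM dev
""").fetchone()
# For the values 1..5 this evaluates to 1.7.
```

The same shape works in any SQL dialect that supports CTEs.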


I humbly disagree. It's not that "my domain is so specific", but that data analysis is itself a discovery process, which is not straightforward. If you want to explore the jungle, you need a machete.

As for Metabase or MADlib, you are right; I was not aware of these new tools. They look great, and I'm certain they can help a lot of people. However, you assume that they are available! Not all IT departments are open to the idea of buying new software, and if the suggestion comes from an outsider it will be perceived as an insult (been there, done that, many, many times). And when they refuse, now what? You go back to the usual procedural languages (R, Python, Julia, etc.), which are free and don't require the perpetual oversight of some DBA who thinks that all you need is an AVG(X) and a GROUP BY, since kurtosis is domain-specific anyway.

I've met some Excel/VBA users who couldn't care less about pro devs or decorators, since "they can already do everything by themselves". Same thing with the SQL-only, Python-only, Tableau-only, or whatever-only crowd.


I usually use Python and R for analysis. However, when dealing with larger datasets, e.g. 0.5-2 PB, I have to rely on SQL/BigQuery, because I can't get Python and R to handle such workloads in reasonable time. I tried Dask, but I couldn't resolve a few bugs it had at the time.
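The pattern I usually reach for is a quantile fence: compute Q1/Q3, then keep rows outside 1.5x the IQR. Sketched here with Python's sqlite3 standing in for BigQuery (the table and column names are invented; NTILE needs SQLite 3.25+):

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (x REAL)")
random.seed(42)
# Mostly well-behaved data plus two planted extreme values.
rows = [(random.gauss(0, 1),) for _ in range(1000)] + [(50.0,), (-50.0,)]
conn.executemany("INSERT INTO obs VALUES (?)", rows)

# NTILE(4) approximates the quartile boundaries in a single pass.
outliers = conn.execute("""
    WITH ranked AS (
        SELECT x, NTILE(4) OVER (ORDER BY x) AS q FROM obs
    ),
    bounds AS (
        SELECT MAX(CASE WHEN q = 1 THEN x END) AS q1,
               MAX(CASE WHEN q = 3 THEN x END) AS q3
        FROM ranked
    )
    SELECT x FROM obs, bounds
    WHERE x < q1 - 1.5 * (q3 - q1)
       OR x > q3 + 1.5 * (q3 - q1)
""").fetchall()
vals = [v for (v,) in outliers]
```

On BigQuery, APPROX_QUANTILES(x, 4) gives the same boundaries without a full sort, which is what makes this viable at that scale.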

If you were to find outliers in a 1 PB table, what tools would you use?


Which is absolutely fine in a non-critical scenario such as picking litter, or finding your mates at a festival. But, as the article demonstrates in detail, there are too many possibilities for error, which in a life or death scenario we cannot afford. Emergency services often deal with exactly that kind of scenario.


The folks at OpenCage Geocoder[0] are doing a great job using only open data. But they are a team with over a decade of experience parsing and deduplicating address data from dozens of countries.

That said, their jobs page doesn't have much for now, but you may want to keep an eye on it.

Disclaimer: the founder of the company and I are acquaintances, but my assessment is only based on the quality of their service. I've been using it in production for reverse geocoding for a few years now.

[0] https://opencagedata.com/


Hi,

Ed from OpenCage here, thanks for the kind words! It's true we don't have any open positions right now. But anyone who is into geo stuff in general, and geocoding specifically, can dive into OpenStreetMap and the open-source libraries we (and many others) rely on and contribute to. Most notably Nominatim: https://nominatim.org

Here's a podcast interview I did last summer with Sarah Hoffmann, the lead maintainer. https://thegeomob.com/podcast/episode-35


Hi! Shame you're not hiring.

While your API service is stellar, if not the best among open-data geocoders, the data quality is unfortunately always the limit, and one can only extract so much from it. While I noticed a lot of sanitization when running some queries, it didn't take long to find hiccups, mainly because I know the types of warts open geo data has.

But from my quick tests, there are two issues.

1. Spain. Like, the whole of it. OSM Spain is lacking a lot of house-number information. Even Madrid (the city) alone is missing a lot, and some reasonably large towns are basically unnumbered. E.g. at 40.309452, -3.730451, the whole of Getafe (180k people) lacks numbering.

All that information is available in the catastro, but names are often shortened, missing prepositions, and lacking accents ("Calle de la Pasión" becomes "CL PASION" in the catastro). It is a horrible mess overall, with no foolproof way to cross-correlate the data; but here I don't see any cross-correlation happening at all.

2. Searching for "Place de Gaulle", because it's a solid no-strange-characters way to obtain an endless supply of points within France, shows a mysterious result at rank 10: 47.63341, -83.04979, in the middle of nowhere, ON, Canada. No info whatsoever. Why would that rank so high, versus thousands of French counterparts? It doesn't appear in Nominatim either, nor in any of the datasets I've worked with; I'm not sure where it comes from. Now I am curious: what's that?
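Re point 1, to illustrate: any cross-correlation would first need aggressive normalisation, i.e. strip accents, expand road-type abbreviations, drop prepositions. A Python sketch; the abbreviation table here is hypothetical and nowhere near complete:

```python
import unicodedata

# Hypothetical expansion table for catastro road-type prefixes.
ABBREV = {"CL": "CALLE", "AV": "AVENIDA", "PZ": "PLAZA"}
# Prepositions and articles the catastro tends to drop.
STOPWORDS = {"DE", "DEL", "LA", "LAS", "EL", "LOS"}

def normalize(name: str) -> str:
    """Reduce a street name to a comparable canonical form."""
    # Strip accents: "Pasión" -> "Pasion" via NFKD decomposition.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    tokens = [t.upper() for t in ascii_name.split()]
    tokens = [ABBREV.get(t, t) for t in tokens]   # expand "CL" -> "CALLE"
    tokens = [t for t in tokens if t not in STOPWORDS]
    return " ".join(tokens)

# Both spellings collapse to the same canonical form:
normalize("Calle de la Pasión")  # "CALLE PASION"
normalize("CL PASION")           # "CALLE PASION"
```

Even after this, collisions and misses remain, which is why I said there's no foolproof way; but without at least this step, cross-correlation can't even start.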


Hi,

thanks for the kind words.

You are right that a geocoder is only as good as the data available to it. Happily OSM is great for many use cases and getting better literally every day.

Whether it is good enough now for your use case will depend ... on your use case. Not everyone needs comprehensive house numbering in Getafe. Until the local OSM community decides to add those numbers, we do the best we can for the use cases where open data is a viable option today. As an aside, I am not sure the catastro qualifies as "open" data (even if it may be public), and even if so, as you correctly note, someone familiar with all the local abbreviations and common usages will need to help with adding it. Local knowledge is key.

re: "Place de Gaulle", off the top of my head I couldn't say; I would have to take a detailed look. It's complicated, which is what makes geo fun.


Catastro doesn't have a clear license, but the spirit is certainly open[0]:

"It is worth noting the mass download service of cadastral information, available since 2011, which makes said information freely available to companies and individuals, including the possibility of its reuse."

Translation mine.

I'd love to hear about the origins of such mysterious Ontario spot!

[0] http://www.catastro.minhap.gob.es/esp/usos_utilidades.asp


Without having looked in detail, I would guess this is a situation where the lack of a license just causes confusion: now it's unclear what is allowed. Ideally they would be explicit about it. Anyway, if it is allowed, using that data is a decision for the local OSM community. If you live there or have a local connection, please get involved, or just help with mapping generally. It's good fun. Here's a tutorial on how to add house numbers to OSM; really, it is pretty simple:

https://opencagedata.com/tutorials/adding-an-address-to-open...

re: Ontario, I will eventually have a look, but the list of projects is long and priority goes to bugs reported by customers.


If you're ever hiring folks to work on open source geo stuff, please post them on FOSSjobs and the other aggregators linked from the wiki:

https://www.fossjobs.net/ https://github.com/fossjobs/fossjobs/wiki/resources


Well, maybe _you_ don't write software in R. Others can write a pretty capable server[0] using just that.

[0] https://github.com/opencpu/opencpu/tree/master/R


Of course you can write anything in R, but that doesn't mean it will be possible to maintain the codebase for the next 10, maybe 20, years, including dev team changes.

In other words, R is the wrong tool for building software, from a maintainability standpoint alone.


I agree that, in general, one would not use R to engineer software. However, I disagree regarding low maintainability.

I've seen very complex R packages that have been around for 30+ years. They are actively maintained. Applying good design principles, unit testing, modularity - all that is very much doable in R.

And you will want to write software in R, especially if the purpose is to support the very activity of working with R packages and to expose the R statistical environment over HTTP. Case in point: the software I link to in my comment above.


Not sure if 100% of their posts are mind-blowing. It's hard to dispense much business wisdom via a Twitter feed. Still, you can learn a lot from them, especially if you follow the links they recommend. My favourite is at the top:

David S. Rose @davidsrose Venture capitalist, entrepreneur, angel investor

Jeff Bussgang @bussgang Former entrepreneur turned VC at Flybridge Capital

Fred Wilson @fredwilson I am a VC

I'd also suggest reading through David Rose's replies on Quora. Very thoughtful.


I agree about @davidsrose -- he loves to teach and share knowledge.


Hi anish_t,

About six months ago I tested Kinvey (a BaaS) for an application that makes heavy use of data. Basically you upload your data (say, from CSV) and then you access it via API calls.

It looks like you can download your data at any time, say, if you need to switch providers. I don't have any experience with other providers, but with Kinvey there doesn't seem to be a risk of vendor lock-in: it's data on the wire via AJAX requests. If things turn out to be expensive as you grow, you move the data to your own backend.

That said, I must add that I never ended up using them. The reason: I just needed something with serious geospatial capabilities, i.e. the PostgreSQL + PostGIS kind of thing.

There is also the DIY route, which is not necessarily the best, since it will prolong the time until you release an MVP. But if you have the manpower to work on it, maybe Drupal is a good alternative for you. It can be set up as a backend, it is secure, and it's free. I think you can even set it up as a SPARQL endpoint.

Good luck with your startup! I don't wish you to reach millions of users, but instead to make millions of dollars :)

