Having coded up a semisolid implementation of time-dependent contraction hierarchies (which might not even get used, because of the very same data-accessibility issues you outline), it seems at first glance quite straightforward to adapt CH preprocessing to this model. To contract a vertex, you just check whether each potential shortcut has a nonzero probability of being shorter than every witness path. I suppose you could cut preprocessing time down by replacing "nonzero probability" with "probability greater than epsilon" to get a sort of approximate hierarchy.
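For concreteness, here's a toy sketch of that witness check (everything in it is made up for illustration: the three-edge graph, the normal travel-time distributions, and the Monte Carlo estimator):

```python
import random

# Toy probabilistic witness check for contracting a vertex v between u and w
# (illustrative only; real CH code enumerates witness paths with a local
# search). Edge weights are (mean, std) of normally distributed travel times.
edges = {('u', 'v'): (5.0, 1.0),
         ('v', 'w'): (4.0, 1.0),
         ('u', 'w'): (10.0, 0.5)}  # the direct edge serves as the witness path

def shortcut_probability(via_path, witness_path, n_samples=10_000):
    """Monte Carlo estimate of P(via-path is shorter than the witness path)."""
    hits = 0
    for _ in range(n_samples):
        via_time = sum(random.gauss(m, s) for m, s in via_path)
        witness_time = sum(random.gauss(m, s) for m, s in witness_path)
        hits += via_time < witness_time
    return hits / n_samples

eps = 0.01  # eps = 0 would give an exact hierarchy
p = shortcut_probability([edges['u', 'v'], edges['v', 'w']],
                         [edges['u', 'w']])
print(f"P(shortcut u->w needed) ~ {p:.3f}; add it: {p > eps}")
```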
You refer to the edge weights as an "HMM", which I guess stands for hidden Markov model. It doesn't seem like they use that term in the paper, although if you made the data time-dependent, would the edge weights then be a genuine Markov process? Would something like the Viterbi algorithm become a useful subprocess of the overall algorithm in that case?
The current big open-source routing engines tend to be written in C++ or Java. Do you think using Python makes a significant difference in query or preprocessing times? *
*Edit: Just noticed the Numba dependency, which probably answers this question.
Glad to hear from someone who's worked on routing!
Yes, it's quite possible that CH can be adapted to this -- I haven't had a chance to attempt it -- although it's not obvious to me if it would help with the preprocessing time.
I have not been very interested in approximations myself, since coming up with reasonable approximations seems rather obvious/straightforward, and yet producing any bounds on the error seems to be quite the opposite (at least for me)! That said, it's only a matter of time before someone does a comprehensive empirical study on that. :-)
The HMM was actually beyond the scope of this project, which is why you don't see any mention of it. (The reference to check out regarding the path-HMM inference would be #23, on Springer.) Its only significance is that it was the model of the dataset we used; as far as we were concerned, we merely had a GMM (Gaussian mixture model).
Regarding time-dependent travel times, I'm not actually sure what the best model is (not just from a modeling or data-gathering perspective, but also in terms of storage constraints and ease of inference); it's something I've been meaning to look into, and I think others are as well. It does seem reasonable that a Markov model might work, since the travel time on a given edge at a subsequent time step only depends on the travel times on that edge and/or on incoming edges at the previous time step. However, I don't immediately see a reason for it to be Markov in the time dimension alone; if it's not, then HMM algorithms such as Viterbi might not really be meaningful here, since the Markov properties I can see require a space dimension as well. It seems like a more general (but sparse and mostly planar) Bayesian network to me. I could be wrong about this, though.
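To illustrate the kind of dependence structure I mean (purely made-up toy dynamics, not a fitted model):

```python
import random

# Toy dynamics for the structure described above: each edge's congestion at
# step t+1 depends on its own congestion and on that of its incoming edges
# at step t (Markov in space *and* time). All coefficients are made up.
incoming = {'e1': [], 'e2': ['e1'], 'e3': ['e1', 'e2']}
state = {e: random.random() for e in incoming}  # congestion levels in [0, 1]

def step(state):
    nxt = {}
    for e, preds in incoming.items():
        upstream = sum(state[p] for p in preds) / len(preds) if preds else 0.0
        x = 0.6 * state[e] + 0.3 * upstream + 0.1 * random.random()
        nxt[e] = min(max(x, 0.0), 1.0)  # clamp to [0, 1]
    return nxt

for t in range(3):
    state = step(state)
    print(t, {e: round(c, 2) for e, c in state.items()})
```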
Regarding Python vs. C++/Java -- it does make a significant difference, but not an outrageous one. I'm guessing you didn't see this part on the GitHub page, but it turns out this code is "only" 3× slower than the C++ equivalent used for the paper. (Note that this comparison is after a fairly heavy amount of optimization on both the C++ and Python codebases. Also note that the Python optimization involved far more than using Numba, although that was a notable part of it.) I think it makes a bigger difference in query times than in preprocessing times, since the latter is far easier to parallelize (meaning you can make it faster by throwing more hardware at it rather than by rewriting the code). But the overall problem is so computationally intensive that the main bottleneck isn't the implementation language; it's coming up with an appropriate algorithm so that you don't need an astronomical number of CPU-days.
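To give a flavor of where Numba helps (a generic sketch, not code from the repo): tight numeric loops like edge relaxation compile well with `@njit`:

```python
import numpy as np
from numba import njit  # optional dependency; see the Dependencies section

@njit
def relax_edges(dist, tails, heads, weights):
    """One sweep of edge relaxations -- the kind of tight numeric loop
    where Numba pays off (generic example, not code from the repo)."""
    changed = False
    for i in range(weights.shape[0]):
        d = dist[tails[i]] + weights[i]
        if d < dist[heads[i]]:
            dist[heads[i]] = d
            changed = True
    return changed

# Tiny usage example on a 3-vertex path graph.
dist = np.array([0.0, np.inf, np.inf])
tails = np.array([0, 1])
heads = np.array([1, 2])
weights = np.array([2.0, 3.0])
while relax_edges(dist, tails, heads, weights):
    pass
print(dist)  # [0. 2. 5.]
```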
Thanks for the answers. I see what you're saying about the Markov properties - certainly something to think about.
> I haven't had a chance to attempt it -- although it's not obvious to me if it would help with the preprocessing time.
My initial research suggested that CH has much faster query and preprocessing times (and lower memory overhead) than arc-flags, which is why I ended up implementing it. Plus, you can even combine CH with arc-flags or ALT to get pointlessly good speed. However, I don't know how much faster this "arc-potentials" approach is than arc-flags.
Using "approximate" contraction hierarchies doesn't necessarily lead to inexact results! There's a really interesting paper by the Karlsruhe guys [0] which discusses how you can set up a contraction hierarchy where the shortcuts only store their minimum and maximum travel times. At query-time this allows you to quickly generate a "time-profile corridor" between points A and B, which is a subgraph containing only those vertices which lie on a shortest path from A to B at some departure time. Then you only need to do a quick time-dependent dijkstra on the corridor ;]
Interesting! Yeah, I'm not terribly familiar with CH. Generally speaking, the trouble with the stochastic setting (perhaps you're already aware of this) is that it doesn't satisfy a very tempting (!) notion of sub-problem optimality that holds trivially in the deterministic setting. We used to have a simple counterexample in the paper but I think we might have had to remove it due to space constraints (here [1] is a copy). So I'd have to really think about what you're saying; it's not obvious to me that it should work in the stochastic setting, but perhaps it does. But thanks for the pointers either way, that's useful to know!
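To give a flavor of the issue, here's a toy illustration of my own (not the counterexample from the paper): which of two roads is preferable flips with the remaining budget, so a prefix of an optimal solution need not itself be optimal.

```python
# Toy illustration: the "better" of two roads depends on the remaining time
# budget, so optimal sub-paths don't compose the way they do in the
# deterministic setting. Each road is a list of (travel_time, probability).
roads = {
    'safe':  [(10, 1.0)],            # always takes 10 minutes
    'risky': [(5, 0.5), (15, 0.5)],  # coin flip: 5 or 15 minutes
}

def p_on_time(road, budget):
    """Probability of arriving within the remaining budget."""
    return sum(p for t, p in roads[road] if t <= budget)

for budget in (12, 8):
    best = max(roads, key=lambda r: p_on_time(r, budget))
    print(f"budget={budget}: "
          + ", ".join(f"{r}={p_on_time(r, budget):.1f}" for r in roads)
          + f" -> take {best}")
# budget=12 -> take safe; budget=8 -> take risky
```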
Ah, good question, no I haven't. I'm basing that comment on a vague memory of a graph showing query time vs. preprocessing time for lots of algorithms. Looking back at the graph, it seems CH with arc-flags (CHASE) slightly improves query time, but for ALT they only have results on non-fully-contracted networks (Core-ALT). Probably a memory issue.
Sorry if I misled - I haven't actually tried combining CH with ALT, as CH turned out to be more than enough on its own. My point was just that CH is reported to synergise well with other methods, which makes it likely to be a good investment.
Interesting - we (GraphHopper) tested CH+ALT and also saw no improvement, likewise for CH with plain A*, although papers report that it should be faster. The idea about non-fully-contracted networks is really nice and could be worth a try - thanks!
Wow, that seems counter-intuitive. I suppose it must be due to the computation of the heuristic, combined with how few relaxations are needed in CH.
GraphHopper is really nice work by the way. I believe you are still working on implementing stall-on-demand? Will be very interesting to see how much difference that makes on long-distance queries.
I guess the overhead of computing the heuristic is too high compared with the visited nodes it saves, but this is counter-intuitive for me as well, and if I have more time I will investigate it again.
> GraphHopper is really nice work by the way.
Thanks!
> I believe you are still working on implementing stall-on-demand?
I don't believe any apps utilize these algorithms, and for good reason. The main problem is that SOTA (stochastic on-time arrival) is a very computationally intensive problem. On the example shown, a plain Dijkstra takes a fraction of a second (you can imagine A* is faster, and over time these have been sped up by a factor of millions, IIRC), whereas SOTA takes dozens of seconds to run. The difference increases quickly for larger networks and time budgets.
If you check out the paper I linked to, you might get a better idea of just how computationally intensive this process is. If I recall correctly, classical point-to-point pathfinding algorithms have even advanced to the stage that, with some reasonable (albeit clever) preprocessing, queries on very large networks can be done in a matter of milliseconds on mobile CPU power. Compare that to this algorithm, which takes dozens of seconds to run on a network the size of San Francisco, or at best dozens or hundreds of milliseconds after practically prohibitive preprocessing (e.g. 30 CPU-days and tens or hundreds of gigabytes are pretty normal) on a Core i7 CPU. It's just not scalable yet, which is why we've been working on it as a research problem (as have many other people).
Edit: There are lots of obstacles; the above was just one. Another one is that even the existing model assumes the same travel time distribution throughout the day, which is clearly unrealistic (you can imagine fixing this would blow up the running time even more). Yet another one is the problem of getting enough traffic data to construct useful travel time distributions, which means only a major company with existing data (like Google) could release an app that actually initially works with real data. Barring some kind of contract with a big company, everyone else would have to just release an app to gather data, and it would only become useful after a long time (if people are even willing to give away precise location data to random app creators).
I didn't specifically measure memory usage, however it's on the order of a few hundred megabytes per query (600 MB on the particular example I'm looking at).
Regarding the license and maintenance: the license is what seemed most suitable to me given the effort I'd put in and what the code actually did. But the idea was that if it's unsuitable for you, you can certainly contact me and request different terms, and I can try to license it differently on an individual basis.
Is the main RAM usage from the graph, or is the algorithm itself RAM-intensive?
I'm asking as other open source projects (like ours, see my profile) might not look in your code just because of the license. And yes, that is the tricky thing about open source: accept that other people make money with it and try to do this too ;)
> Is the main RAM usage from the graph, or is the algorithm itself RAM-intensive?
The graph itself is negligible, unless you want to store lots of auxiliary data (street addresses and whatnot), but then those are already on the disk, and you don't need them in memory for this problem.
The traffic data is somewhat bigger, e.g. a few hundred float64s per edge, so something like 100 MB for |E| = 40k edges would be normal (no pun intended). But that's (a) just something that depends on the sizes of your distributions and not on the algorithm, and (b) something you can discard and re-generate on the fly from the continuous-time distribution parameters, at a speed cost.
The main memory usage is for storing the policy solution itself. For a discrete budget T (say, 5k steps) and |V| vertices (say, 10k) you need approximately T * |V| float64s just to store the policy solution (e.g. ~400 MB).
And if you want the convolutions to be done in guaranteed quasilinear time, you can pay a penalty of T * |E| float64s, which would be e.g. ~1.6 GB (with 4 roads per intersection). This isn't always necessary, though; it depends on the problem parameters, and sometimes naive convolution is faster. (Actually, I just lied a bit: for every edge you don't need a vector of size T, but a smaller one of size (T - min_travel_time_to_destination_from_edge_source), so what I gave you is somewhat of an overestimate. But that's the general ballpark.)
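In case it helps, here's the back-of-the-envelope arithmetic behind those figures:

```python
# Back-of-the-envelope memory estimates for the figures above.
BYTES = 8                  # float64
T = 5_000                  # discrete time budget (steps)
V = 10_000                 # vertices
E = 4 * V                  # ~4 roads per intersection

policy_mb = T * V * BYTES / 1e6  # per-vertex, per-time-step policy table
conv_mb = T * E * BYTES / 1e6    # optional per-edge buffers for quasilinear
                                 # convolutions (an overestimate; see above)
print(f"policy ~{policy_mb:.0f} MB, convolution buffers ~{conv_mb:.0f} MB")
# -> policy ~400 MB, convolution buffers ~1600 MB
```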
For computing the path, an algorithm that doesn't need to pre-compute a policy may (or may not) use less memory, so for that problem, the policy's memory usage can be ascribed to the algorithm. However, I don't really expect the memory usage of a different algorithm for the same exact problem to be radically different. (You can of course make approximations, etc. but then it's not the same problem or necessarily the same solution.)
> I'm asking as other open source projects (like ours, see my profile) might not look in your code just because of the license. And yes, that is the tricky thing about open source: accept that other people make money with it and try to do this too ;)
(Edit: I edited this comment since I realized I initially misunderstood your licensing model.)
I think a few key differences between our projects (and therefore licensing models) are:
1. Just how much more computationally intensive this problem is compared to standard routing. For me to adopt your model of providing a paid service while making the code free for commercial use, I would not only have to spend a significant amount of time writing service APIs for this (I have other things to do than to start this business!), but I'd have to invest in quite a large cluster of servers -- 1000 CPUs per metropolitan road network seems to be the minimum for this problem to get decent query times. (Again, note that a reasonable query on a sub-portion of San Francisco takes 15 seconds, and this is already making the unrealistic assumption that traffic is constant throughout the day.) Both of those just seem unrealistic for me, so I'm just saying if someone else actually thinks such a business would be profitable and would like to take up those tasks, they can request a commercial license.
I'm not sure anyone should use this commercially, though -- see the next point.
2. I simply have not designed or even intended this project for production usage, whether paid or free... and the license is meant to emphasize that. It's for researchers, meaning it's meant to be short & easy to tweak. It's not particularly well-written code, there is no clear API at all, there are many "engineering" tricks/approximations that would need to be put in for actual real-world usage, etc. And to be honest, I don't want poorly-written code to go into production use! I think that is one of the things actually harming us in the modern era. So I would rather that anyone who wants to use the algorithm just rewrite their own code in a better way for production usage, and not even use mine. Honestly, they would probably have to understand the code well enough that rewriting it would be a breeze anyway, if they want to modify it enough to make it useful.
So, basically, the commercial-license clause was the "else" clause in the license. :-) Like I wrote there, I didn't necessarily think this clause should ever be hit. Chances are good that even if someone contacts me, they'll change their mind and decide it's outright easier in anything but the very-short-term for them to just write their own code based on the paper (and that this would be more profitable would just be icing on the cake). But in case for some reason they don't think that, I didn't want to leave any doors closed, so I included that option.
### Dependencies
- NumPy is the only hard external dependency.
- Numba, if available, is used for compiling Python to native code (≈ 3× speedup).
- PyGLET, if available, is used for rendering the results on the screen via OpenGL.
- SciPy, if available, is used for slightly faster FFTs.
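A hedged sketch of how these optional dependencies might be guarded at import time (illustrative; the repo's actual module layout and fallbacks may differ):

```python
# Illustrative optional-dependency guards (not necessarily the repo's code).
import numpy as np  # hard dependency

try:
    from numba import njit            # JIT-compile hot loops (~3x speedup)
except ImportError:
    njit = lambda f: f                # no-op decorator fallback

try:
    from scipy.fft import fft, ifft   # slightly faster FFTs
except ImportError:
    from numpy.fft import fft, ifft   # NumPy fallback
```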