There were a few things I thought were a bit odd about the architecture, as I recall it.
IIRC you poll Graphite for metrics. Why not push them from StatsD directly into Skyline? That would probably be more efficient. If you used incremental (online/streaming) algorithms, you'd have a compact summary at each time step and could throw away the raw data. 250K metrics would fit in memory quite easily (we're talking roughly a number and a string each, right?), and even a single ~1 GHz core gives you 4,000+ cycles per metric per second to process them, which should be sufficient.
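A minimal sketch of that online approach, assuming a push model where each incoming value is checked against a per-metric running mean and standard deviation (Welford's algorithm) and then discarded. `RunningStats`, `StreamingDetector` and the 3-sigma threshold are hypothetical illustrations, not Skyline's API. Each metric's summary is just a count and two floats, so memory stays constant no matter how long the series runs.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Constant-size summary of one metric's history (Welford's online algorithm)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Fold one new value into the summary in O(1) time and memory.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def stddev(self) -> float:
        # Population standard deviation; 0.0 until there are two points.
        return math.sqrt(self.m2 / self.count) if self.count > 1 else 0.0

    def is_anomalous(self, x: float, sigmas: float = 3.0) -> bool:
        # Compare the new point against the summary *before* it is folded in.
        sd = self.stddev
        return sd > 0 and abs(x - self.mean) > sigmas * sd


class StreamingDetector:
    """Hypothetical push-based detector: one RunningStats per metric name."""

    def __init__(self, sigmas: float = 3.0) -> None:
        self.sigmas = sigmas
        self.stats: dict[str, RunningStats] = {}

    def push(self, name: str, value: float) -> bool:
        s = self.stats.setdefault(name, RunningStats())
        flagged = s.is_anomalous(value, self.sigmas)
        s.update(value)  # after this, the raw value can be thrown away
        return flagged
```

Usage: call `detector.push("cpu", 50.0)` as each value arrives; it returns `True` when the value sits more than three standard deviations from that metric's running mean.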
Python's lack of good threading (the GIL) could be a problem. I would use the JVM (Scala in my case). Apache Commons Math is pretty good (http://commons.apache.org/proper/commons-math/). Java's verbose interfaces are a bit annoying, but the JVM is damn efficient, and you can wrap the crap in something more aesthetic. It's a solid choice no matter what the hipsters say. ;-)
Ah! We could, but StatsD supports more complex metrics, like sums aggregated over time. That's not something that lends itself easily to a discrete datapoint.
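To illustrate why an aggregated counter isn't a single discrete datapoint, here is a simplified model of how a StatsD-style server treats counters: every increment received during a flush interval is summed, sampled increments are scaled by their sample rate, and on flush one total (plus a per-second rate) is emitted and the counter is reset. `CounterAggregator` is a hypothetical name and this is an illustrative sketch, not StatsD's actual code.

```python
from __future__ import annotations

from collections import defaultdict


class CounterAggregator:
    """Hypothetical, simplified StatsD-style counter aggregation."""

    def __init__(self, flush_interval_s: float = 10.0) -> None:
        self.flush_interval_s = flush_interval_s
        self.counts: defaultdict[str, float] = defaultdict(float)

    def increment(self, name: str, value: float = 1.0, sample_rate: float = 1.0) -> None:
        # A sampled increment stands in for 1/sample_rate real ones.
        self.counts[name] += value / sample_rate

    def flush(self) -> dict[str, tuple[float, float]]:
        # Emit (total, per-second rate) for each counter, then start a new interval.
        out = {
            name: (total, total / self.flush_interval_s)
            for name, total in self.counts.items()
        }
        self.counts.clear()
        return out
```

The value that reaches Graphite is the flushed total for the whole interval, so it summarizes many raw events rather than recording any one of them.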
It is. But we use multiprocessing, which has basically the same API. Still, you can't beat the awesome Python stats libraries: NumPy, SciPy, statsmodels, pandas.
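As a sketch of the "basically the same API" point: the standard library's `multiprocessing.dummy.Pool` is thread-backed but mirrors `multiprocessing.Pool`, so spreading per-metric checks across processes instead of threads is a one-line change. `three_sigma` and `check_all` are hypothetical helpers, and the 3-sigma test is a simplified stand-in, not one of Skyline's actual algorithms.

```python
from __future__ import annotations

import multiprocessing
import multiprocessing.dummy
import statistics


def three_sigma(item: tuple[str, list[float]]) -> tuple[str, bool]:
    """Hypothetical check: is the latest point > 3 std devs from the mean of the rest?"""
    name, series = item
    history, latest = series[:-1], series[-1]
    if len(history) < 2:
        return name, False  # not enough history to judge
    mean = statistics.fmean(history)
    sd = statistics.pstdev(history)
    return name, sd > 0 and abs(latest - mean) > 3 * sd


def check_all(
    series_by_metric: dict[str, list[float]],
    workers: int = 4,
    use_threads: bool = False,
) -> dict[str, bool]:
    # Same constructor, same map(); only the backing (threads vs processes) differs.
    pool_cls = multiprocessing.dummy.Pool if use_threads else multiprocessing.Pool
    with pool_cls(workers) as pool:
        return dict(pool.map(three_sigma, list(series_by_metric.items())))


if __name__ == "__main__":
    # The guard matters for process pools on platforms that spawn workers.
    data = {"cpu": [10, 12, 11, 9, 10, 11, 50], "mem": [10, 12, 11, 9, 10, 11, 10.5]}
    print(check_all(data))
```

Processes sidestep the GIL for CPU-bound work at the cost of pickling each series to a worker, which is why the thread-backed variant can still be handy for I/O-bound steps such as fetching data.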