
Thanks for replying! A little while ago I watched part of your Bacon Conf talk (http://devslovebacon.com/conferences/bacon-2013/talks/bring-...) and read the slides.

There were a few things I thought were a bit odd about the architecture, as I recall it.

IIRC you poll Graphite for metrics. Why not push them from StatsD directly into Skyline? This would probably be more efficient. If you use incremental / online / streaming algorithms, you'll have a compact summary at each time step, so you can throw away the raw data. 250K metrics would fit in memory quite easily (we're just talking approximately a number and a string each, right?) and you have 4000+ CPU cycles per metric per second to process them, which should be sufficient.
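To make the streaming idea concrete, here's a rough sketch using Welford's online algorithm for running mean and variance. This is my own illustration, not how Skyline works: the names `RunningStats`, `summaries`, and `observe`, and the 3-sigma threshold, are all made up for the example.

```python
import math


class RunningStats:
    """Constant-size summary of one metric (Welford's online algorithm)."""

    def __init__(self):
        self.n = 0        # number of datapoints seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one datapoint into the summary; the raw value is not stored."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance, or 0.0 until there are at least two datapoints."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    @property
    def stddev(self):
        return math.sqrt(self.variance)

    def is_anomalous(self, x, threshold=3.0):
        """True if x is more than `threshold` standard deviations from the mean."""
        sd = self.stddev
        return self.n > 1 and sd > 0 and abs(x - self.mean) > threshold * sd


# One small summary per metric name (hypothetical in-memory store).
summaries = {}


def observe(name, value):
    """Check a pushed datapoint against its summary, then update the summary."""
    stats = summaries.setdefault(name, RunningStats())
    anomalous = stats.is_anomalous(value)
    stats.update(value)
    return anomalous
```

Each metric costs three numbers plus its name, so memory stays flat no matter how long the stream runs. A real detector would probably want something that forgets old data (e.g. exponentially weighted stats), but the shape would be the same.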

Python's lack of good threading would possibly be a problem. I would use the JVM (Scala in my case). Apache Commons Math is pretty good (http://commons.apache.org/proper/commons-math/). Java's verbose interfaces are a bit annoying, but the JVM is damn efficient, and you can wrap the crap in something more aesthetic. It's a solid choice no matter what the hipsters say. ;-)



Ah! We could, but StatsD provides support for more complex metrics like aggregated sums over time. Not something that lends itself easily to a discrete datapoint.

It is. But we use multiprocessing, which is basically the same API. Still, you can't beat the awesome Python stats libraries: NumPy, SciPy, Statsmodels, Pandas...



