I got excited but then read that it's some kind of cloud-based web application thing.
Is there something like this (show memory use and call times for a Python process) that just runs on my computer to help me profile a long-running Python process?
pyrasite (http://pyrasite.com/) will let you inject code into a process. This can be used to add monitoring of private internal state etc (if you have no other options).
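For example, a payload can be as small as this; pyrasite just runs the script inside the target process, so the file path and the gc-stats dump here are purely illustrative:

    # hypothetical pyrasite payload (dump_gc.py), injected with something like
    # `pyrasite <pid> dump_gc.py`; everything below runs inside the target
    # process, so keep it small and write results out to a file
    import collections
    import gc

    counts = collections.Counter(type(o).__name__ for o in gc.get_objects())

    with open("/tmp/gc_dump.txt", "w") as f:
        f.write("gc counts: %r\n" % (gc.get_count(),))
        for name, n in counts.most_common(25):
            f.write("%-40s %d\n" % (name, n))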
If you want to have locally hosted graphs then grafana and influx are my current tools of choice.
It is going to be more work than swiping a credit card, but not a crazy amount.
pyrasite can make the child process slow down markedly; this came up for me when injecting profilers into cinnamon-screensaver to try to root cause a memory leak (https://bugs.launchpad.net/linuxmint/+bug/1652489)
cinnamon-screensaver would take multiple seconds to lock the screen even after I'd stopped profiling and exited the interpreter I'd injected, and I wound up restarting it so I could lock my screen quickly again.
I don't know why this happened, but it's enough to make me think twice, and I'm definitely going to double-check my process is still performing as I hope after injecting it with pyrasite in the future.
Yes! It's called profiling and there are many ways to do that. Python has built-in profiling tools (profile, cProfile), there are also whole-system profiling solutions like DTrace (sadly, that is not available on Linux). No fancy GUIs AFAIK, you'd have to RTFM a bit.
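For the built-in route, a minimal cProfile session looks roughly like this (the profiled function is just a placeholder):

    import cProfile
    import pstats

    def slow_function():
        # stand-in for the code you actually want to profile
        total = 0
        for i in range(10**6):
            total += i * i
        return total

    profiler = cProfile.Profile()
    profiler.enable()
    slow_function()
    profiler.disable()

    # sort by cumulative time and print the 10 most expensive calls
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)

You can also just run `python -m cProfile -s cumulative myscript.py` for a whole script.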
I also needed to profile a running process without stopping it, and ended up using pyflame (https://github.com/uber/pyflame) together with the famous flamegraph script. It is not perfect, as it only works on Linux and only does CPU profiling, but it worked well enough for my purposes.
Yappi is a really useful one, especially for multithreaded apps, because it can be started and stopped at any time regardless of control flow. For example, you can set up a REST API just to handle profiling and then query it at any point during the app's lifetime.
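A rough sketch of that profiling-endpoint idea, using Flask purely as an example framework (the routes and output handling are made up, not taken from any particular project):

    import io

    import yappi
    from flask import Flask

    app = Flask(__name__)

    @app.route("/profiling/start", methods=["POST"])
    def start_profiling():
        yappi.set_clock_type("wall")  # or "cpu"
        yappi.start()
        return "profiling started\n"

    @app.route("/profiling/stop", methods=["POST"])
    def stop_profiling():
        yappi.stop()
        out = io.StringIO()
        yappi.get_func_stats().print_all(out=out)  # per-function stats across all threads
        yappi.clear_stats()
        return out.getvalue()

Then you can toggle it at runtime with e.g. `curl -X POST localhost:5000/profiling/start`.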
This is an interesting project. It appears to me, based on the README and the name, that it was primarily intended to profile backend services in a web app.
I have a question for hacker news. Does Python still have a lot of momentum in this area? I love Python and use it whenever I can, but I find these days that most web frameworks assume right off the bat that you are using Node. The frontend landscape is so heavily tilted towards using js (and tools such as npm, etc) on the backend that fitting it into a Python flow is difficult, especially for beginners. In addition, we have the relatively fresh trend of using isomorphic code on both Node and the client. It seems like my beloved Python is being pushed to the background. Is there any truth to this? I would very much like to keep investing my energy into what I know, but if it is wasted effort, I will stop.
Python is huge on the backend, from startups to large companies.
While I've used Node, and I think it has its place in certain kinds of application, I would never pick it for the main backend language. Adding a chat service to an existing larger codebase, maybe, but JS isn't particularly well suited to larger applications (yet, it's improving) and the frameworks are still less mature than things like Django and Rails.
The front end might very well be heavily Node-influenced right now with React et al., but I'm personally still seeing a lot of backend development being done in Python or Go. No doubt there's been a huge surge in demand due to React in particular, but in many of the postings I've seen they seem to be talking to backends in Python/Java/C#. The contracting/getting-paid-for-it landscape is still looking very bright for a lot of tech that doesn't have the buzz of Node.
IMO having some Node.js experience and React experience are very helpful to get hired into any position that is remotely near the web these days. I have seen several of my clients introduce Node.js and of course everyone is introducing SPAs in some form or another, mostly React. Mozilla is even integrating React into Firefox's UI.
I am personally skeptical of these trends, but my skepticism doesn't change the shift of the industry. My advice would be to get some Node and React experience under your belt so that you can at least discuss it intelligently, and it shouldn't be too much of an impediment moving forward.
Python is in a tough spot for growth, IMO. The new generation of languages have internalized much of what made Python great, while leaving behind a lot of the inadequacies and cruft attached to CPython.
Like you, I will always have a soft spot for Python, but it's getting increasingly difficult to continue to see it as the default choice for new projects (outside of a few specific niches).
Bolting React onto an existing codebase is probably best because it is more similar to what you'd see in the real world. Most people don't start a React/Node thing at the same time; they'll start integrating React into their frontend early because they can get little React-compatible widgets plugged in more easily than they can introduce backend changes like finding opportunities for Node.
React and Node are not really related other than they're both JavaScript-based, so the skills aren't really co-dependent. Node is used to execute JavaScript locally (in build tools like Webpack, for example), but beyond a small amount of local scripting for builds, they don't really touch (afaik; I have not yet completed a major project with either of them, just used them here and there).
Sure, this is possible, but I'm not basing my concerns on headlines so much as on what I'm seeing out there when I try to educate myself on what is going on in the frontend world. Once you get past basic JS and jQuery, everything seems to just take for granted that you are using a toolchain that doesn't leave much room for Python.
All of the skeletons/starting points out there pretty much ignore the fact that backends exist in general (by making demos just github stargazing or whatever, or using firebase/<insert graphqlaas> and such). It's definitely not the case that you need explicit nodejs on the backend, and that's not the common case for a lot of large companies. There are tons of options.
Render services [1], Sidecar processes [2], not-doing-universal-rendering, or just running a simple universal nodejs server with an entirely separate API backend.
Integration is not typically that difficult. It takes a day or two of sitting down with docs and intentional effort, sure.
Worrying about wasted effort is silly though. The amount of choice these days is crazy, and web development lately is mostly hype-driven, though it doesn't really need to be. We end up solving the same problems over and over again with a slightly different set of technologies (which has its good points and its bad points).
For many companies the backend is several orders of magnitude larger than the frontend (hundreds of separate backend services/daemons/etc in any variety of languages), so optimizing the whole stack for the sake of the frontend would be nuts.
I think the languages people encounter are highly influenced by their interests. I see a ton of projects being rejected because they are Node-based and no one wants to deal with Node, npm and JavaScript if a Python, Perl or Bash alternative is available. Still, Node is hugely popular, just not in the areas I work in.
Seems really interesting, but a lot of companies are not willing or not able to send this data to an unknown, untrusted party.
Would it be possible to host this on premise?
Sentry seems to do quite well with a business model where customers are free to host it on premise. That might be worth a consideration.
I for one am interested, but for me to become a customer I would first need to be able to trial it on my staging environment. Providing a Docker container that I can host on premise would go a long way towards being able to do that.
There is no on-prem offering yet, since there was actually no demand/requests for it, at least with the Golang agent, which was introduced first. With the Python agent we will reprioritise it. Thank you for the feedback! (Disclaimer: I work at StackImpact)
Also, it feels a little light on documentation and functionality for the Python agent considering it's the same cost as using the Go agent. It's hard to tell if I'm actually going to learn anything about what's causing the memory leaks in my app as a lot of the functionality seems to be Go only.
I have a badly behaving Flask app at the moment, so trying it out. As a heads up, some of the links don't work in the table of contents on your documentation page - notably everything indented under "Getting started with Python profiling".
This interests me a lot because I'm using Azure App Insights (full disclosure: I work at Microsoft) after a couple of years of New Relic and I'm constantly looking for better takes on the "let's instrument this code and profile it remotely" thing, especially around gevent and asyncio (which have their own little challenges).
I've been thinking about building my own using Prometheus as collector/visualiser. Time hasn't been on my side, but eventually...
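For anyone curious, the collector side of that is pretty small with the official prometheus_client (metric names and the port here are arbitrary):

    # minimal sketch: expose some app metrics for Prometheus to scrape
    import random
    import time

    from prometheus_client import Gauge, Summary, start_http_server

    REQUEST_TIME = Summary("myapp_request_seconds", "Time spent handling a request")
    QUEUE_DEPTH = Gauge("myapp_queue_depth", "Items waiting to be processed")

    @REQUEST_TIME.time()
    def handle_request():
        time.sleep(random.random() / 10)  # stand-in for real work

    if __name__ == "__main__":
        start_http_server(8000)  # metrics end up at http://localhost:8000/metrics
        while True:
            QUEUE_DEPTH.set(random.randint(0, 50))
            handle_request()

Grafana (or the Prometheus UI) can then graph whatever gets scraped from that endpoint.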
> The agent overhead is measured to be less than 1% for applications under high load.
Do you have the methodology and data that you used to obtain this figure? Because to be honest I'm quite dubious, especially for an app which is CPU bound.
We measure both the individual profiler overhead when active (printed by the agent in debug mode) and the total CPU and memory overhead of the app running over long periods of time with and without the agent.
Yes, the apps were under simulated CPU load, memory allocations, etc. The good thing with sampling profilers is that overhead stays relatively stable even under high load.
There are plenty of New Relic competitors. Datadog and AppDynamics both have APM products that support python, for example.
The feature set between this and New Relic is quite different. To oversimplify, New Relic works at the python library level, and StackImpact works at the python interpreter level. The functionality is potentially complementary.
Not to mention their laggy-as-anything JavaScript website. I love what you can get out of New Relic if you try, but you have to pay a LOT of money and you have to accept that their product has become slower and less user-friendly over the past three years.
I've been trying ways to profile my Django code on my dev server.
It's using runserver and Postgres on VirtualBox (Ubuntu in Ubuntu) and takes 20s to display a page.
This is not due to slow db queries, those are quick.
strace says it's making a huge number of calls to:
futex(0xe9d550, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
I tried debug_toolbar, Silk, and yet-another-django-profiler; these don't give me insight into where all that time is going or where those futex calls are coming from.
Would this help? Any other suggestions?
Edit: The exact same code is hugely faster on a webserver in production. And it's not the VBox specs; I gave it lots of RAM and 4 CPUs.
My guess would be the filesystem, especially if you're using Vbox Shared Folders (either directly or through Vagrant.)
If you're on Mac or Linux, you can massively reduce the amount of filesystem overhead by using Docker (or Docker Compose) for local testing, since on Linux it'll get direct access to the FS and on Mac it will use the special osxfs driver. You can also try using nfs to mount your drives instead of vbox shared folders if you want a quick gain, but it will make hot reloading even less reliable.
You may also want to be sure your settings are really the same. Does DEBUG=0 change anything? What cache backend are you using? Etc.
Finally, if none of the above helps you can try a move of desperation: try to get the app working on your native OS with no container or VM layer.
runserver as in "manage.py runserver"? An issue I have hit a few times is when the app is making a request to itself, but runserver cannot serve requests concurrently.
Say, browser loads request A and waits for response. While processing request A, the python code makes an internal request B and waits for response. This deadlocks because runserver will not start processing B before A has finished. Eventually request A times out, and if it was just some tracking call, the page appears to load, just after a long pause.
Have you tried the low tech way of adding debug log statements that print how long a function takes to run? Once you know what is slow it should be easier to troubleshoot. Also I would check to see if you have any DNS issues in your virtual environment.
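Something like this decorator is usually enough for the low-tech approach (logger setup left out; the name log_timing is just an example):

    import functools
    import logging
    import time

    logger = logging.getLogger(__name__)

    def log_timing(func):
        """Log how long the wrapped function takes, in milliseconds."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                logger.debug("%s took %.1f ms", func.__qualname__, elapsed_ms)
        return wrapper

Decorate a suspect view or helper with @log_timing and watch the debug log to narrow down where the time goes.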
I think I would need to sprinkle django internals with log statements. I would do that as a very last resort but I feel it should be possible to avoid.
I did solve this by removing django-debug-toolbar through trial and error (and I'm still not sure why it was doing that), but I never found a tool which would discover that problem.
Can you comment on how compatible this is with asyncio-based applications? Looks like an interesting product and I'm going to try it out regardless, but it would be cool to get some clarity since I couldn't find anything in the docs besides this:
> Time (blocking call) profiler supports threads and gevent.
We haven't tested the whole agent with asyncio applications yet. I guess only the CPU profiler was tested during development. We'll do that and include it in the docs. For now, if you see any problems, please just open a ticket. Thanks!
StackImpact is a set of profilers which continuously sample production applications at low overhead. The result is line-of-code precision, not just application-level metrics. I think it doesn't really compare to monitoring tools such as Datadog. However, it sends metrics as well (CPU, memory, GC). (Disclaimer: I work at StackImpact)
PyCon 2017 had a really good talk about debuggers [1] which covered how PEP 523 [2] is making debugging Python 3.6+ code much faster. I think a profiler is somewhat similar, except that instead of potentially stopping execution on each line, it collects data.
Deterministic profiling monitors every function call to track timing information. This is very precise but adds significant overhead. Statistical profiling samples the call stack periodically to see what functions are running. This is less precise but has less overhead. The overhead varies depending on how frequently you sample and what the sampling mechanism is.
StackImpact is a statistical profiler. At a quick glance it looks like they're using threading.Timer to periodically run their profiling functions.
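To make the statistical approach concrete, here's a toy sampler using a plain background thread rather than threading.Timer (this is an illustration of the idea, not StackImpact's code):

    # toy statistical profiler: a background thread wakes up every few
    # milliseconds, grabs the main thread's current frame and counts where it is
    import collections
    import sys
    import threading
    import time

    samples = collections.Counter()

    def sample(main_thread_id, interval=0.005, duration=2.0):
        end = time.time() + duration
        while time.time() < end:
            frame = sys._current_frames().get(main_thread_id)
            if frame is not None:
                code = frame.f_code
                samples[(code.co_filename, code.co_name, frame.f_lineno)] += 1
            time.sleep(interval)

    def busy_work():
        total = 0
        for i in range(10**7):
            total += i % 7
        return total

    if __name__ == "__main__":
        sampler = threading.Thread(
            target=sample, args=(threading.main_thread().ident,), daemon=True
        )
        sampler.start()
        busy_work()
        time.sleep(0.1)
        for (filename, func, lineno), count in samples.most_common(5):
            print("%5d  %s (%s:%d)" % (count, func, filename, lineno))

Deterministic profilers like cProfile instead hook every call via the interpreter's profiling callback, which is why their overhead scales with the number of calls rather than with wall time.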
The current agent and the dashboard are designed for long-running applications, such as servers or scripts. There are no plans for end-user devices yet. But because the agent is pure Python (it just relies on some system-specific functionality, such as signalling), it could work with a few tweaks.
Fair enough. One use case I could think of for this is testing games for long periods on Android and using this tool to find out where the bottleneck is, if there is any. Though given that this is designed for long-running apps, I can see that it wouldn't be useful for quick profiling but rather for production testing (QA stage etc.). There's something similar to that already called GameBench, but it isn't as detailed as this, so I was keen on knowing whether this tool would make it to Android :)
We haven't tested it with Celery yet. It looks like it should work. gevent is supported by the blocking call profiler, and the CPU and memory profilers, as well as exception and metric reporting, are library-independent.