Fixing Python Performance with Rust (sentry.io)
331 points by ngoldbaum on Oct 19, 2016 | 103 comments



I'm using Nim's Pymod https://github.com/jboy/nim-pymod for this exact purpose and I think it's much better suited.

The reason is that it can automatically generate C API code instead of going through an FFI; FFI calls are slower (https://gist.github.com/brentp/7e173302952b210aeaf3), so there is less overhead. You obviously care about overhead here.

Nim's pymod is already a python module and you can send strings and numpy arrays to Nim for fast processing.

I wish you could send bytes from python3, but that's not implemented yet.


There's rust-cpython that works like this: https://github.com/dgrunwald/rust-cpython/


I'm really not a fan of linking libpython if I can avoid it. More wheels to build, and then it's limited to CPython. CFFI is plenty fast.


Which method do you find easiest (read: least developer time) for using large rust codebases from Python?


CFFI works quite well. I will probably make a post on my blog later with some ideas on how to do it.
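
For illustration, the basic shape is something like this (a minimal sketch; the function and library names are made up, not the actual libsourcemap API):

    from cffi import FFI

    ffi = FFI()
    # Declare the C ABI the shared library exports.
    ffi.cdef("int add(int a, int b);")
    # dlopen the compiled cdylib; the path is a placeholder.
    lib = ffi.dlopen("./libexample.so")
    assert lib.add(2, 3) == 5

No libpython linkage involved: cffi just dlopens the library and calls the exported symbols, which is also why it works on PyPy.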


Not only does it work well, it is portable across nearly all Pythons you would want to use, CPython and PyPy.


Can you import that as a lib in Python to work in reverse? It seems to be a different use-case otherwise - using a Python library from Rust vs optimizing a hotspot within Python.



I've looked at Nim a bit, and it does have some nice features. But magic identifier equality [1] is an intrinsically disqualifying design choice.

> If a language's lexical structure is so tortured that I need a special tool to grep it, I'm going to skip it and go back to the sanity of C++. [2]

Nope, nope, no way.

[1] http://nim-lang.org/docs/manual.html#lexical-analysis-identi... [2] https://news.ycombinator.com/item?id=8936542


I've only used Nim for some side projects, but the only place I've seen this used anywhere is in FFI bindings.

That way you can mirror, say, C style while writing the bindings (either manually or automatically) and use them as if they were normal Nim elsewhere.


Do you have experience with cython? If so, how does Nim compare for writing fast C extensions?


My personal experience with cython (and why I'm less than lukewarm about it) is that it's about as much fun as C to write and only really helps with the annoyance of dealing with PyObject. That's something that was not even required for the sourcemap lib.

Debugging and writing cython that is fast is really not a very pleasant experience, and the tooling is not great, to say nothing of a real ecosystem. There is not even a good way to deal with dependencies at compile time.


Since you can basically write C in Cython, it depends how far you take the cython optimization. You can just add a few types, or you can go nuts via scipy's BLAS interface https://github.com/RaRe-Technologies/gensim/blob/develop/gen...

In general, you can get to less overhead with Cython right now because it's a much more mature project and because you can be more flexible with defining the point where you drop from python to a faster alternative. I wouldn't use that Nim/Python lib in production unless you have time to help with its development. It's good for personal projects though.


A neat case study, but "Embedding Rust in Python" is a poor way to phrase it.

If I'm reading it correctly they're just using CFFI to load and call a shared library - it's really not embedding anything. The fact that the library was written in Rust is interesting, but as far as Python is concerned it could easily have been written in any language that can create a shared library.


> If I'm reading it correctly they're just using CFFI to load and call a shared library - it's really not embedding anything. The fact that the library was written in Rust is interesting, but as far as Python is concerned it could easily have been written in any language that can create a shared library.

Of which there are not that many that would make a shared library that can be safely loaded into a Python process. Traditionally this was limited to C and C++. As far as embedding goes: Rust, like most things, needs minimal runtime support, and that runtime is embedded in the dylib.


FWIW, most languages that can generate native code can generate shared libraries, and will usually have instructions for how to call the libraries from other languages.

Here's how to do it in Haskell, for example: https://downloads.haskell.org/~ghc/7.6.3/docs/html/users_gui...

That said, if optimization gets to the point where re-writing in a different language is a good idea, most people just jump to C or C++, or in this case Rust, because they're the fastest.


Indeed. A more interesting comparison would be between a C/C++-based shared library and a Rust-based one.


Or how about Rust-FFI, C-FFI, the C-API directly, Nim-to-C-API (pymod), a Julia python module, ctypes, cython, libboost, and swig for good measure.

There are examples of each of these compared to one or the other but it would be nice to see all of them compared in a few benchmarks.


One benchmark to rule-them-all would sound awesome!


Not a fan of the title: you aren't fixing Python performance with Rust, you are avoiding Python performance with it.


+1


Small question: since this is improving performance on a given machine, isn't this actually an example of vertical scaling, as opposed to horizontal?


I guess this is just phrased a bit oddly. It's just a factor in helping us scale by letting us use our hardware more efficiently. Not sure I'd say it's horizontal or vertical. We've just made each unit of work cheaper to help things along.


It's vertical almost by definition. I'll say it.


I don't think it's either. "Scaling" refers to growing the amount of compute power you are using. This change falls into a different bucket, which is making more efficient use of the compute that's already available.

If they are running multiple Docker containers, the improved efficiency would allow them to run more containers per host computer. Maybe that's what they meant, although it's not really horizontal scaling either.


I think scaling is more about serving demand: whether you use more resources on a single box, more resources by using more boxes, or the same resources more efficiently, you're still scaling.


This is the closest to my use of the term. I'd go further and say that using more resources on a single box is vertical scaling, using more boxes is horizontal, and this thing is neither. How about density scaling?


A previous HN discussion proposed the term "scaling deeper".

Scaling Horizontally - spending resources on adding more nodes to a cluster, so more work can get done in parallel.

Scaling Vertically - spending resources on adding computational power to individual nodes, so an individual job can get done faster.

Scaling Deeply - spending resources on understanding and optimizing an application, so each job can get done with less computational power.

Each of these can be thought of as an orthogonal axis, with its own curve of diminishing returns.


Why does it have to do with scaling in the first place?

Performance has nothing to do with scalability whatsoever, by the way.


If you halve the memory and CPU usage of a given service (that's not IO bound), have you not improved its vertical scalability?


Not if those things aren't the bottleneck. Scalability is such a buzzword. The correct term is "performance" or "overhead". Scalability matters when you're talking about an entire system that you're deploying on many machines -- the problem space and technical discussion is different to fixing a performance issue in Python.


David from Sentry here. I've got a long history of talking about scale in the python/web ecosystems. I'll echo what some others have said: scale is fundamentally a business metric.

If we can do more with fewer machines, it's no different than doing more with more machines. We tripled our infrastructure to handle the Python load while we resolved this CPU issue, and that wasn't "scaling" (literally, it causes other scale concerns).

Fixing the root cause lets us drop all of those new machines as well as some older ones. The scalability of the system has greatly increased because of this and other factors -- primarily because many (or all) systems aren't actually "horizontally" scalable.


Unless we know where the application is bottlenecked, it's hard to say how this affects scaling. Reducing CPU usage may have no real effect on scalability if IO is what's keeping them back.


So why not Cython like PayPal?


I'd switch it around: why Cython and not Rust, given their CLI is written in Rust?

This strikes me as a perfect use case for Rust, too: it has great compile-time memory and other safety guarantees, with speed likely close to (or perhaps even better than) "safe" C or C++.


Yea they mentioned alternatives, but cython wasn't mentioned at all. I guess Rust makes sure there aren't any memory issues, but it's far from the only easy option.

Cython is used everywhere and can be compiled on the machine it's about to be used on. It also follows best practices for talking to python via FFI.


Not sure what you mean about cython and the FFI. As far as I know, cython is tightly coupled to the CPython C API.

I agree that cython is a good option if you only care about CPython.


That's just not true, Cython works with PyPy https://cython.readthedocs.io/en/latest/src/userguide/pypy.h..., and it's not like you're going to target Jython with CFFI.


Cython does work with pypy, but via the cpyext emulation of the CPython C API. See this answer from one of the pypy devs when I asked about this about a year ago: https://news.ycombinator.com/item?id=10195892

Maybe cpyext has gotten faster since then, but I think that's the state of things still.


That's true, it does have to be using the C API. I think I just meant there are several ways to use that API and it does the "harder way" automatically so it's more efficient.


I think they said they had a pre-existing source-map parser written in Rust for their CLI.


It looks like they already had a Rust parser that they reused


leveraging the crate ecosystem is one reason


this is pretty important here, given that we already have a sourcemap library we built in Rust (for our sentry-cli tool)


Because they already had Rust code to do the job (in a CLI tool), it was just a question of extracting and binding it.


Because the author (Armin Ronacher) loves Rust. And because common sense is lame.


That's very disingenuous as Armin Ronacher/mitsuhiko has done so much for Python.


Disingenuous? I'm not lying, he is publicly a Rust enthusiast. It would be no less true if I said this about Steve Klabnik regardless of his contributions to Ruby on Rails.


Armin might like Rust but that doesn't mean that's why it was chosen in this case. In fact the article provides a reason: they already had a Rust library for source maps and that is a very good reason to go with Rust.

More generally though, you're being unfair by not even seriously considering the option that there could be a legitimate reason. Also, you're simply being insulting with your common sense comment.

You are being disingenuous for pretending this is about truth in some way, and you're coming across as an asshole by insulting people.


Huh, I wonder who's responsible for their adoption of Rust in the first place. A real brain-teaser, this one. Anyway my point stands (and glad to see I've ruffled some feathers): Cython is the dull and maintainable choice here. Tell me which story is better for the new hire who has to work on this (David Cramer's shop is plainly a Python shop):

1) python setup.py install. Read a Cython tutorial.

2) Install the Rust toolchain and learn Rust. Hope that the language hasn't changed drastically in the past month.

Hm...

But I appreciate the "asshole" ad hominem.


Rust has been stable for a year and a half, it won't "change drastically" like that.


Isn't it common sense for a person who's been writing programs in a high-level programming language to use a safe low-level programming language?


Yes, Nim would be an excellent choice. See how you missed the point?


The issue is not if Armin (or myself) likes Rust; the issue here is that you're asserting that Armin chose Rust because he lacks common sense.

Given that there's an entire article about the technical merits (and success) of this approach, and Armin is someone who is known for building excellent, widely used software, well...


Beetlejuice, Beetlejuice, Beetlejuice!

/me waits for pcwalton too

Sorry, writing a blog post, while it is Armin Ronacher's forte (yes, I use the full name as we are not on first-name basis like other HN cool kids are), does not disprove that this is the riskier choice and was made because the author likes Rust.


I don't care about Rust or even like it. Armin made a great decision here and I, and everyone else at Sentry back it. Go troll somewhere else.


I'm not trolling, and you should know that people can make poor decisions in groups too.


The hivemind hates people like you that point out the elephant in the room


I did the same thing with Go and Ruby: http://blog.paracode.com/2015/08/28/ruby-and-go-sitting-in-a...

IMHO the end result is more maintainable, readable, and accessible from an FFI point of view. Regarding performance: Go has a GC, but I'm wondering if that would affect things dramatically at all.

Here is the Ruby side FFI code: https://github.com/jondot/scatter/blob/master/lib/scatter.rb

And here's the "native" part: https://github.com/jondot/scatter/tree/master/ext

Every now and then I keep looking at Rust and how it can integrate with higher level languages, the last time I really wanted OpenCV to work well with Rust. I think that's a big selling point. So far, to me, it's not perfect yet but it may get there.

From a pragmatic point of view, I imagine Sentry getting more bang for the buck with Go, as there would be fewer wheels to invent from an ecosystem POV, and from a maintenance POV it would be closer to Python. But that wouldn't advance the Rust ecosystem at all, and we do need that as a collective.


How do you deal with the lifetime of memory passing from Python to Rust and back in the presence of two garbage collectors?


If I understand the question correctly, you are saying:

1. There's a bunch of objects that need to pass down the FFI boundary, py->go

2. Compute

3. There's a bunch of objects that need to pass back up the FFI boundary, go->py

4. Python now continues as usual with a bunch of processed objects

In that case, yes, this would be a problem. The way I'd resolve it is by planning the FFI boundaries accordingly. I'd make Python do as little as possible and pass just declarative "instructions" to Go: in this case, where the file is located and where the sourcemap file is located (and perhaps where to dump the output, if that's the case). Go then does as much work as possible to make sure only a minimal number of objects are passed back, if any.

It may _feel_ like a hack, but ultimately it's the same approach as if you were to make a "sourcemap server" and have Python code communicate with it over RPC.

If this is not the problem then I'd love an example of what you meant


> 1. There's a bunch of objects that need to pass down the FFI boundary, py->go

> 2. Compute

> 3. There's a bunch of objects that need to pass back up the FFI boundary, go->py

> 4. Python now continues as usual with a bunch of processed objects

You can look at the library in question. An object gets created in Rust but the ownership of that object is held in Python. When the Python GC runs we clean up the Rust object.
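
Roughly, the cffi side of that pattern looks like this (a sketch with made-up names, not libsourcemap's actual API):

    from cffi import FFI

    ffi = FFI()
    ffi.cdef("""
        void *view_from_json(const char *bytes, unsigned int len);
        void view_free(void *view);
    """)
    lib = ffi.dlopen("libexample.so")

    def make_view(data):
        ptr = lib.view_from_json(data, len(data))
        # Tie the Rust object's lifetime to the Python handle: when the
        # Python GC collects the wrapper, view_free runs on the Rust side.
        return ffi.gc(ptr, lib.view_free)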

> It may _feel_ like a hack but ultimately its the same approach if you were to make a "sourcemap server" making python code communicate with it over RPC.

Sure, but that significantly complicates the problem. To the point in fact where I question if the Go solution makes any sense at all because it takes away the advantage you have where you can just drop an extension module in without much work. Once you need to restructure your system to be message based you might as well go in and run a separate process and use a unix pipe to communicate. We used to do that for a few things like our debug symbol symbolication and the downsides are just too big.


I understand. So now, if I may backpedal a bit: why go through the trouble of having Python own the objects? Why not let Rust (or Go) deal with the entire bulk of the job at hand?


Because that would require huge changes to our codebase. We pass those objects around in various places already.


Shouldn't it say "improving Python performance"? Is this a bugfix of Python that can't be 'fixed' in the initial software? Maybe I'm just reading it oddly.

Sidenote: I wonder how improving Python performance with D fares considering it links up to C pretty nicely.


I don't think you can write a DLL/dylib in D to work with Python, since D has a GC (actually, it looks like maybe use of the GC can be worked around with some careful programming in D. Still, the D runtime itself may conflict.) From the article:

> In that case, your requirements to the language are pretty harsh: it must not have an invasive runtime, must not have a GC, and must support the C ABI. Right now, the only languages I think that fit this are C, C++, and Rust.


You can write a DLL/dylib in D to work with Python: http://pyd.readthedocs.io/en/latest/functions.html

You can opt out of D's GC. Turn on the `-vgc` flag during compilation, replace GC'd code with non-GC'd code, and you are not using any GC.


Oh cool, I did not see that library. Looks like it is pretty straightforward, nice!


I'm sceptical. I highly doubt GC matters here at all. The runtime restrictions might be the author's own requirements.


I am not intelligent enough to contemplate the ramifications of two independent GCs having control over their own memory and making that work together well. I'm sure there are ways, but it's definitely not easy and not something I would just try on a whim.


I would wager this has more to do with the potential for 2 GCs to interact poorly than with GC in general.


> Maybe I'm just reading it oddly.

No, it doesn't make a lot of sense.


Super cool!

Is there a reason the Rust-exported functions aren't marked with `extern "C"`?


Are you referring to the declarations in libsourcemap.h or the definitions in cabi.rs?

Regarding the declarations: this would only be necessary if processed in a C++ context, to change the name mangling/linkage of the declarations. For portability, authors sometimes hide these behind `#ifdef __cplusplus` guards, but it's not really critical here.

Regarding the definitions: "Exposing a C ABI in Rust" from the article describes this in detail. For the most part, `#[no_mangle]` has the same effect that `extern "C"` has on linkage/mangling in C++.


I'm referring to the definitions of the Rust functions.

As far as I know, `#[no_mangle]` disables name-mangling but doesn't change the ABI of a function. That's what `extern "C"` is for in Rust -- to declare a function with the C ABI. You can have a Rust ABI function with an unmangled name (which is what it looks like is done in the post) and you can have a C ABI function in Rust with a mangled name (by using `extern "C"` but not `#[no_mangle]` -- for example for C callbacks).

Based on my limited understanding of C++, `extern "C"` in C++ is equivalent to using both `#[no_mangle]` and `pub extern "C"` in Rust. I would guess that much of the time failing to specify a C ABI would work out fine unless you try to accept or pass non-FFI types (enums, references, etc.), but I'm not sure.

It's confusing as hell. There was a thread about the mixed up semantics somewhat recently on the internals forum: https://internals.rust-lang.org/t/no-no-mangle/3973 (edit: and also https://internals.rust-lang.org/t/precise-semantics-of-no-ma...).

If you look at the LLVM IR generated from https://is.gd/Hfup3X you can see that the extern "C" fn differs in that it's given a `nounwind` attribute, among other things.


Ok, good call.

Thanks for the tip btw, I think this means I have a bug in my code. ;)


I'm glad it helped! Sorry if it came on strong.


#[no_mangle] turns off the name mangling so it emits a symbol in the style C would. But AFAIK it doesn't force the function to use C calling conventions. As demonstrated in https://doc.rust-lang.org/book/ffi.html#calling-rust-code-fr... you probably still need the `extern` keyword on your function definition, e.g.

  #[no_mangle]
  pub unsafe extern "C" fn lsm_view_from_json(bytes: *const u8, len: c_uint,
                                              err_out: *mut CError) -> *mut View
  {
    ...
  }


Other people have said this, but as a concrete example of the difference between #[no_mangle] and extern: Rust does a thing called return value optimization (RVO), C does not. If you have the following library header:

    typedef struct { int x[500]; } bigthing;
    bigthing MyFunction();
and a program using that library:

    int main() {
        bigthing x = MyFunction();
    }
C has no choice but to have MyFunction allocate a couple kilobytes on the stack and have main copy the structure from the stack to where it should eventually go. The equivalent Rust code, however, can pass the address of the object x to MyFunction, as if it were actually void MyFunction(bigthing *output).

If you have a #[no_mangle] but not extern function in Rust, the Rust compiler will generate code for that function that looks for this secret by-reference argument and fills it in. When you call it from C (or something that calls functions in a C-like manner, like Python's cffi), it won't be setting up the call like that at all, and it will expect to read the result off the stack like a C function would have done.

(I don't think there's a lot of use for #[no_mangle] without extern. The best I can think of is that, if you have a Rust application that dynamically loads a Rust library and runs a function from it, you don't have access to the mangling algorithm, once you find the function you'll call it according to the Rust ABI. But even that is risky since the Rust ABI can change between compiler versions; you're still better off shoveling things through the C ABI.)


> C has no choice but to have MyFunction allocate a couple kilobytes on the stack and have main copy the structure from the stack to where it should eventually go. The equivalent Rust code, however, can pass the address of the object x to MyFunction, as if it were actually void MyFunction(bigthing *output).

What sort of C implementation (ABI, compiler, etc.) are you thinking of here? gcc (x86, x86-64, ARM) is perfectly capable of doing the exact optimization that you describe Rust being able to do.


Across a public interface in a shared library? I know it can do that optimization within an object, but the SysV ABI does not (to my knowledge) let it expose that optimization at a shared library boundary. I believe Rust has that optimization as part of its inter-library ABI (partly because Rust's ABI is not stable).

I guess the trick here is that "C" really means "platform ABI" and isn't inherently about a language or a compiler.


Yes, across a public interface in a shared library. The SysV ABI just requires that the callee passes a pointer to the structure return value as a hidden parameter; there's nothing saying that pointer must point to different memory than the named object in the program. (Consider how you would implement the call to MyFunction and MyFunction itself at an assembly level that would absolutely require the compiler to allocate two different bigthing objects.)


I must admit I don't know. It does not appear to be necessary; I assume it's implied in one way or another for `no_mangle` extern functions. However, I must also admit that I did not look into the exact mechanics.

I do know that you need it if you bind to a function that is linked in.


`#[no_mangle]` is currently a bit overloaded; there's discussion of whether this is correct or not:

https://internals.rust-lang.org/t/precise-semantics-of-no-ma...


It's for C++ consumers of C libraries. Doesn't matter for Python <-> Rust.


There is an `extern "C"` in Rust as well, however, which should enforce C calling conventions.


Right, but `extern "C"` is for C++ programs using C library headers directly. Rust doesn't use C headers directly, I think, so it wouldn't need this kludge.


I think you may be confused. The original commenter wasn't referring to use of 'extern "C"' in the C header file, but to its (lack of) use in the Rust source that defines those functions.


How about this:

Why not deserialize all the source maps ahead of time and just store/retrieve them as msgpack objects?

Per this python serialization speed comparison, msgpack is ~ 10X faster than json. So you get the same speed up, but no Rust.

https://gist.github.com/cactus/4073643
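
Concretely, the idea would be something like this (a sketch; filenames are placeholders, and it assumes the parsed sourcemap is plain dicts/lists):

    import json
    import msgpack

    # One-time conversion: parse the JSON sourcemap once, cache it as msgpack.
    with open("app.js.map") as f:
        parsed = json.load(f)
    with open("app.js.map.msgpack", "wb") as f:
        f.write(msgpack.packb(parsed))

    # Hot path: loading the msgpack cache avoids re-parsing the JSON each time.
    with open("app.js.map.msgpack", "rb") as f:
        parsed = msgpack.unpackb(f.read())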


Stupid Question:

Why not deserialize all the source maps ahead of time and just store/retrieve them via cPickle? Wouldn't that get you almost the same results without having to learn and support a second language (Rust, in this case)?

[Edit]

cPickle is slower than JSON, but browsing the interwebs it seems that marshal can be 2X faster than JSON and 4X faster than cPickle.
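
A quick way to sanity-check those numbers yourself (a sketch using Python 3's pickle in place of cPickle; results vary a lot with the shape of the data):

    import json
    import marshal
    import pickle
    import timeit

    # A made-up payload roughly shaped like parsed sourcemap tokens.
    data = {"tokens": [[i, i * 2, "name%d" % i] for i in range(10000)]}

    for name, dumps, loads in [
        ("json", json.dumps, json.loads),
        ("marshal", marshal.dumps, marshal.loads),
        ("pickle", pickle.dumps, pickle.loads),
    ]:
        blob = dumps(data)
        # Time deserialization only, since that's the hot path here.
        print(name, timeit.timeit(lambda: loads(blob), number=100))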


Things wrong with the title:

- It is not fixing python's performance.

- The performance improvement has very little to do with the choice of Rust.


I think that's just a combination of your initial interpretation of the title and being a little too pedantic.

They are fixing a case of Python performance being a problem in the context of their needs, and the way they solved it was with Rust (and there's no implication that it had to be Rust in the title).

It's not wrong, it's just vague.


I remember Skylight.io did something similar with Ruby.

Edit: http://blog.skylight.io/introducing-helix/


That's a bit different, the point of helix is to easily build native modules for Ruby in Rust. The case here was using a regular FFI (cffi) to call Rust code as if it were C, without using Python's C API or anything.


A good write-up and a great case for Rust.


"oh my god, using native code is faster than interpreting, such exciting and revolutionary news"


To summarize: instead of improving Python's maps to consume less memory, they've embedded an entirely different language, Rust, into Python to solve a particular problem.

Doesn't make Python look good.


I disagree, I think it makes Python look really good. Being an interpreted, dynamic language where most internals are represented by a hash table, it'll never compete in terms of raw speed. The people who choose Python choose it for the ecosystem and productivity, and from that perspective it's actually to its benefit that it's easy to replace your bottleneck code with a faster implementation. The only high-level language in my experience where it's easier to drop to C/C++ (and now Rust) would be Lua of the luajit flavor.


Python is hard to beat for speed of development. Rust is hard to beat for performance and memory usage. So why not combine the two?


Complexity.


Out of all the options we had, this was probably the least complex one given our set of tools and experience.


Complexity is a relative concern.


By Python, you mean dynamic scripting languages? Everyone already knows the tradeoff in using one of those over using a language like Rust (and vice versa). It's no surprise that Rust uses less memory. It also takes more effort to code in. What exactly is your point? That sometimes a language like Rust is the better tool? That one programming language is not the best tool in all situations?



