The reason is that it can automatically generate C API code instead of going through FFI; FFI calls are slower (https://gist.github.com/brentp/7e173302952b210aeaf3), so there is less overhead. You obviously care about overhead here.
Nim's pymod is already a python module and you can send strings and numpy arrays to Nim for fast processing.
I wish you could send bytes from python3, but that's not implemented yet.
Can you import that as a lib in Python to work in reverse? It seems to be a different use-case otherwise - using a Python library from Rust vs optimizing a hotspot within Python.
My personal experience with Cython (and why I'm less than lukewarm about it) is that it's about as much fun as C to write and only really helps with the annoyance of dealing with PyObject. That's something that was not even required for the sourcemap lib.
Debugging and writing Cython that is fast is really not a very pleasant experience, and the tooling is not great, to say nothing of the lack of a real ecosystem. There is not even a good way to deal with dependencies at compile time.
In general, you can get lower overhead with Cython right now because it's a much more mature project and because you can be more flexible in defining the point where you drop from Python to a faster alternative. I wouldn't use that Nim/Python lib in production unless you have time to help with its development. It's good for personal projects though.
A neat case study, but "Embedding Rust in Python" is a poor way to phrase it.
If I'm reading it correctly they're just using CFFI to load and call a shared library - it's really not embedding anything. The fact that the library was written in Rust is interesting, but as far as Python is concerned it could easily have been written in any language that can create a shared library.
> If I'm reading it correctly they're just using CFFI to load and call a shared library - it's really not embedding anything. The fact that the library was written in Rust is interesting, but as far as Python is concerned it could easily have been written in any language that can create a shared library.
Of which there are not that many that would make a shared library that can be safely loaded into a Python process. Traditionally this was limited to C and C++. As far as embedding goes: Rust, like most things, needs minimal runtime support, and that is embedded in the dylib.
FWIW, most languages that can generate native code can generate shared libraries, and will usually have instructions for how to call the libraries from other languages.
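For instance (a minimal sketch with a made-up function, not anything from the article), the Rust side of such a shared library is just a function exported with the C ABI:

    // Hypothetical export from a Rust library compiled with
    // crate-type = ["cdylib"] in Cargo.toml. Any FFI layer that can call
    // into a C shared library (cffi, ctypes, LuaJIT's ffi, ...) can call
    // this without knowing it was written in Rust.
    #[no_mangle]
    pub extern "C" fn add_numbers(a: i32, b: i32) -> i32 {
        a + b
    }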
That said, if optimization gets to the point where re-writing in a different language is a good idea, most people just jump to C or C++, or in this case Rust, because they're the fastest.
I guess this is just phrased a bit oddly. It's just a factor in helping us scale by requiring us to use our hardware more efficiently. Not sure I'd say it's horizontal or vertical. We've just made each unit of work cheaper to help things along.
I don't think it's either. "Scaling" refers to growing the amount of compute power you are using. This change falls into a different bucket, which is making more efficient use of the compute that's already available.
If they are running multiple Docker containers, the improved efficiency would allow them to run more containers per host computer. Maybe that's what they meant, although it's not really horizontal scaling either.
I think scaling is more about serving demand - whether you use more resources on a single box, more resources by using more boxes, or the same resources more efficiently, you're still scaling.
This is the closest to my use of the term. I'd go further and say that using more resources on a single box is vertical scaling, using more boxes is horizontal, and this thing is neither. How about density scaling?
A previous HN discussion proposed the term "scaling deeper".
Scaling Horizontally - spending resources on adding more nodes to a cluster, so more work can get done in parallel.
Scaling Vertically - spending resources on adding computational power to individual nodes, so an individual job can get done faster.
Scaling Deeply - spending resources on understanding and optimizing an application, so each job can get done with less computational power.
Each of these can be thought of as an orthogonal axis, with its own curve of diminishing returns.
Not if those things aren't the bottleneck. Scalability is such a buzzword. The correct term is "performance" or "overhead". Scalability matters when you're talking about an entire system that you're deploying on many machines -- the problem space and technical discussion are different from fixing a performance issue in Python.
David from Sentry here. I've got a long history of talking about scale in the python/web ecosystems. I'll echo what some others have said: scale is fundamentally a business metric.
If we can do more with fewer machines it's no different than doing more with more machines. We tripled our infrastructure to handle the Python load while we resolved this CPU issue, and that wasn't "scaling" (literally, it caused other scale concerns).
Fixing the root cause lets us drop all of those new machines as well as some older ones. The scalability of the system has greatly increased because of this and other factors -- primarily that many systems aren't actually "horizontally" scalable.
Unless we know where the application is bottlenecked, it's hard to say how this affects scaling. Reducing CPU usage may have no real effect on scalability if IO is what's keeping them back.
I'd switch it around: why Cython and not Rust, given their CLI is written in Rust?
This strikes me as a perfect use case for Rust, too: it has great compile-time memory (and other) safety guarantees, with speed that is likely close to (or perhaps even better than "safe") C or C++.
Yeah, they mentioned alternatives, but Cython wasn't mentioned at all. I guess Rust makes sure there aren't any memory issues, but it's far from the only easy option.
Cython is used everywhere and can be compiled on the machine it's about to be used on. It also follows best practices for talking to python via FFI.
Cython does work with pypy, but via the cpyext emulation of the CPython C API. See this answer from one of the pypy devs when I asked about this about a year ago: https://news.ycombinator.com/item?id=10195892
Maybe cpyext has gotten faster since then, but I think that's the state of things still.
That's true, it does have to be using the C API. I think I just meant that there are several ways to use that API, and Cython does the "harder way" automatically, so it's more efficient.
Disingenuous? I'm not lying, he is publicly a Rust enthusiast. It would be no less true if I said this about Steve Klabnik regardless of his contributions to Ruby on Rails.
Armin might like Rust but that doesn't mean that's why it was chosen in this case. In fact the article provides a reason: they already had a Rust library for source maps and that is a very good reason to go with Rust.
More generally though, you're being unfair by not even seriously considering the option that there could be a legitimate reason. Also, you're simply being insulting with your "common sense" comment.
You are being disingenuous by pretending this is about truth in some way, and you're coming across as an asshole by insulting people.
Huh, I wonder who's responsible for their adoption of Rust in the first place. A real brain-teaser, this one. Anyway my point stands (and glad to see I've ruffled some feathers): Cython is the dull and maintainable choice here. Tell me which story is better for the new hire who has to work on this (David Cramer's shop is plainly a Python shop):
1) python setup.py install. Read a Cython tutorial.
2) Install the Rust toolchain and learn Rust. Hope that the language hasn't changed drastically in the past month.
The issue is not if Armin (or myself) likes Rust; the issue here is that you're asserting that Armin chose Rust because he lacks common sense.
Given that there's an entire article about the technical merits (and success) of this approach, and Armin is someone who is known for building excellent, widely used software, well...
Sorry, writing a blog post, while it is Armin Ronacher's forte (yes, I use the full name, as we are not on a first-name basis like other HN cool kids are), does not disprove that this is the riskier choice and was made because the author likes Rust.
IMHO the end result is more maintainable, readable, and accessible from an FFI point of view. Regarding performance: Go has a GC, but I'm wondering whether that would affect things dramatically at all.
Every now and then I look at Rust and how it can integrate with higher-level languages; the last time, I really wanted OpenCV to work well with Rust. I think that's a big selling point. So far, to me, it's not perfect yet, but it may get there.
From a pragmatic point of view, I imagine Sentry getting more bang for the buck with Go, as there would be fewer wheels to reinvent from an ecosystem POV, and from a maintenance POV it would be closer to Python. But that wouldn't advance the Rust ecosystem at all, and we do need that as a collective.
If I understand the question correctly, you are saying:
1. There's a bunch of objects that need to pass down the ffi boundaries py->go
2. compute
3. There's a bunch of objects that need to pass up the ffi boundaries go->py
4. Python now continues as usual with a bunch of processed objects
In that case, yes, this would be a problem. The way I'd resolve it is by planning the FFI boundaries accordingly: make Python do as little as possible and pass just declarative "instructions" to Go, in this case the file location and the sourcemap file location (and perhaps where to dump the output, if needed), with Go doing as much work as possible to make sure only a minimal number of objects, if any, are passed back.
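A rough sketch of what I mean (written in Rust rather than Go, and with made-up names):

    // Hypothetical coarse-grained entry point: the caller passes only
    // declarative "instructions" (file paths); parsing and processing stay
    // on the native side, and only a status code crosses back.
    use std::ffi::CStr;
    use std::os::raw::{c_char, c_int};

    #[no_mangle]
    pub extern "C" fn process_sourcemap(
        minified_path: *const c_char,
        sourcemap_path: *const c_char,
        output_path: *const c_char,
    ) -> c_int {
        let (_minified, _map, _out) = unsafe {
            (
                CStr::from_ptr(minified_path).to_string_lossy(),
                CStr::from_ptr(sourcemap_path).to_string_lossy(),
                CStr::from_ptr(output_path).to_string_lossy(),
            )
        };
        // ... read the files, process the sourcemap, dump the results to
        // the output path; return 0 on success, nonzero on failure.
        0
    }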
It may _feel_ like a hack, but ultimately it's the same approach as if you were to make a "sourcemap server" and have the Python code communicate with it over RPC.
If this is not the problem, then I'd love an example of what you meant.
> 1. There's a bunch of objects that need to pass down the ffi boundaries py->go
> 2. compute
> 3. There's a bunch of objects that need to pass up the ffi boundaries go->py
> 4. Python now continues as usual with a bunch of processed objects
You can look at the library in question. An object gets created in Rust but the ownership of that object is held in Python. When the Python GC runs we clean up the Rust object.
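Roughly, the shape is a create/free pair (hypothetical names here, not our exact exports); Python keeps the opaque pointer and arranges for the free function to run when its wrapper object is collected, which is exactly what cffi's ffi.gc is for:

    // Hypothetical create/free pair (not the library's real API).
    pub struct View { /* parsed sourcemap state */ }

    #[no_mangle]
    pub extern "C" fn view_new() -> *mut View {
        Box::into_raw(Box::new(View {}))
    }

    // Python's wrapper calls this when the GC collects it (e.g. a pointer
    // registered with cffi's ffi.gc).
    #[no_mangle]
    pub extern "C" fn view_free(view: *mut View) {
        if !view.is_null() {
            // Rebuild the Box so Rust's drop glue runs and frees the memory.
            unsafe { drop(Box::from_raw(view)) };
        }
    }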
> It may _feel_ like a hack, but ultimately it's the same approach as if you were to make a "sourcemap server" and have the Python code communicate with it over RPC.
Sure, but that significantly complicates the problem. To the point in fact where I question if the Go solution makes any sense at all because it takes away the advantage you have where you can just drop an extension module in without much work. Once you need to restructure your system to be message based you might as well go in and run a separate process and use a unix pipe to communicate. We used to do that for a few things like our debug symbol symbolication and the downsides are just too big.
I understand. So now, if I may backpedal a bit: why go through the trouble of having Python own the objects? Why not let Rust (or Go) deal with the entire bulk of the job at hand?
Shouldn't it say "improving Python performance"? Is this a bugfix of Python that can't be 'fixed' in the initial software? Maybe I'm just reading it oddly.
Sidenote: I wonder how improving Python performance with D fares considering it links up to C pretty nicely.
I don't think you can write a DLL/dylib in D that works with Python, since D has a GC (actually, it looks like use of the GC can be worked around with some careful programming in D; still, the D runtime itself may conflict). From the article:
> In that case, your requirements to the language are pretty harsh: it must not have an invasive runtime, must not have a GC, and must support the C ABI. Right now, the only languages I think that fit this are C, C++, and Rust.
I am not intelligent enough to contemplate the ramifications of two independent GCs each having control over their own memory and making them work well together. I'm sure there are ways, but it's definitely not easy and not something I would just try on a whim.
Are you referring to the declarations in libsourcemap.h or the definitions in cabi.rs?
Regarding the declarations: this would only be necessary if processed in a C++ context, to change the name mangling/linkage of the declarations. For portability, authors sometimes hide these behind "#ifdef __cplusplus" guards, but it's not really critical here.
Regarding the definitions: "Exposing a C ABI in Rust" from the article describes this in detail. For the most part, "#[no_mangle]" has the same effect on linkage/mangling that extern "C" has in C++.
I'm referring to the definitions of the Rust functions.
As far as I know, `#[no_mangle]` disables name mangling but doesn't change the ABI of a function. That's what `extern "C"` is for in Rust -- to declare a function with the C ABI. You can have a Rust ABI function with an unmangled name (which is what it looks like is done in the post), and you can have a C ABI function in Rust with a mangled name (by using `extern "C"` but not `#[no_mangle]` -- for example, for C callbacks).
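A quick illustration of the orthogonality (names made up):

    // C ABI and unmangled symbol: the typical FFI export.
    #[no_mangle]
    pub extern "C" fn exported_for_c() {}

    // Unmangled symbol, but still the (unstable) Rust ABI.
    #[no_mangle]
    pub fn unmangled_rust_abi() {}

    // C ABI with a mangled name: usable as a C callback.
    pub extern "C" fn callback_only() {}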
Based on my limited understanding of C++, `extern "C"` in C++ is equivalent to using both `#[no_mangle]` and `pub extern "C"` in Rust. I would guess that much of the time, failing to specify a C ABI would work out fine unless you try to accept or pass non-FFI types (enums, references, etc.), but I'm not sure.
If you look at the LLVM IR generated from https://is.gd/Hfup3X you can see that the extern "C" fn differs in that it's given a `nounwind` attribute, among other things.
#[no_mangle] turns off the name mangling so it emits a symbol in the style C would. But AFAIK it doesn't force the function to use C calling conventions. As demonstrated in https://doc.rust-lang.org/book/ffi.html#calling-rust-code-fr... you probably still need the `extern` keyword on your function definition, e.g.
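something along these lines (a reconstruction of the kind of example that chapter gives, not a quote of it):

    // Presumed shape of the export: both the unmangled name and the C ABI.
    #[no_mangle]
    pub extern "C" fn process() {
        // ... work callable from C or cffi ...
    }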
Other people have said this, but as a concrete example of the difference between #[no_mangle] and extern: Rust does a thing called return value optimization (RVO), C does not. If you have the following library header:
    typedef struct { int x[500]; } bigthing;
    bigthing MyFunction();

and a program using that library:

    int main() {
        bigthing x = MyFunction();
    }
C has no choice but to have MyFunction allocate a couple kilobytes on the stack and have main copy the structure from the stack to where it should eventually go. The equivalent Rust code, however, can pass the address of the object x to MyFunction, as if it were actually void MyFunction(bigthing *output).
If you have a #[no_mangle] but non-extern function in Rust, the Rust compiler will generate code for that function that looks for this secret by-reference argument and fills it in. When you call it from C (or something that calls functions in a C-like manner, like Python's cffi), the caller won't be setting up the call like that at all, and it will expect to read the result off the stack like a C function would have done.
(I don't think there's a lot of use for #[no_mangle] without extern. The best I can think of is that, if you have a Rust application that dynamically loads a Rust library and runs a function from it, you don't have access to the mangling algorithm, once you find the function you'll call it according to the Rust ABI. But even that is risky since the Rust ABI can change between compiler versions; you're still better off shoveling things through the C ABI.)
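To make the contrast concrete, a hedged sketch of the two Rust definitions in question (types and names made up to mirror the C example above):

    #[repr(C)]
    pub struct BigThing { pub x: [i32; 500] }

    // Unmangled but Rust-ABI: may expect the caller to pass a hidden
    // out-pointer, which a C or cffi caller won't set up.
    #[no_mangle]
    pub fn my_function_rust_abi() -> BigThing {
        BigThing { x: [0; 500] }
    }

    // Unmangled and C-ABI: returns the struct exactly as the platform's
    // C ABI specifies, so a C caller reads it correctly.
    #[no_mangle]
    #[allow(non_snake_case)]
    pub extern "C" fn MyFunction() -> BigThing {
        BigThing { x: [0; 500] }
    }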
> C has no choice but to have MyFunction allocate a couple kilobytes on the stack and have main copy the structure from the stack to where it should eventually go. The equivalent Rust code, however, can pass the address of the object x to MyFunction, as if it were actually void MyFunction(bigthing *output).
What sort of C implementation (ABI, compiler, etc.) are you thinking of here? gcc (x86, x86-64, ARM) is perfectly capable of doing the exact optimization that you describe Rust being able to do.
Across a public interface in a shared library? I know it can do that optimization within an object, but the SysV ABI does not (to my knowledge) let it expose that optimization at a shared library boundary. I believe Rust has that optimization as part of its inter-library ABI (partly because Rust's ABI is not stable).
I guess the trick here is that "C" really means "platform ABI" and isn't inherently about a language or a compiler.
Yes, across a public interface in a shared library. The SysV ABI just requires that the caller pass a pointer for the structure return value as a hidden parameter; there's nothing saying that pointer must point to different memory than the named object in the program. (Consider how you would implement the call to MyFunction, and MyFunction itself, at an assembly level that would absolutely require the compiler to allocate two different bigthing objects.)
I must admit I don't know. It does not appear to be necessary; I assume it's implied in one way or another for `no_mangle` extern functions. However, I must also admit that I did not look into the exact mechanics.
I do know that you need it if you bind to a function that is linked in.
Right, but `extern "C"` is for C++ programs using C library headers directly. Rust doesn't use C headers directly, I think, so it wouldn't need this kludge.
I think you may be confused. The original commenter wasn't referring to use of 'extern "C"' in the C header file, but to its (lack of) use in the Rust source that defines those functions.
Why not just deserialize all the source maps ahead of time and just store/retrieve them via cPickle? Wouldn't that get you almost the same results without having to learn and support a second language (Rust, in this case)?
[Edit]
cPickle is slower than JSON, but browsing the interwebs it seems that marshal can be 2X faster than JSON and 4X faster than cPickle.
I think that's just a combination of your initial interpretation of the title and being a little too pedantic.
They are fixing a case of Python performance being a problem in the context of their needs, and the way they solved it was with Rust (and there's no implication that it had to be Rust in the title).
That's a bit different, the point of helix is to easily build native modules for Ruby in Rust. The case here was using a regular FFI (cffi) to call Rust code as if it were C, without using Python's C API or anything.
To summarize: instead of improving Python's maps to consume less memory they've embedded an entirely different language, Rust, into Python to solve a particular problem.
I disagree, I think it makes Python look really good. Being an interpreted, dynamic language where most internals are represented by a hash table, it'll never compete in terms of raw speed. The people who choose Python choose it for the ecosystem and productivity, and from that perspective it's actually to its benefit that it's easy to replace your bottleneck code with a faster implementation. The only high-level language in my experience that makes it easier to drop to C/C++ (and now Rust) would be Lua of the LuaJIT flavor.
By Python, you mean dynamic scripting languages? Everyone already knows the tradeoff in using one of those over using a language like Rust (and vice versa). It's no surprise that Rust uses less memory. It also takes more effort to code in. What exactly is your point? That sometimes a language like Rust is the better tool? That one programming language is not the best tool in all situations?