24 days of Rust – Rayon (siciarz.net)
205 points by zsiciarz on Dec 3, 2016 | 61 comments


Rayon is definitely the best parallelism library I've ever used. We recently switched Servo over to using it for parallel restyling and layout and saw small gains in performance over our previous solution and drastic reduction in code complexity (and removed a whole pile of domain-specific unsafe code).

Being able to switch .iter() to .par_iter() and have things "just work" is a game changer.

The crucial thing about rayon is that sequential fallback is really fast, almost as fast as the sequential code you'd write anyway. This is important because, as paradoxical as it sounds, most CPU-bound programs work with small workloads most of the time, and so they don't want the overhead of parallelism for those cases. (It's the analogue of saving power by putting the CPU to sleep when it's not in use.) The occasional big workload that comes along is what you really want parallelism for, and the big trick is to handle that case without regressing the common sequential case. Rayon's work stealing approach based around scoped iterators is the ideal solution for this.
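
To make the sequential-fallback point concrete, here is a minimal sketch that uses only the standard library's scoped threads, not Rayon itself. The name `parallel_sum` and the `SEQUENTIAL_CUTOFF` value are assumptions made up for illustration. Rayon runs on a work-stealing thread pool and decides at run time whether to split, whereas this sketch simply spawns one OS thread per split.

```rust
use std::thread;

// Hypothetical cutoff: below this size, splitting is assumed to cost
// more than it saves, so the work runs as an ordinary sequential loop.
const SEQUENTIAL_CUTOFF: usize = 4096;

fn parallel_sum(data: &[u64]) -> u64 {
    // Small workload: plain sequential code, no threads involved.
    if data.len() <= SEQUENTIAL_CUTOFF {
        return data.iter().sum();
    }

    // Large workload: split in half and process both halves in parallel.
    let (left, right) = data.split_at(data.len() / 2);
    thread::scope(|s| {
        let handle = s.spawn(|| parallel_sum(left));
        let right_sum = parallel_sum(right);
        handle.join().unwrap() + right_sum
    })
}
```

A call such as `parallel_sum(&values)` never touches a thread when `values` is short, which is the property the comment above describes: the common small case pays almost nothing for the option of going parallel.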


> Being able to switch .iter() to .par_iter() and have things "just work" is a game changer.

It's called .parallel() in D, works the same way I guess. It turns a lazy computation chain into a parallel one.


I have played around a tiny bit with par_iter over blocking IO tasks and seen some sched_yield() loops burning CPU time instead of backing off to futex_wait. That seems suboptimal and not exactly "the best ever" I'd expect from a parallelism library.


They should be doing that for a few iterations before backing off. Otherwise you end up with bad scheduling leading to slow warmups, among other problems.

You shouldn't use rayon for blocking I/O; that's not what it's designed for. Rayon is a parallelism library, not a concurrency library.


> Rayon is definitely the best parallelism library I've ever used.

Whenever people say this, I ask the following question in order to gauge whether I want to try the library in question:

Have you used Twisted?

Twisted is, to me, the quintessential example of a high-quality open source project. If you have used it extensively and still recommend Rayon, I'll give it a try.


Assuming you're talking about the Python library, it is aimed more at asynchronous IO and other "concurrency" things than at the data parallelism that rayon is designed for.


Yeah, twisted is closer to tokio than rayon.


Hey Steve. We met at TwilioCon 2011 - not sure if you remember that. How have you been?

Is there a good guide for someone in my position? ie, to learn about tokio and rayon, having used python (for, in this case, concurrency and data science respectively)?

Are you mostly using Rust these days?


Oh hey! That was a very long time ago, but I loved TwilioCon. Things are good. I'm actually working on Rust full-time, so yeah, I use it a lot. :)

I'm not sure there's a great guide yet, because a lot of this stuff is still shaking out. The Rust ecosystem in general is growing at a pretty steady clip, and new stuff pops up all the time: tokio is less than a year old, for example.

There's two different kinds of problems here: "I found a library, what does it do?" and "What libraries exist?" In the former case, you're at the mercy of the library author to give you a good description. With the latter, one of the better ways is to drop by #rust on IRC, or post to users.rust-lang.org, asking for an overview of what exists. https://crates.io/search is also helpful.

In this case, rayon is for "data parallelism", meaning "I have some data, I would like to do some work on it, and I'd like to make that parallel." Tokio is about asynchronous I/O.


Twisted is not for parallelism. Twisted is for concurrency.


This is very nice. In Rust, if you accidentally share mutable data between threads, the borrow checker should catch it at compile time. Few other languages catch such errors. Go, for example, does not. This makes writing parallel code much, much safer.
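
As a rough illustration of that compile-time check, the sketch below uses standard-library scoped threads (not Rayon). The function name `double_in_parallel` is hypothetical. Each thread receives a disjoint mutable chunk, which the compiler accepts; the commented-out variant, where two threads mutate the same variable, is rejected by the borrow checker.

```rust
use std::thread;

fn double_in_parallel(values: &mut [i32]) {
    // Split into (at most) two chunks; max(1) avoids a zero chunk size.
    let chunk_len = ((values.len() + 1) / 2).max(1);

    thread::scope(|s| {
        // chunks_mut hands out non-overlapping mutable slices, so every
        // thread owns exclusive access to its own part of the data.
        for chunk in values.chunks_mut(chunk_len) {
            s.spawn(move || {
                for v in chunk {
                    *v *= 2;
                }
            });
        }
    });

    // Rejected at compile time: both closures would need a mutable
    // borrow of `total` at the same time.
    //
    // let mut total = 0;
    // thread::scope(|s| {
    //     s.spawn(|| total += 1);
    //     s.spawn(|| total += 1);
    // });
}
```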


I have not done any real programming in Rust, but whenever I see Rust code I'm amazed how different it is from Go, despite both having some shared use cases. Go's main selling point beyond concurrency is simplicity. And it's the simplicity that I like about it. On the other hand, it looks to me like Rust is turning into Scala.


My personal experience with Rust (vs. Go and other languages) is that there is something really magical that all the sigils and syntactic complexity give you: once you've internalized Rust's approach it's ridiculously easy to build an accurate mental model of what's happening in almost any piece of code. From the high level constructs down to generated assembly, Rust produces the most predictable code I've ever written. While not the same kind of simplicity you're describing, my experience is that the value of Go's simplicity is to give you a low overhead mental model. While Rust has some complexity, I personally find it to have the lowest overhead mental model of anything I've worked with, due to its predictability, explicitness, and strong conventions.

Granted, it definitely took some time for me to gain the experience necessary for this magic to occur. It's not really a "hack a quick thing together once every 10 years" kind of language.

It's hard to convey this without getting you to actually learn Rust for a few days/weeks/months, but it's certainly been my experience and I hear it all the time from other Rust developers.


The problem with people referring to more or less simple mental models, syntax, learning curves, expressiveness, etc is that they usually only focus on one of those at a time, when it requires multiple of them to get a good picture of how a language will work in practice for you, and in general.

As an extremely simple example of this, compare BASIC and APL in terms of the learning curve. If we examine it in isolation, BASIC is obviously better. But if we use multiple criteria, the answer becomes much more nuanced, as we can see what the steeper learning curve allows. For a less extreme, but still ultimately the same comparison, imagine Perl and Python, or Go and Rust. A simple mental model is important, but if one choice is less simple, the question should really be what are you getting in return, and is the trade-off worth it? Otherwise, you should just program in BASIC and be done with it.


I would compare Rust to Python rather than Perl. Like Python, Rust tries to have only one obvious way to do things, and avoiding TIMTOWTDI is a conscious goal.


I wasn't actually comparing Rust to either, I was using the common comparison of Perl to Python to illustrate the point. Perl is often denigrated for being hard to read, but I maintain that for the most part that's a function of experience with the language (learning curve). After you've internalized a few things, I think Perl is easier to read in many cases, because the sigils, which people often complain about, actually convey additional information in a succinct and recognizable way once you learn to read them correctly. It's what you get from having a steeper (or actually, longer, not really steeper) learning curve. But that's still not the full picture, as you've alluded to with TIMTOWTDI, which is yet another spectrum on which languages can be compared.


I'm still in the early days of learning Rust, but from what I've seen so far I agree with that feeling of having a better mental model of the program. At first the language and syntax seemed overwhelming, but slowly it's all coming together and what I thought was verbose syntax turns out to be incredibly helpful in terms of forcing me to think things through.

Several times now when caught out by the borrow checker, I've wondered, wow, how do other languages get away with allowing a situation like this.

This video series by the author of Rayon (and Rust core dev) was what first hooked me and made the syntax a lot more understandable: http://intorust.com/


What about maintainability here, how would you rate it vs. (go, python, etc.)? I.e. coming into a new code base, etc?


I'm neither the OP nor a Rust expert, but I have found it quite easy to jump into and read/tweak both an established expert's code base and a new learner's code base.

Compared to Python: Rust is certainly a bit more verbose, due to being much more careful with performance, memory and precise types. But I find that the last one actually makes Rust easier to maintain. In Rust, I just have to look at the types being passed around, while in a complex Python code base, I might ask PyCharm for all calls to a function and work backwards, slowly trying to infer what kind of thing a variable might be.

Go is much closer to Rust in terms of maintainability, but I haven't worked with it enough to say, because I didn't like other aspects of the language.


I know scala a bit and rust only barely, but they seem like very different languages to me.

Scala tries to enable you to create whatever api you want, so it has many different flavours of magic to allow flexibility of expression. Predictably this is used and abused horribly by a community who can't come to a consensus on what good taste is.

Rust on the other hand makes you explicitly say everything that happens. Nothing will happen without you being aware of it, and that makes it wordy and feel more complex, but it's the kind of complexity of having to say everything you mean down to a much greater level of detail rather than the kind of complexity that arises when it's almost impossible to know exactly what is going on.

Rust and Scala are pretty much at opposite ends of the 'magic' spectrum.


> Nothing will happen without you being aware of it

This is exactly the opposite of what I understood from the article that Rayon does. For example, it will schedule your code differently depending on the current conditions of the CPU. That kind of non-determinism is hard to deal with if you are doing something more complex than adding numbers. I imagine this is fine for building mathematical libraries, where you care about the result, not about the order of operations. But if you are interacting with external APIs, it's very hard to reason about this kind of parallelism.


This makes no sense to me. You are absolutely aware this will happen because you called Rayon's `.par_iter()` method!

The pitfalls of parallel and concurrent computing are well known, and you are correct that one should not use a data parallelism library for effectful actions whose order matters. But of course the same concerns can arise when using goroutines.

In contrast, Rust actually does guarantee an absence of data races, in rayon or in any other concurrency or parallelism library, whereas Go provides no such guarantee.


I know it's not the same, but in Go you can compile with `-race` and it'll catch race conditions at run-time (useful for testing/debugging).


I think that data parallelism is a very much wrong tool if you are interacting with external APIs – that's in the realm of concurrency and asynchrony. (Check out the work on Futures-rs and Tokio, they are inspiring projects!)

Rayon requires the callbacks it calls to be `Sync`. That means that they are declared to be safe to run across threads. That means that the API it provides explicitly requires the calculations to be parallelizable (and this is compiler checked), and if it doesn't run them in parallel, that can't hurt, right?
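
A small sketch of what such a compiler-checked requirement can look like, again using only standard-library scoped threads. The function `par_map` and its exact bounds are assumptions for illustration and are not Rayon's actual signatures. The `Sync` bound on the closure is what lets two threads call it through a shared reference.

```rust
use std::thread;

// Hypothetical helper: maps `f` over the two halves of `items` on two threads.
fn par_map<T, U, F>(items: &[T], f: F) -> Vec<U>
where
    T: Sync,                // items are read from several threads
    U: Send,                // results are sent back to the caller
    F: Fn(&T) -> U + Sync,  // the callback is shared between threads
{
    let (left, right) = items.split_at(items.len() / 2);
    let f = &f; // shared reference; sending it requires F: Sync

    thread::scope(|s| {
        let handle = s.spawn(move || left.iter().map(f).collect::<Vec<U>>());
        let mut right_out: Vec<U> = right.iter().map(f).collect();
        let mut out = handle.join().unwrap();
        out.append(&mut right_out);
        out
    })
}
```

A closure that captures something that is not thread-safe, for example a reference to a `std::cell::Cell`, does not satisfy `Sync`, so a call like `par_map(&data, |x| { counter.set(counter.get() + 1); *x })` fails to compile instead of racing at run time.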


Rust can't stop random libraries from providing abstractions like this, but it does allow such libraries to be written to absolutely minimise overhead and maximise performance for when people opt in to using it. If this sort of parallelism isn't appropriate for the task at hand, the language isn't opinionated about what other schemes can work just as well as rayon.


> This is exactly the opposite of what I understood from the article that Rayon does

You are aware of it. You put the par_iter() call there. Rust won't magically parallelize regular .iter() loops.

If you don't want to reason about parallelism, don't use Rayon. That's pretty explicit.


Are there really a lot of "shared use cases" between Rust and Go?

I realize that Go's creators intended for it to be a systems language. In practice though, it has found its niche in web apps or microservices, and command-line apps in the DevOps world. Rust is primarily aimed toward real-time applications that can't tolerate garbage collection latency.

Of course they're both general-purpose languages in theory. But in the real world, one is really competing with Java and Python while the other is competing with C++. I don't even see Go and Rust as head-to-head competitors at all... and I definitely don't understand their uni-directional "feud" (i.e. nearly every Rust thread has people taking shots at Go, yet most Go threads don't mention Rust at all).


It's a unidirectional feud because Rust isn't limiting itself to C++ use cases. As the Async and other library infrastructure falls into place, why not use Rust for webapps, microservices, and command line tools? Just because it is suited for 0-overhead, high safety, with high level abstractions does not mean it's unsuited for these other things; or at least we don't know yet exactly.

On the other hand, Go is aware of its limitations, and thus has no need to fire any shots back, so to speak.


It will be interesting to see if Rust can prove competitive in the webapp/microservice space over time.

However, the standard library and surrounding ecosystem for all the other players "fell into place" years ago. Java and Python have robust and well-maintained (sorry Node!) libraries and drivers for everything you could imagine. The biggest draw for Go is probably that its standard library is so complete, you seldom need many outside dependencies at all. Other rising stars such as Elixir are basically web-first from the start, rather than hoping to grow into it later.

From what little tinkering I've done with Rust... it seems to have a much steeper learning curve than other languages, and an ecosystem with few database drivers and only a couple of half-baked web frameworks (http://www.arewewebyet.org). I don't mean that disrespectfully, since it clearly has generated a lot of excitement in other niches.

However, I personally don't really care about those other niches. We web folks are a much larger community, and for the most part we don't really care whether or not a language uses garbage collection (web apps are more likely to be I/O-bound rather than CPU-bound). So while I would love for Rust to become another serious option, it's basically optimized in the wrong direction for the web mass market... and lags years behind in the ecosystem support that they care more about.


I (and the rest of the team) don't really see it as a feud at all. There's always room for more languages, and we're fundamentally pluralists. There is no reason at all that Rust and Go can't or shouldn't coexist just fine.


>Rust is primarily aimed toward real-time applications that can't tolerate garbage collection latency.

Why can't Firefox tolerate garbage collection latency?


Rendering webpages is a very intense task. Many of the improvements Servo is working on are basically "how can we transpose AAA video game technologies and techniques into web page rendering." The problems are more similar than you'd imagine.


> In practice though, it has found its niche in web apps or microservices, and command-line apps in the DevOps world.

I don't know about microservices but I think it could compete for command-line devops apps.


> "nearly every Rust thread has people taking shots at Go"

Often resulting from "this is so much more complicated than Go, why bother?" prodding as seen in lukaslalinsky's post above, at which point those "shots" are just asked-for information.


What do you mean by "turning into"? I don't see much special syntax in the post apart from the general expression-oriented functional style, including higher-order functions. Which has always been part of Rust. If that means turning into Scala, then almost every high-level programming language is currently turning into Scala.


> If that means turning into Scala, then almost every high-level programming language is currently turning into Scala.

Martin Odersky, the creator of Scala, has said as much. Just look at Kotlin, Swift, and to some degree OCaml (upcoming modular implicits) and C# 7. Even Java 9 adopts Scala syntax with underscore now reserved.

Rust is its own thing, however. OP is probably referring to the complexity of syntax that more powerful languages present to newcomers.


I believe the underscore predates Scala, with roots in ML.


From the starting point of being a professional Java developer and a hobbyist Haskeller, Scala should've been the perfect language for me, yet I struggle with making it past just how complicated the language is -- not really complex, as such, just over-engineered.

To me, Rust feels like the simplest possible language that could achieve their design goals, whereas Go feels like they went so far out of their way to make it simple that they lost track of the other goals altogether.


Because I'm one of the only people passionate about Rust at my job (and there are plenty who love Go), I'm often asked questions about Rust vs. Go, which actually perplexes me a little. Asking "What about Go?" when someone brings up Rust is essentially like saying "What about OCaml?"; the languages have a few similar features (as most languages do, IMO), but they have completely different goals and only a subset of shared use-cases (and again, you'll likely find at least some shared use-cases if you pick any two languages). I feel like constantly bringing up Rust and Go together confuses things more than it helps, since they really aren't that similar.


I experience the same thing all the time. Go has become the new node.js for ineffectual, out-of-touch technical managers and mediocre engineers to just randomly throw out as their "What about _____?" hobby horse.

I got myself out of the position of contrasting it against whatever they were railing against (in this case Erlang), and instead just put the question back on them. "I don't know, you tell me why I should consider Go for this."

Usually shuts down the conversation because they don't actually know anything about Go. Just that they've heard other people talk about it.


I think people compare them because they were both designed as alternatives to C++ (or at least with that in mind). What people might miss is that they were solving two very different problems: Rust aims to be as fast (or faster) while guaranteeing memory safety, while Go aims to be easy to understand and get productive as quickly as possible.


> I think people compare them because they were both designed as alternatives to C++ (or at least with that in mind).

I find it really interesting when people say this, because to me, Go only seems like a suitable replacement for higher-level C++, whereas Rust could conceivably be used in any domain where C++ would be suitable. I'm not sure if this is the general consensus or just my perception though, so this could be clouded by my bias of being a heavy Rust user.


If one were to have experience in both Rust and Scala, they would see that the comparison between the two is flawed. Scala revels in implicitness, whereas Rust is very explicit. Scala also has expressiveness as a goal, whereas in Rust expressiveness is an anti-goal. There are no user-defined operators in Rust, and less magic than in any language that I've used other than C.


> whereas in Rust expressiveness is an anti-goal.

I think this is seriously over-stating it. We don't want Rust to be ugly or hard to use.


Yes this is true. I do wish I could have custom operators every now and then though, but that's a whole other bag of complexity and can be ugly if you don't know what each operator means.


What's described here is precisely how Rust and Erlang became my two favorite languages to work with.


I have left more comments on this thread than I should considering my lack of experience with Rust, but I think the comparison with Scala is valid.

What got me thinking about that was the implementation of into_par_iter, which is a trait implemented on a bunch of standard types, not unlike implicit methods in Scala. I know this is rare code, but reading it left me with a feeling very similar to the one I had when reading Scala standard library code:

https://github.com/nikomatsakis/rayon/blob/master/src/par_it...


Being able to extend other types with methods is a pretty common feature in many languages, many of which are much less magic than Scala. It's very common in functional languages with objects.
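
For readers who have not seen the pattern, this is roughly how a Rust library adds a method to a standard type: it defines a trait and implements it for that type. The trait name `SumOfSquares` is invented for this sketch and has nothing to do with Rayon's own traits. Unlike an implicit conversion, the method is only available where the trait has been explicitly brought into scope.

```rust
// Hypothetical extension trait adding one method to slices of i64.
trait SumOfSquares {
    fn sum_of_squares(&self) -> i64;
}

// Implementing the trait for the standard slice type makes the method
// callable on slices, arrays and Vec<i64> (through auto-deref).
impl SumOfSquares for [i64] {
    fn sum_of_squares(&self) -> i64 {
        self.iter().map(|x| x * x).sum()
    }
}
```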


How would you write rayon in Go?

Can you describe the simpler signature of par_iter() that would work in Go?

Can you describe the Go features that allow the compiler to prevent data races at compile time?


The answer to the questions is that you wouldn't do that in Go. Go doesn't have features that would allow you to do something like this efficiently. You are pretty much limited by the language. You can do high-level abstractions, not something low-level like this and I think that's where the simplicity comes from.


What would that abstraction look like? I really can't think of a more high-level abstraction of parallelism than a single method call saying "Please parallelize this sequential algorithm. But only when it makes sense."


How does not being able to write the high-level par_iter() abstraction make Golang more high-level?


I didn't say Go is more high-level, I said Go doesn't allow you to do low-level things efficiently. Writing your own parallelism library requires good access to the language's execution model internals. Go doesn't give you that, because Go wants you to write application code, not library code. It's often frustrating, because sometimes you want to write a generic low-level library that will add no overhead to the code that uses it, but on the other hand it's very liberating, because you know what you have available and you instead focus on your application code.


It would be accurate to say that Go gives you neither low level control nor high level abstraction, whereas Rust gives you both.


None of the primitives Golang gives you allow you to achieve the same results that Rayon does. The Go work stealing system only works with goroutines, which are too expensive to spawn on every iteration of a tight loop.


My main problem with rust at this point is the noise caused by a lot of the wrangling of standard types.

I know it's a low-ish level language that leaves a lot explicit for the developer to type, and that a lot of boilerplate things could yet be turned into conventions or macros much like try!, as the language matures.

I'd really like to see this process accelerate as a lot of Rust code now is littered with into(), unwrap() etc. which causes mental overhead for the reader.

I'd like to see more "zero cost abstractions" where the zero also includes zero characters typed that aren't part of the problem itself. As a benchmark, I'd like to see "noise parity" with go/ocaml/swift while still having the nice safety/performance Rust has today.
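
To show the kind of noise being discussed, here is a small comparison, with function names invented for the example. The first version uses `unwrap()` and panics on bad input; the second uses the `?` operator (stable since Rust 1.13, the successor to `try!`) to propagate the error with a single character.

```rust
use std::num::ParseIntError;

// Panics if any field is not a number.
fn sum_fields_unwrap(line: &str) -> i64 {
    line.split(',')
        .map(|field| field.trim().parse::<i64>().unwrap())
        .sum()
}

// Returns the parse error to the caller instead of panicking.
fn sum_fields(line: &str) -> Result<i64, ParseIntError> {
    let mut total = 0;
    for field in line.split(',') {
        total += field.trim().parse::<i64>()?;
    }
    Ok(total)
}
```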


Rayon seems very similar to Java 8 Parallel Streams or C# Parallel LinQ. What are the advantages/disadvantages of the Rust approach?


Closures in Rust are stack-allocated and LLVM can inline them and optimize them as if they're any regular imperative code, for one thing. This means that there's less overhead from managing the iterator chains, and that they're statically dispatched which saves on runtime indirection. The borrow checker also makes sure that you don't accidentally mutate non-thread-safe data from your parallel iterators.


Regarding closures, that is also possible in Java and .NET, just you don't control when it might happen.


In Rust, each closure has a unique type, and derived expressions are templated on that type. This is key to making them statically dispatched, which is important for making the base case (sequential) fast.
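
A minimal sketch of the difference, with hypothetical function names. In the generic version the compiler generates a separate copy of the function for each closure type, so the call can be inlined; in the `dyn` version the call goes through a function pointer looked up at run time.

```rust
// Statically dispatched: F is the closure's own unique type, and the
// function is compiled once per distinct closure passed in.
fn apply_static<F: Fn(i32) -> i32>(f: F, x: i32) -> i32 {
    f(x)
}

// Dynamically dispatched: any closure with this signature is accepted
// behind a trait object, at the cost of an indirect call.
fn apply_dynamic(f: &dyn Fn(i32) -> i32, x: i32) -> i32 {
    f(x)
}
```

Iterator adapters such as `map` take their closure in the first style, which is why a chain of them can compile down to something close to a hand-written loop.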


The compiler guarantees no data races.



