
As someone looking at this influx of discussion from the point of view of a curious bystander, I can't help but be annoyed by two persistent misconceptions that keep being perpetuated in many statements of this kind.

1) Memory safety is or should be a top priority for all software everywhere. The OP goes so far as to state: "When someone says they "don't have safety problems" in C++, I am astonished: a statement that must be made in ignorance, if not outright negligence."

This is borderline offensive nonsense. There are plenty of areas of software design where memory safety is either a peripheral concern or wholly irrelevant - numerical simulations (where crashes are preferable to recoverable errors and performance is the chief concern), games, and other examples abound. It's perfectly true that memory safety issues have plagued security software, low-level system utilities and other software; it's also true that Rust offers a promising approach to tackling many of these issues at compile time, and that this is an important and likely underappreciated advantage for many use cases. There's no need to resort to blatant hyperbole and accusations of negligence against those who find C++ and other languages perfectly adequate for their needs and don't see memory safety as the overriding priority everywhere. Resorting to such tactics isn't just a bad PR move; it actively prevents people from noticing the very real and interesting technical properties of Rust that have little to do with memory safety.

2) Rust is just as fast or faster than C++.

Rust is certainly much closer to C++ in performance than to most higher-level interpreted languages for most use cases and is often (perhaps even usually) fast enough. Leave it at that. From the point of view of high-performance programming, Rust isn't anywhere close to C++ for CPU-bound numerical work. For instance, it does not do tail call optimizations, has no support for explicit vectorization (I understand that's forthcoming), no equivalent to -ffast-math (thereby limiting automatic vectorization, use of FMA instructions in all but the most trivial cases, etc.), no support for custom allocators and so on. I'm also not sure if it's possible to do the equivalent of an OpenMP parallel-for on an array without extra runtime overhead (compared to C/C++) without resorting to unsafe code; perhaps someone can correct me if it's doable.

Over the past week or so, motivated largely by a number of more insightful comments here on HN from the Rust userbase, I've tried out Rust for the first time, and found it to be quite an interesting language. The traits system facilitates simple, modular design and makes it easy to do static dispatch without resorting to CRTP-like syntactic drudgery. The algebraic/variant types open up design patterns I hadn't seriously considered before in the context of performance-sensitive code (variant types feature in other languages, but are usually expensive or limited in other ways). The tooling is genuinely excellent (albeit very opinionated) and easily comparable to the best alternatives in other languages. I'm not yet sure if I have an immediate use for Rust in my own projects (due to the performance issues listed above and easier, higher-level alternatives in cases where performance is irrelevant), but I will be closely following the development of Rust and it's definitely on my shortlist of languages to return to in the future.
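To make the static-dispatch point concrete, here's a tiny sketch of my own (not from any real project): a trait bound on a generic function gets monomorphized per concrete type, so the call is resolved at compile time with none of the CRTP boilerplate C++ would require.

    // A trait plays the role a CRTP base class would play in C++.
    trait Integrator {
        fn step(&self, x: f64, dt: f64) -> f64;
    }

    struct Euler;
    impl Integrator for Euler {
        // Forward Euler step for dx/dt = -x.
        fn step(&self, x: f64, dt: f64) -> f64 { x - dt * x }
    }

    // Monomorphized per concrete `I`: static dispatch, no vtable.
    fn run<I: Integrator>(integ: &I, mut x: f64, dt: f64, steps: usize) -> f64 {
        for _ in 0..steps { x = integ.step(x, dt); }
        x
    }

    fn main() {
        println!("{}", run(&Euler, 1.0, 0.01, 1000));
    }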

However, I would have never discovered any of this had I not objected to the usual "memory/thread safety" story in a previous HN discussion and received a number of insightful comments in return. I think focusing on the safety rationale alone and reiterating the two hyperbolized misconceptions I listed above does a real disservice to the growth of a very promising language. I think Steve Klabnik's blog post to which the OP responds is a real step in the right direction and I hope the community takes it seriously. Personally, I know a few programmers who've entirely ignored Rust due to the existing perception ("it's about memory safety and nothing else") and in the future I'll suggest Rust as worthy of a serious look as an interesting alternative to the prevailing C++-style designs. I'm certainly glad I tried it.




Apparently, the parent post is now too old to edit, so I'll post a correction here. In light of helpful input, I was wrong about the following:

1) Tail call optimizations are fine, the documentation is just a bit ambiguous on the matter.

2) Explicit vectorization is available in nightly.

3) The equivalent of -ffast-math is technically available, though very inconvenient to use. There may be workarounds, I'm not sure.

These points, coupled with the ability to do performant threading (in theory, even with the syntax I'd prefer), go a long way to alleviating some of my performance concerns. Well written (nightly) Rust may be closer to C++ in numerical performance than I initially thought. I'd like for some of these things to be much more convenient to use than they are currently, but the opportunities are there.


Memory safety is important for everything because it is a prerequisite for any other form of correctness. There's no guarantee that a violation of memory safety will result in a crash; silent memory corruption is just as possible, resulting in a bad numerical computation or a broken game. It may be that the risk of an actual problematic memory safety violation in numerics/gaming is small enough not to worry about, but it is still something to consider.

The rayon library offers similar functionality to OpenMP, including a parallel map/reduce (etc) over a vector, all in safe code for the user.
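For concreteness, a minimal sketch of what that looks like (my own example, assuming rayon is added as a dependency):

    use rayon::prelude::*;

    fn main() {
        let v: Vec<f64> = (0..1_000_000).map(|i| i as f64).collect();
        // Parallel map + reduce; no unsafe code needed on the user's side.
        let sum_of_squares: f64 = v.par_iter().map(|x| x * x).sum();
        println!("{}", sum_of_squares);
    }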

I believe operations that allow -ffast-math style optimisations were recently added to the floating-point types, allowing one to specify individual places where reassociation (etc) is OK. This obviously isn't as automatic as -ffast-math, but usually one has only a few small kernels where such things are relevant anyway.
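A hedged sketch of the kind of opt-in I mean, assuming a nightly toolchain where the unstable fast-math intrinsics (std::intrinsics::fadd_fast and friends) are available; the exact surface has moved around over time:

    #![feature(core_intrinsics)]
    #![allow(internal_features)]
    use std::intrinsics::{fadd_fast, fmul_fast};

    // A dot product where reassociation/FMA-style rewrites are allowed,
    // but only inside this one kernel rather than across the whole program.
    fn dot(a: &[f64], b: &[f64]) -> f64 {
        let mut acc = 0.0;
        for i in 0..a.len().min(b.len()) {
            acc = unsafe { fadd_fast(acc, fmul_fast(a[i], b[i])) };
        }
        acc
    }

    fn main() {
        println!("{}", dot(&[1.0, 2.0, 3.0], &[4.0, 5.0, 6.0]));
    }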

Lastly, two smaller points:

- C++ doesn't do tail call optimisation any more than Rust does: compilers for both can (and do) perform TCO; the languages just don't guarantee it.

- C++ doesn't do explicit vectorization either, not in the standard. If you're willing to move into vendor extensions then nightly Rust seems somewhat equivalent and does allow for explicit SIMD.
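As a concrete (and heavily hedged) illustration of the nightly SIMD point - the feature gate and module layout have changed over the years, so treat this as a sketch of the current portable-SIMD incarnation rather than a stable API:

    #![feature(portable_simd)]
    use std::simd::f32x4;

    fn main() {
        let a = f32x4::from_array([1.0, 2.0, 3.0, 4.0]);
        let b = f32x4::from_array([5.0, 6.0, 7.0, 8.0]);
        // A single explicit 4-lane SIMD addition.
        let c = a + b;
        println!("{:?}", c.to_array());
    }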


> Memory safety is important for everything because it is a prerequisite for any other form of correctness.

That's obviously true, but not what I (or the OP) was talking about. My point was that in many applications memory safety doesn't rank highly as a separate concern, in addition to computing correct output from expected input. Because of this, describing Rust as a language that solely focuses on memory safety isn't very interesting to large groups of developers that work on such applications.

> The rayon library offers similar functionality to OpenMP, including a parallel map/reduce (etc) over a vector, all in safe code for the user.

I did look at rayon when I skimmed through the available ecosystem. The runtime cost of that approach wasn't obvious to me from the documentation, and it doesn't quite let me keep the loop-based control flow usually employed for numerical calculations (because of the need to refer to different indices within the loop, etc.), but it's certainly a viable approach. Not a direct replacement for OpenMP loops, though.

On the subject of -ffast-math, I did not encounter these recent additions you mention in the language documentation, but I'll take another look on the issue tracker and elsewhere. Thanks for the information.

On tail call optimisations, I don't believe your statement is entirely correct. It's true that C++ compilers don't guarantee TCO (although, empirically they're very good at it), but Rust doesn't seem to be able to do it at all. It's explicitly stated in the language documentation and there's a recent issue on the subject [1].

And on explicit vectorization - I'll take a look at the latest nightly Rust and edit my post accordingly if explicit SIMD is already usable. Glad to hear it.

FWIW, I think Rust has made great progress considering the age of the language, and I'm glad to see SIMD and other improvements being implemented. My objection was simply against the hyperbolic assertion that Rust has already attained full parity or even superiority over C++ in performance.

By the way, is there a way to turn off runtime bounds checking for vectors? That's another common performance sink in numeric computing.

[1] https://github.com/rust-lang/rust/issues/217


> My point was that in many applications memory safety doesn't rank highly as a separate concern, in addition to computing correct output from expected input

Indeed, I was trying to cover that in my comment. I agree that it isn't an explicit selling point to such people, but I think that it should be:

- numerics/scientific computing/machine learning are slowly taking over the world. It is bad to have random/occasional heisenbugs in systems that influence decisions from the personal to the international.

- games are very, very often touching the network these days, and thus are at risk of being exploited by a malicious attacker.

Of course, people in those domains aren't necessarily thinking in those terms/have deadlines to hit/are happy with their current tooling.

> The runtime cost of that approach wasn't obvious to me from the documentation

It is low. I've even heard rumours that the core primitive has the lowest overhead of all similar data-parallel constructs, including, say, Cilk. This, combined with aggressive use of "expression templates" (with a special mention of Rust's easily inlinable closures), means I'd be surprised if rayon were noticeably slower than OpenMP for the straightforward map/associative-reduce situations. More exotic transformations are more dubious, given rayon has had far fewer person-hours put into it.

> it doesn't quite let me keep the loop-based control flow usually employed for numerical calculations (because of the need to refer to different indices within the loop, etc.), but it's certainly a viable approach

I'm not sure loop-based control flow is actually necessary, since (I believe) one can, say, parallelise over an enumerated iterator (e.g. slice.iter().enumerate()), which contains the indices. One can then .map() and read from the appropriate indices as required.
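Something like this, say (a sketch off the top of my head, with a three-point average standing in for "reads from other indices"):

    use rayon::prelude::*;

    fn main() {
        let input: Vec<f64> = (0..16).map(|i| i as f64).collect();
        // Parallelise over (index, value) pairs; each element can read
        // neighbouring indices of `input` freely, since it's only shared.
        let smoothed: Vec<f64> = input.par_iter().enumerate()
            .map(|(i, &x)| {
                if i > 0 && i + 1 < input.len() {
                    (input[i - 1] + x + input[i + 1]) / 3.0
                } else {
                    x
                }
            })
            .collect();
        println!("{:?}", smoothed);
    }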

> On the subject of -ffast-math, I did not encounter these recent additions you mention in the language documentation, but I'll take another look on the issue tracker and elsewhere. Thanks for the information.

To shortcut your search: https://doc.rust-lang.org/std/?search=fast . (I apologise that I didn't link it earlier, I was on a phone.)

> On tail call optimisations, I don't believe your statement is entirely correct. It's true that C++ compilers don't guarantee TCO (although, empirically they're very good at it), but Rust doesn't seem to be able to do it at all. It's explicitly stated in the language documentation and there's a recent issue on the subject [1].

I guarantee that rustc can do TCO. I've spent a lot of time digging around in its output. The compiler uses LLVM as a back-end, exactly the same as clang, and things like function calls look the same in C++ and in Rust.

That issue is 5 years old, and closed, and is (implicitly) "teach Rust to have a way to guarantee TCO"; see the mention of 'be' vs. 'ret': they're keywords, 'be' theoretically being used like `be foo(1, 2)` and meaning "this call must be TCO'd" (i.e. my stack frame must be foo's stack frame).

Lastly, if the documentation you're talking about is [0], I think you're misreading it; in particular, it says:

> Tail-call optimization may be done in limited circumstances, but is not guaranteed

That said, it is understandable that you misread it, given that "Not generally" is technically correct ("rustc cannot do TCO in complete generality", i.e. there exists at least one tail call which won't be optimised) but confusing in normal English. I personally think it would be better if it started with "Yes, but it is not a guarantee" rather than "No".

[0]: https://www.rust-lang.org/en-US/faq.html#does-rust-do-tail-c...

> By the way, is there a way to turn off runtime bounds checking for vectors? That's another common performance sink in numeric computing.

Yes, the get_unchecked and get_unchecked_mut methods. This takes the same approach as the -ffast-math equivalents: disable checks where required, rather than sacrificing reliability across the whole program. That said, Rust's iterators (which also power rayon) are more idiomatic than manual indexing, when they work, and generally avoid unnecessary bounds checks more reliably.
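A hedged illustration of the trade-off (my own sketch): unchecked indexing is an explicit unsafe opt-out, while the iterator form stays safe and typically compiles without bounds checks anyway.

    fn sum_indexed_unchecked(v: &[f64]) -> f64 {
        let mut acc = 0.0;
        for i in 0..v.len() {
            // Caller guarantees i < v.len(); no bounds check is emitted.
            acc += unsafe { *v.get_unchecked(i) };
        }
        acc
    }

    fn sum_iter(v: &[f64]) -> f64 {
        // Idiomatic form: safe, and needs no per-element bounds check.
        v.iter().sum()
    }

    fn main() {
        let v = vec![1.0, 2.0, 3.0];
        assert_eq!(sum_indexed_unchecked(&v), sum_iter(&v));
    }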


Thanks for this great comment thread, Huon. :) I didn't know we had fastmath stuff now!


Thanks for the insightful post. I stand corrected on the TCO issue, I had indeed misread the documentation, and I've posted a correction to my original post accordingly. The bounds checking issue is also resolved to my satisfaction.

> I'm not sure loop-based control flow is actually necessary, since (I believe) one can, say, parallelise over an enumerated iterator (e.g. slice.iter().enumerate()), which contains the indices. One can then .map() and read from the appropriate indices as required.

The loop-based control flow is, strictly speaking, never necessary, as the two formulations are mathematically equivalent. It's just that for many algorithms the loop-based approach is more intuitive (to many people) and readable, and has less boilerplate. Your simple_parallel library looks syntactically closer to what I'd like, though I'm not sure if it's still being maintained.

> To shortcut your search: https://doc.rust-lang.org/std/?search=fast . (I apologise that I didn't link it earlier, I was on a phone.)

Thanks. I'm glad the option is there, but the current implementation looks quite tedious to use in long numeric expressions, and would greatly sacrifice readability. Ideally, I'd like something along the lines of fastmath { <expression> } blocks or function / loop level annotations. Is something like that possible with Rust's metaprogramming, perhaps?
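For instance, a toy of the kind of wrapper I'm imagining (entirely hypothetical, and nowhere near a general fastmath { } block, since macro_rules! can't easily re-parse arbitrary arithmetic):

    #![feature(core_intrinsics)]
    #![allow(internal_features)]
    use std::intrinsics::{fadd_fast, fmul_fast};

    // Hypothetical convenience macro: fast_mul_add!(a, b, c) computes
    // a * b + c with relaxed floating-point semantics, hiding the unsafe
    // intrinsic calls behind one readable name.
    macro_rules! fast_mul_add {
        ($a:expr, $b:expr, $c:expr) => {
            unsafe { fadd_fast(fmul_fast($a, $b), $c) }
        };
    }

    fn main() {
        let (a, b, c) = (1.5_f64, 2.0, 0.5);
        println!("{}", fast_mul_add!(a, b, c));
    }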

> - numerics/scientific computing/machine learning are slowly taking over the world. It is bad to have random/occasional heisenbugs in systems that influence decisions from the personal to the international.

That's theoretically true, but (in my opinion) practically irrelevant. The thing about numerical kernels is that the input is constrained by the mathematics involved and the output is rigorously verifiable. In practice, I can't imagine a realistic case where a memory safety error would not be caught by the usual verification tests that any numeric code of any importance is routinely subjected to. That's why programmers in this domain don't normally think about memory safety as a separate issue at all; it's just a small and not particularly remarkable part of the normal correctness testing. A guarantee of memory safety still doesn't free you from having to do all those tests anyway. Obviously, this is much different for systems software that takes inherently unpredictable user input that's difficult to sanitize.

> - games are very, very often touching the network these days, and thus are at risk of being exploited by a malicious attacker.

The argument here is, I think, much stronger than for numerical software. Nevertheless, even for online games (which are still only a subset of computer games), network data is comparatively easy to sanitize and memory safety issues wouldn't typically lead to exploitable attacks. I've never heard of a game used as a vector for a serious attack in any context, but at least it's somewhat conceivable in theory.





