I've been working on a project with a similar goal to RLSL, called Emu. It's a procedural macro that automatically off-loads portions of code to run on a GPU. Here's a quick example:
#[gpu_use]
fn main() {
    let mut a = vec![0.2; 1000];
    let mut b = vec![0.3; 1000];
    let mut c = vec![0.0; 1000];
    gpu_do!(load(a));
    gpu_do!(load(b));
    gpu_do!(load(c));
    gpu_do!(launch());
    for i in 0..1000 {
        c[i] = a[i] + b[i];
    }
    gpu_do!(read(c));
    println!("{:?}", c);
}
Emu is currently open-sourced as a 100% stable-Rust library. While it only supports a tiny subset of Rust so far, that subset is well defined, with useful compile-time errors and a comprehensive test suite.
So if you're looking for a way to contribute to single-source GPGPU for Rust, please consider helping expand Emu's supported subset of Rust. The repository is at https://www.github.com/calebwin/emu
I will say that since Emu works at the AST/syntax level, RLSL is of great interest to me because it works instead at the MIR level, which allows it to more easily support a larger subset of Rust.
- Maybe there is a component of RLSL that could be useful. I have to think more about what I want that component to be.
- I want Emu to support general Rust code but still use stable Rust and provide really nice compile-time errors. Maybe Emu could do AST-level checking to (1) ensure that only the legal, transpilable-to-SPIR-V subset is used, (2) infer the kernel parameters, (3) infer global work size and local work size, and then do MIR-level compilation to OpenCL or SPIR-V?
- At the moment, I want to focus on AST-level compilation because I think many applications (AI, ML, simulations, etc.) can still technically be implemented without a huge subset of Rust.
It is so new I haven't even documented it yet, hence why I said the timing for this HN post isn't great.
So imagine the current implementation does a lifetime analysis somewhat similar to Rust's, but where Rust would error out, this implementation simply inserts a runtime refcount increase instead. This gets rid of most runtime refcounting overhead without the programmer needing to be clever about it.
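To make the idea concrete, here's a minimal sketch in plain Rust. Where Rust's borrow checker would reject a second owner of a value, the scheme described above would fall back to a runtime refcount bump instead; `Rc` here is my stand-in for that inserted refcount, not the author's actual implementation.

```rust
use std::rc::Rc;

fn main() {
    // Plain Rust rejects a second owner of a non-Copy value outright;
    // the lenient analysis would instead insert a refcount increase here.
    let v = Rc::new(vec![1, 2, 3]);
    let a = Rc::clone(&v); // the "inserted" runtime refcount increase
    assert_eq!(Rc::strong_count(&v), 2);
    println!("{} {}", v.len(), a.len()); // both handles remain usable
}
```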
Then, I intend to add an optional type annotation that gets you the Rust behavior, for those cases where you want maximum control. I don't think it's the right default though.
The algorithm is more lenient than Rust, since I don't have any shared memory concurrency for example.
I intend to write up more thorough details about the algorithm, but again, haven't gotten to it.
It does not handle cycles. If you create cycles that you do not manually break, you'll get a cycle report at program exit that prints the kind of values that weren't reclaimed. You're then expected to fix your code accordingly :)
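For readers unfamiliar with why plain refcounting can't reclaim cycles, here's a hypothetical sketch in Rust terms (my illustration, not the project's code): two nodes holding strong references to each other keep each other's count above zero forever, which is exactly the kind of value such an exit-time report would flag.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Two nodes that point at each other form a reference cycle that
// refcounting alone can never reclaim.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

fn main() {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    *a.next.borrow_mut() = Some(Rc::clone(&b)); // cycle created
    // Each node now has 2 strong refs; dropping `a` and `b` leaves 1 each,
    // so the nodes leak unless the cycle is broken by hand.
    assert_eq!(Rc::strong_count(&a), 2);
    assert_eq!(Rc::strong_count(&b), 2);
}
```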
The cycle checker I'll have to have a look at - I'd like to crib parts of it for an OS I'm trying to cobble together. Your project looks very interesting, thanks!
Thanks for those, I hadn't seen them. The Ritzau paper indeed appears to be doing something very similar: in section 5.2 he describes "inner references" whose lifetime is bounded by the outer one. I call that "borrowing", where the outer definition (the owner) gets locked from mutation during the borrow.
What I do is a bit more involved though, as for every language construct I indicate whether it wants to own or borrow its children (and whether it wants the parent to own or borrow), to elide as many refcounts as possible, including in cases that are not simply variables. Again, I will describe this in detail.
The second paper I don't see how it relates to what I am doing.
The compiler has implemented a lot of the infrastructure for reasoning about ownership, and has had optimisation for eliding reference counting operations for a long time (but the ownership stuff makes them simpler).
I think Rust should allow this kind of relaxed ownership, with ARC behind the scenes, as lots of user level code should not require fussing over ownership details. A "get shit done" mode.
Rust supports proc macros, you know; you can do essentially anything with your code. So what you describe is already possible! Make a good proof of concept and submit a proper RFC for it, and it might even be adopted as part of stable Rust!
Yup, I will write up the algorithm some time soon.
I've seen Swift's intentions to add lifetime analysis to ARC, but from what I could see it is much more complicated in Swift, requires annotations, and with significant runtime overhead remaining.
I'm aware of that; I said "add lifetime analysis to ARC", as in, ARC is what I call regular runtime reference counting, and with lifetime analysis some of it can move to compile time.
Can it really always be optimised by the compiler? For example, I imagine optimising `sort(&arr)`, which cannot mutate `arr`, could be quite difficult, no?
Detecting whether something can be mutated in place will require static analysis to see whether there are aliases or pointers to the data. If this optimization hinges on whether something is safe to mutate in place, you'll run into the problem that performance differs depending on whether the optimization fires. For example, adding "x" makes the sort call suddenly perform worse, since the compiler sees that it can't mutate "a" in place.
This is assuming that you allow multiple aliases to the same data. The reason Rust has lifetimes and borrowing is precisely to be safe about mutation. Rust wouldn't allow sort() to modify "a" in-place in the above code.
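As a point of comparison, here's a small sketch of how Rust sidesteps the problem: `sort` takes `&mut self`, and the borrow checker guarantees no other live alias can observe the mutation for the duration of the call, so in-place sorting is always safe rather than an optimization that may or may not fire.

```rust
// Rust's exclusive &mut borrow is what makes in-place sort safe: any
// shared borrow must end before the mutable borrow for `sort` begins.
fn main() {
    let mut a = vec![3, 1, 2];
    let x = &a[0];     // shared borrow of `a`
    println!("{}", x); // last use of `x`; the shared borrow ends here
    a.sort();          // exclusive &mut borrow: mutating in place is safe
    assert_eq!(a, [1, 2, 3]);
}
```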
Unless `a` is a linear value, somebody might have a reference to it, so you can't just sort it in place under the covers. The entire thing is useless if it looks like you don't have side effects but the compiler visibly breaks that assumption.
And you probably want to pick one of sorting in-place and out-of-place.
Yeah, as a rule of thumb I'd say "If Haskell doesn't already do this optimization, find out why."
I say "rule of thumb" and I mean it that way. Sometimes there will be Haskell-specific answers. But if your programming language has less code metadata available than Haskell yet promises more optimizations, it's worth a cross-check. I agree with you strongly in this particular case; without type-system support for this specific case, it's going to be very hard to optimize away things like a sort. You start getting into the world of supercompilation and whole-program optimization, whose primary problem is that they tend to have intractable asymptotic costs... but something about them makes them seem intuitively easy. I've seen a number of people fall into that trap.
(I haven't personally, but I understand the appeal. My intuition says they shouldn't be that difficult too! But the evidence clearly says it is. Very, very clearly. We're talking about things like optimizations that are super-exponential in complexity, but seem intuitively easy.)
I assume `a` would need to be copied into the local scope of the function, and the optimization would be to elide the copy after analysis shows the original `a` is safely consumed by the call site, so it does not require persistence.
This probably means lots of aliasing restrictions or in the case where the optimization can't be done, copying could be an expensive side effect of an innocent refactoring.
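Here's a sketch of the elision in Rust terms, where the language makes the "safely consumed by the call site" condition explicit: passing by value is a move, so the callee can reuse the same allocation with no copy at all. (`sorted` is my hypothetical helper, not an API from the discussion.)

```rust
// When the caller no longer needs `a`, passing it by value moves the
// buffer into `sorted`, which can then sort it in place with no copy.
fn sorted(mut v: Vec<i32>) -> Vec<i32> {
    v.sort(); // sorts the moved-in buffer in place
    v
}

fn main() {
    let a = vec![3, 1, 2];
    let b = sorted(a); // `a` is consumed here, so no persistence is needed
    assert_eq!(b, [1, 2, 3]);
}
```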
I hear Swift uses something like this, though it's copy-on-write. I've not used Swift in any significant capacity. Does anyone else have experiences to share with this model?
I don't think copy-on-write will prevent a copy here, since the copy is being written to inside of sorted. I don't think the compiler is smart enough to elide the copy, either.
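A rough Rust analogue of copy-on-write may help here (my illustration, not Swift's implementation): `Rc::make_mut` clones the value only when it is shared, which is exactly why a write inside a `sorted` routine forces a copy rather than preventing one.

```rust
use std::rc::Rc;

// Copy-on-write in Rust terms: the clone happens at the first write
// to a shared value, so writing inside the callee still pays for a copy.
fn main() {
    let mut a = Rc::new(vec![3, 1, 2]);
    let alias = Rc::clone(&a);     // shared: the next write must copy
    Rc::make_mut(&mut a).sort();   // clones the Vec, then sorts the clone
    assert_eq!(*a, [1, 2, 3]);
    assert_eq!(*alias, [3, 1, 2]); // the alias still sees the old data
}
```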
It definitely can and does work for similar situations in other languages. It's fragile though: accidental aliasing somewhere, or other kinds of uncertainty about which values are returned, makes sinking the return allocation into the caller (a required prerequisite) much more likely to fail.
A good way to imagine this is return value optimization giving you a copy that is placed over the original, which allows the copying work to be skipped. But that can require a whole lot of other trades around calling conventions, optimization boundaries, and so on. C++ has dealt with some of this complexity recently, but its nuances took years to sort out between standard revisions, and it only became required in some cases (rather than merely legal) after compilers had had plenty of time to work on it.
Yeah, I don't doubt it if Clang can do this optimization for C++, but I don't think the Swift optimizer is quite there yet since it needs to peer through many more layers of complexity.