Hacker News | calebwin's comments

I've been working on a project with a similar goal to RLSL, called Emu. It's a procedural macro that automatically off-loads portions of code to run on a GPU. Here's a quick example...

  #[gpu_use]
  fn main() {
      let mut a = vec![0.2; 1000];
      let mut b = vec![0.3; 1000];
      let mut c = vec![0.0; 1000];

      gpu_do!(load(a));
      gpu_do!(load(b));
      gpu_do!(load(c));
      gpu_do!(launch());
      for i in 0..1000 {
          c[i] = a[i] + b[i];
      }
      gpu_do!(read(c));
      
      println!("{:?}", c);
  }
Emu is currently open-sourced as a 100% stable Rust library and while it only supports a tiny subset of Rust, that subset is well-defined with useful compile-time errors and a comprehensive test suite.

So if you're looking for a way to contribute to single-source GPGPU for Rust, please consider helping expand Emu's supported subset of Rust. The repository is at https://www.github.com/calebwin/emu

I will say that since Emu works at the AST/syntax level, RLSL is of great interest to me because it works instead at the MIR level which allows it to more easily support a large subset of Rust.


- So RLSL can work with Emu?

- Would it mean most of the general Rust code could be made to work on GPU? Or is it you want Emu to work at MIR level?

- Do you plan to actually try to do it?

Emu seems like a really cool project either way. :)


- Maybe there is a component of RLSL that could be useful. I have to think more about what I want that component to be.

- I want Emu to support general Rust code but still use stable Rust and provide really nice compile-time errors. Maybe Emu could do AST-level checking to (1) ensure that only the legal, transpilable-to-SPIR-V subset is used, (2) infer the kernel parameters, (3) infer the global and local work sizes, and then do MIR-level compilation to OpenCL or SPIR-V?

- At the moment, I want to focus on AST-level compilation because I think many applications (AI, ML, simulations, etc.) can still technically be implemented without a huge subset of Rust.


I was planning to write a tiny SVM in Rust just as a plaything, so I would probably use Emu to see if I can speed it up....

Does Emu have a getting-started guide other than the docs?


https://docs.rs/em contains not only documentation but also a comprehensive explanation of how to use Emu effectively.

I would recommend looking through it first. Of course, if you have questions feel free to ask - https://gitter.im/talk-about-emu/thoughts


Since you've talked about single-source GPGPU for Rust, I should point out that I'm working towards a pre-RFC to add SYCL-like multi-target for Rust.


Would you mind elaborating on the "compile time reference counting"? Is this new? How does it handle cycles?


It is so new I haven't even documented it yet, which is why I said the timing for this HN post isn't great.

So imagine the current implementation does a lifetime analysis somewhat similar to Rust, but then where Rust would error out, this implementation simply inserts a runtime refc increase. This gets rid of most runtime refc overhead without the programmer needing to be clever about it.
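
A rough Rust analogue of the idea (a hypothetical sketch, not Lobster's actual implementation — Lobster's refcounts are inserted by the compiler, not written by hand): where plain ownership would reject a second use as a move error, a Lobster-style compiler keeps the last use as a plain move and inserts the equivalent of an `Rc::clone` only where the lifetime analysis cannot prove the use is the last one:

```rust
use std::rc::Rc;

fn consume(v: Rc<Vec<i32>>) -> usize {
    v.len()
}

fn main() {
    let data = Rc::new(vec![1, 2, 3]);

    // With plain ownership, passing `data` twice would be a move error.
    // A Lobster-style compiler inserts a refcount increase (modeled here
    // as an explicit clone) only at the non-final use...
    let n1 = consume(Rc::clone(&data));
    // ...and leaves the provably-last use as a plain move, with no
    // refcount traffic at all.
    let n2 = consume(data);
    println!("{} {}", n1, n2); // 3 3
}
```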

Then, I intend to add an optional type annotation that gets you the Rust behavior, for those cases where you want maximum control. I don't think it's the right default though.

The algorithm is more lenient than Rust, since I don't have any shared memory concurrency for example.

I intend to write up more thorough details about the algorithm, but again, haven't gotten to it.

It does not handle cycles. If you create cycles that you do not manually break, you'll get a cycle report at program exit that prints the kind of values that weren't reclaimed. You're then expected to fix your code accordingly :)
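
The cycle limitation is the same one plain reference counting has in Rust; as an illustration (my own sketch, not Lobster code), two strong `Rc` edges pointing at each other never drop to zero, so one direction of the cycle has to be a weak edge or be broken manually:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    // A strong back-pointer here would form a cycle that refcounting
    // alone can never reclaim; `Weak` breaks the cycle.
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

fn main() {
    let root = Rc::new(Node {
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    });
    let child = Rc::new(Node {
        parent: RefCell::new(Rc::downgrade(&root)),
        children: RefCell::new(Vec::new()),
    });
    root.children.borrow_mut().push(Rc::clone(&child));

    // Only the child -> parent edge is weak, so dropping `root` at scope
    // exit still frees the whole tree. With two strong edges it would leak.
    assert_eq!(Rc::strong_count(&child), 2);
    assert_eq!(Rc::weak_count(&root), 1);
}
```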


This at first glance actually looks similar to http://liu.diva-portal.org/smash/get/diva2:20899/FULLTEXT01....

And this style of static analysis has been taken even further by Luc Blaeser: http://concurrency.ch/Content/publications/Blaeser_Component...

The cycle checker I'll have to have a look at - I'd like to crib parts of it for an OS I'm trying to cobble together. Your project looks very interesting, thanks!


Thanks for those, I hadn't seen them. The Ritzau paper indeed appears to be doing something very similar, where in 5.2 he describes "inner references" whose lifetime is bounded by the outer one, I call that "borrowing", where the outer definition (the owner) gets locked from mutation during the borrow.

What I do is a bit more involved though, as for every language construct I indicate whether it wants to own or borrow its children (and whether it wants the parent to own or borrow it), to remove as many refcounts as possible, including in cases that are not simply variables. Again, I will describe this in detail.

The second paper I don't see how it relates to what I am doing.



That sounds more similar to Swift's approach, where the compiler does aggressive optimisation to elide reference counts where possible. Do you agree?


Yes, it is more similar to Swift than Rust. I guess people are more familiar with Rust.. since Swift hasn't actually implemented it yet?

https://github.com/apple/swift/blob/master/docs/OwnershipMan...


The compiler has implemented a lot of the infrastructure for reasoning about ownership, and has had optimisation for eliding reference counting operations for a long time (but the ownership stuff makes them simpler).


> I don't think it's the right default though.

I think Rust should allow this kind of relaxed ownership, with ARC behind the scenes, as lots of user level code should not require fussing over ownership details. A "get shit done" mode.


Rust supports proc macros, you know; you can do literally anything with your code. So what you describe is already possible! Make a good proof of concept and submit a proper RFC for it, and it might even be adopted as part of Stable Rust!


Yes, that is exactly my intention, for the majority of code this compile time reference counting is good enough.

You can of course use Rc types in Rust, but that is verbose, and throws all ownership out the window, unlike Lobster.
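
To make the verbosity concrete (a small sketch of my own, not from the Lobster docs): opting into shared mutable ownership in Rust means wrapping the value in `Rc<RefCell<...>>` and threading that type through every signature and call site, with mutation checks moving to runtime:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Plain ownership: one owner, checked at compile time, but no sharing.
fn bump(counter: &mut i32) {
    *counter += 1;
}

// Shared ownership: the Rc<RefCell<...>> wrapper now appears everywhere,
// and exclusive access is enforced at runtime by borrow_mut().
fn bump_shared(counter: &Rc<RefCell<i32>>) {
    *counter.borrow_mut() += 1;
}

fn main() {
    let mut a = 0;
    bump(&mut a);

    let b = Rc::new(RefCell::new(0));
    let b2 = Rc::clone(&b); // a second owner of the same counter
    bump_shared(&b);
    bump_shared(&b2);
    println!("{} {}", a, *b.borrow()); // 1 2
}
```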


Any chance you could you share some references on this kind of type inference? It sounds very interesting!


It is not an implementation of an existing paper or anything.. I am going to write it up soon, promise!

Feel free to email or msg me if you want to be notified.



More on the compile time reference counting here: http://aardappel.github.io/lobster/memory_management.html


A comparison to Swift's ARC would also be of interest :)



Yup, I will write up the algorithm some time soon.

I've seen Swift's intentions to add lifetime analysis to ARC, but from what I could see it is much more complicated in Swift, requires annotations, and with significant runtime overhead remaining.


Arc isn’t that stuff, Arc is what it already does today. “Automatic reference counting.”


I'm aware of that, I said "add lifetime analysis to ARC", as in, ARC is what I call regular runtime reference counting, and with lifetime analysis some of it can move to compile time.


Yep oops, sorry.

Anyway, psyched to see more people experimenting with this kind of thing!


Yes, I frankly feel this design space is the future of memory management. GC, begone!


http://concurrency.ch/Content/publications/Blaeser_Component...

This seems to scale out very well - it's kind of static GC analysis with RAII and erlang-style messaging.


I really like that graphic for the website and README! Did you make it?


Thanks! It was a friend of mine. You can see more of his work at https://www.instagram.com/sutocreation


Would you mind explaining why compiling to machine code was faster than to C?


Isn't the main reason why languages like Go and Rust don't do that because it would produce a ton of memory garbage?


It will be optimized by the compiler.


Can it really always be optimised by the compiler? For example, I imagine optimising `sort(&arr)` which cannot mutate arr could be quite difficult, no?


What I mean is `a = sort(a)` will be optimized to `sort(&a)` which can mutate internally.


What about:

  x := &a
  a = sort(a)
  print(x)  // Sorted?
To detect whether something can be mutated in place will require static analysis to see if there are aliases or pointers to the data. If this is an optimization that's based on whether something is safe to mutate in-place, you'll run into the problem where performance becomes different depending on whether something can be optimized or not. For example, adding "x" makes the sort call suddenly perform worse since the compiler sees that it can't mutate "a" in-place.

This is assuming that you allow multiple aliases to the same data. The reason Rust has lifetimes and borrowing is precisely to be safe about mutation. Rust wouldn't allow sort() to modify "a" in-place in the above code.
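
In Rust terms the snippet above is rejected statically (a sketch of mine, not code from any of the projects being discussed): an in-place sort takes an exclusive `&mut` borrow, so holding a shared alias across the sort is exactly the case the borrow checker refuses to compile:

```rust
fn main() {
    let mut a = vec![3, 1, 2];

    // In-place sort requires an exclusive (&mut self) borrow.
    a.sort();

    let x = &a; // shared borrow: an "alias" in the sense above
    // a.sort(); // ERROR: cannot borrow `a` as mutable while `x` is live
    println!("{:?}", x); // [1, 2, 3]
}
```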


Unless `a` is a linear value, somebody might have a reference to it, so you can't just sort it in place under the cover. The entire thing is useless if code looks like it has no side effects but the compiler visibly breaks that assumption.

And you probably want to pick one: sorting in-place or out-of-place.


Yeah, as a rule of thumb I'd say "If Haskell doesn't already do this optimization, find out why."

I say "rule of thumb" and I mean it that way. Sometimes there will be Haskell-specific answers. But if your programming language has less code metadata available than Haskell but is promising more optimizations, it's worth a cross-check. I agree with you strongly in this particular case; without type system support for this specific case, it's going to be very hard to optimize away things like a sort. You start getting into the world of supercompilation and whole-program optimization, the primary problems of which for both of them is that they tend to have intractable O(...) performances... but something about them makes them seem intuitively easy. I've seen a number of people fall into that trap.

(I haven't personally, but I understand the appeal. My intuition says they shouldn't be that difficult too! But the evidence clearly says it is. Very, very clearly. We're talking about things like optimizations that are super-exponential in complexity, but seem intuitively easy.)


I assume `a` would need to be copied into the local scope of the function and the optimization would be to elide the copy after analysis shows the original a is safely consumed by the call site so it does not require persistence.

This probably means lots of aliasing restrictions or in the case where the optimization can't be done, copying could be an expensive side effect of an innocent refactoring.

I hear Swift uses something like this, though it's copy-on-write. I've not used Swift in any significant capacity. Does anyone else have experiences to share with this model?


I don't think copy-on-write will prevent a copy here, since the copy is being written to inside of sorted. I don't think the compiler is smart enough to elide the copy, either.


It definitely can and does work for similar situations in other languages. It’s fragile though as aliasing accidentally somewhere or adding other kinds of uncertainty around which values are returned makes sinking the return allocation into the caller, a required prerequisite, much more likely to fail.

A good way to imagine this is having return value optimizations give you a copy which is placed on the original, which allows the work to be skipped. But that can require a whole lot of other trades around calling conventions, optimization boundaries and so on. C++ has dealt with some of this complexity recently, but its nuances took years to sort out between standard revisions, and it only became required in some cases, rather than merely legal, after compilers had had plenty of time to work on it.


Yeah, I don't doubt it if Clang can do this optimization for C++, but I don't think the Swift optimizer is quite there yet since it needs to peer through many more layers of complexity.


How? In-place sorts and out-of-place sorts have different implementations.


there's this language called gro that i'm hoping to look into when i get some time - http://depts.washington.edu/soslab/gro/

