Making a RISC-V Operating System Using Rust (utk.edu)
300 points by fogus on Oct 2, 2019 | 46 comments



I've been pushing the author to write this up for a year now (we both work at University of Tennessee). He teaches a lot of OS-oriented courses and writes operating systems for fun, so he is the perfect person to write this.

Next I'm trying to convince him to make an OS in Zig for Raspberry Pi with a focus on graphics. If you have any suggestions for him, shoot him an email!


What would a "focus on graphics" mean? A GUI and windowing system, or just graphics processing?


I want it to be a video game console :) I think our students would have a lot of fun with that.


If you do, definitely ping flohofwoe, who a) maintains a Zig port of his lightweight Sokol graphics API and b) has a penchant for retro consoles.

https://github.com/floooh/sokol-zig


The zig wrapper is only a minimal experiment and work in progress now, also I've hit a little bug :)

https://github.com/ziglang/zig/issues/3211

BUT the sokol_app.h and sokol_gfx.h headers require an underlying platform 3D-API (e.g. GL, D3D11 or Metal) and window system to set up the swap chain. On a bare metal machine with only a 2D framebuffer they're not all that useful unfortunately.


Ah yes, I didn't really think that one through. I know Redox OS (Rust) has OpenGL working, but I think they're just using the Mesa software renderer.


On a related note, MIT seems to have ported xv6 to riscv.

https://github.com/mit-pdos/xv6-riscv


Wow! What an awesome tutorial/book. I look forward to reading more chapters. This is already an extremely useful resource since it covers cross compilation from an x86 system. I love to play around with this kind of stuff in my free time but connecting to dev boards and constantly shifting paradigms was annoying enough to limit my enthusiasm. It took a lot of research to find out how to do what the author fits in a single chapter. Thank you for this awesome post!


Looks similar to my effort to interface RISC-V emulators and bare metal C: https://github.com/Lichtso/riscv-llvm-templates


I took Programming Languages from him, as well as TA'd for his Operating Systems class. He is an amazing lecturer, and I am glad he is releasing something like this to the public! Definitely something worthwhile to read. Looking forward to more content being released.


Rust is built around the expectation that allocations happen automatically and can never fail. I’m curious to see how they deal with this in a kernel...


Rust, the language, strictly speaking, knows nothing about allocation at all.

The standard library assumes infallible allocation currently. Kernels generally do not use the standard library.

(Okay, STRICTLY speaking there’s a hack for Box<T> in the language right now but it’s not about the allocation part and Box<T> is technically defined in the standard library with Magic(tm) and so isn’t relevant in an OS dev context, as you won’t use the standard library and therefore Box<T>. Rust-the-language never implicitly inserts heap allocations, including Box<T>, anywhere.)


The Rust standard library is. The Rust core library (the far more minimal library that gets used when you mark a crate as #![no_std]) is not. That's how the other "OS in Rust" and "allocator in Rust" projects work.


Well, this OS in Rust project simply commits the same sins as the standard library by using a custom allocation routine that panics on allocation failure: https://os.phil-opp.com/heap-allocation/#allocations-in-rust

Saying that Rust the language doesn't require a non-failing allocator misses the point--the ergonomics of the language make dealing with allocation failure difficult; sufficiently difficult that none of the projects I've seen actually bother attempting it.

See https://cs.brown.edu/research/pubs/theses/ugrad/2015/light.a..., which explains that idiomatic Rust instructs developers to return by value, relying on caller assignment to types like Box (which uses exchange_malloc under the hood), to handle heap allocation. Basically, the strategy for dynamic object management in Rust is predicated on hidden heap allocations.

So of course it's not necessary. But good luck writing an entire operating system otherwise. Even Redox OS doesn't bother trying to fight the language in this regard: https://gitlab.redox-os.org/redox-os/slab_allocator/blob/mas...


That's not quite right. Rust has two ways to allocate a Box.

One is the `box` keyword: this guarantees that the object will be constructed in place on the heap, but doesn't work with fallible allocators (or, in fact, anything but the default allocator). This is what the research paper you linked is talking about. However, all these years later, `box` has never been stabilized; nor will it be in its current form, because it's considered too inflexible. Whatever form of in-place construction does eventually get stabilized will likely support fallibility.

The other way to allocate a Box is `Box::new`. This is not compiler magic; it's simply a regular function, implemented in Rust, that calls the allocator and then moves (i.e. memcpys) an existing object into the new allocation. If you write your own Box-like type, there's nothing stopping you from making your `new` function fallible.

What about optimizations? Does `Box::new` get optimized in ways that a fallible version won't? Well, no. The compiler will inline `Box::new`, and if you call it with a fresh stack allocation as an argument, LLVM can theoretically, sometimes, elide the stack allocation and the memcpy altogether, instead initializing the object directly on the heap. Theoretically. The paper claims that it always does so, but the paper is wrong. [1] In fact, the compiler doesn't do so even in relatively easy cases. [2] It would be nice if LLVM did better here, but it doesn't seem to be a big source of overhead in Rust programs in practice. If LLVM did improve the optimization, it would probably work equally well for a fallible allocator as for `Box::new`, because they're equally complex from its perspective: `Box::new` can panic, and LLVM treats panics as branches.

(Box does have compiler magic for a different case: the ability to move out of it. Not being able to replicate that in a custom type is suboptimal, but not the end of the world.)

As for why those OS projects panic on allocation failure:

The phil-opp.com one implements the standard allocator interface in order to use the standard library container types. I think it sucks that Rust's standard containers don't support allocation failure, but you don't need to use them...

For Redox... I'm not actually sure what they're doing, but I think "fn oom" is a hook for users of the allocator to call if they want to panic on out-of-memory, not something that mandates panicking. At least, that's the behavior of `std::alloc::handle_alloc_error` [3], which was moved there from being a trait method on `GlobalAlloc` named `oom`. However, they're implementing a different `oom` on an old version of the `Alloc` (not `GlobalAlloc`) trait, which is unstable; that method was removed entirely well over a year ago, so I guess the code must be out of date.

[1] https://users.rust-lang.org/t/how-to-create-large-objects-di...

[2] https://play.rust-lang.org/?version=stable&mode=release&edit... (press "..." -> "ASM")

[3] https://doc.rust-lang.org/nightly/std/alloc/fn.handle_alloc_...
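
To make the "hook, not mandate" point about `handle_alloc_error` concrete, here is a minimal sketch that runs on a hosted target with `std`. The helper names `try_alloc`, `alloc_or_die` and `release` are made up for illustration; only `alloc`, `dealloc`, `handle_alloc_error` and `Layout` come from the standard library. The allocator itself just returns null on failure, and it is the caller that chooses whether to hand that null to the hook.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::ptr::NonNull;

/// Fallible path (hypothetical helper): report failure to the caller.
/// This sketch only supports layouts with a non-zero size.
pub fn try_alloc(layout: Layout) -> Option<NonNull<u8>> {
    assert!(layout.size() != 0, "zero-sized layouts are not supported here");
    // SAFETY: the layout has a non-zero size, as `alloc` requires.
    let raw = unsafe { alloc(layout) };
    // `alloc` signals failure with a null pointer; turn that into `None`.
    NonNull::new(raw)
}

/// Infallible path (hypothetical helper): the caller opts in to diverging
/// on out-of-memory by calling the standard hook itself.
pub fn alloc_or_die(layout: Layout) -> NonNull<u8> {
    match try_alloc(layout) {
        Some(ptr) => ptr,
        // Never returns; by default the process is aborted.
        None => handle_alloc_error(layout),
    }
}

/// Gives back memory obtained from one of the two helpers above.
///
/// # Safety
/// `ptr` must come from `try_alloc` or `alloc_or_die` with this exact
/// `layout`, and must not have been released already.
pub unsafe fn release(ptr: NonNull<u8>, layout: Layout) {
    // SAFETY: guaranteed by the caller, see above.
    unsafe { dealloc(ptr.as_ptr(), layout) }
}
```

Code that wants to survive out-of-memory would call only `try_alloc`; nothing forces it through `alloc_or_die`.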


> the ergonomics of the language make dealing with allocation failure difficult; sufficiently difficult that none of the projects I've seen actually bother attempting it.

Honest question: what languages make this ergonomic and can you share any projects that handle this gracefully?


Off-hand I don't know of any that make this ergonomic, at least none that don't make use of exceptions. Rust isn't unique in this regard.

What makes it a potential impediment in Rust is that the constraints and burdens of the borrow checker are offset by mechanisms like Box. The fact that all the extant examples choose the convenience of Box over handling OOM, even in situations where not handling OOM is obviously a deal breaker for production systems, speaks volumes about the significance of the problem and how the language shapes people's choices. Async/await is in the same boat--technically doesn't require a non-failing heap allocation, but who's going to bother making it work? You technically don't need async/await to do asynchronous programming, either, but the whole point was that this is the type of thing that needs to be addressed by the core language with some primitive (i.e. generators) that does the heavy lifting and which can be built upon.

I don't know enough Rust to know how easy it would be to create a Box-like implementation that rewrites the AST to automatically propagate allocation failure via the idiomatic Result<T, E> protocol. But that seems roughly what the proper solution might look like in Rust; either that or finally getting over the anxiety and aversion about exceptions.

Lua handles OOM quite well. Lua doesn't have try/catch, just "protected calls" which are not as light-weight as regular function calls. (A pcall initializes a recovery point with _setjmp.) In Lua you tend to use protected calls at, effectively, transactional boundaries--the points in your call graph where you're willing and able to rollback application state for non-specific, otherwise unrecoverable errors. AFAIU you could technically do the same in Rust, except Rust makes unwinding optional at compile-time[1], so it's not the kind of thing people will make a habit of deliberately designing for in their libraries. Which makes me think the likely solution for Rust, if any, is to permit libraries to opt-in to OOM recovery with an allocator pragma that does something like the Result<T, E> propagation mentioned above.

[1] Lua is GC'd. The cost of running destructors outside the normal call/return protocol is fixed and independent of whether an application uses protected calls.
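
A rough Rust analogue of a protected call at a transactional boundary, as a sketch only: `std::panic::catch_unwind` plays the role of `pcall`, and the `Ledger` type and `transact` function are invented for the example. It assumes the program is built with unwinding enabled (with `panic=abort` nothing is caught), and the failure is simulated with an ordinary `panic!`, since a real allocation failure in `std` does not necessarily surface as a panic.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical application state for this sketch.
#[derive(Clone, Debug, PartialEq)]
pub struct Ledger {
    pub entries: Vec<u32>,
}

/// Runs `op` against a scratch copy of `state` and commits the copy only
/// if `op` returns normally. If `op` panics, `state` is left untouched and
/// the panic message is returned as an error.
pub fn transact<F>(state: &mut Ledger, op: F) -> Result<(), String>
where
    F: FnOnce(&mut Ledger),
{
    let mut scratch = state.clone();
    // AssertUnwindSafe is acceptable here because `scratch` is thrown away
    // whenever the closure panics, so no half-updated state can leak out.
    let outcome = panic::catch_unwind(AssertUnwindSafe(|| op(&mut scratch)));
    match outcome {
        Ok(()) => {
            *state = scratch; // commit
            Ok(())
        }
        Err(payload) => {
            // Panic payloads are usually a &str or a String.
            let msg = payload
                .downcast_ref::<&str>()
                .map(|s| (*s).to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "unknown panic".to_string());
            Err(msg) // rollback: `scratch` is dropped
        }
    }
}
```

Copying the whole state is the simplest possible rollback strategy; a real system would more likely keep an undo log or scope the transaction to a smaller piece of state.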


Writing a KBox that calls a kmalloc and where new() returns a Result<KBox<T>, KAllocFailure> should be pretty trivial.
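
A minimal sketch of what that could look like, written against `std::alloc` so it runs on a normal host. Everything here is an assumption for illustration: `kmalloc` and `kfree` are stand-ins that forward to the host allocator, and the 4096-byte cap exists only to simulate exhaustion. A real kernel would call its own page or slab allocator instead.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::ops::{Deref, DerefMut};
use std::ptr::{self, NonNull};

/// Error returned when the stand-in kernel allocator cannot satisfy a request.
#[derive(Debug, PartialEq)]
pub struct KAllocFailure;

/// Stand-in for a kernel `kmalloc`; returns null on failure.
/// ASSUMPTION: requests above 4096 bytes are refused to simulate exhaustion.
unsafe fn kmalloc(layout: Layout) -> *mut u8 {
    if layout.size() > 4096 {
        return ptr::null_mut();
    }
    // SAFETY: callers in this sketch only pass non-zero-sized layouts.
    unsafe { alloc(layout) }
}

/// Stand-in for `kfree`; `ptr` must come from `kmalloc` with the same layout.
unsafe fn kfree(ptr: *mut u8, layout: Layout) {
    unsafe { dealloc(ptr, layout) }
}

/// Box-like owner whose constructor reports allocation failure.
pub struct KBox<T> {
    ptr: NonNull<T>,
}

impl<T> KBox<T> {
    pub fn new(value: T) -> Result<Self, KAllocFailure> {
        let layout = Layout::new::<T>();
        let ptr = if layout.size() == 0 {
            // Zero-sized types need no memory; an aligned dangling pointer works.
            NonNull::dangling()
        } else {
            let raw = unsafe { kmalloc(layout) }.cast::<T>();
            // On failure `value` is dropped and the error goes to the caller.
            NonNull::new(raw).ok_or(KAllocFailure)?
        };
        // SAFETY: `ptr` is aligned and valid for a write of `T`.
        unsafe { ptr::write(ptr.as_ptr(), value) };
        Ok(KBox { ptr })
    }
}

impl<T> Deref for KBox<T> {
    type Target = T;
    fn deref(&self) -> &T {
        // SAFETY: `ptr` points to an initialized `T` owned by this KBox.
        unsafe { self.ptr.as_ref() }
    }
}

impl<T> DerefMut for KBox<T> {
    fn deref_mut(&mut self) -> &mut T {
        // SAFETY: as above, and `&mut self` guarantees exclusive access.
        unsafe { self.ptr.as_mut() }
    }
}

impl<T> Drop for KBox<T> {
    fn drop(&mut self) {
        let layout = Layout::new::<T>();
        unsafe {
            // Run the value's destructor, then return the memory (if any).
            ptr::drop_in_place(self.ptr.as_ptr());
            if layout.size() != 0 {
                kfree(self.ptr.as_ptr().cast::<u8>(), layout);
            }
        }
    }
}

/// Hypothetical caller: allocation failure travels upward with `?`.
pub fn make_pair(a: u64, b: u64) -> Result<(KBox<u64>, KBox<u64>), KAllocFailure> {
    Ok((KBox::new(a)?, KBox::new(b)?))
}
```

As with `Box::new`, the value is built by the caller and then moved into the allocation, so this gives fallibility but not in-place construction.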


You deal with this in a Rust kernel in exactly the same way you deal with it in a C kernel: by using different APIs for allocation.

Linux doesn't use malloc; a Rust kernel need not use Box::new either.


Try harder: https://doc.rust-lang.org/std/alloc/trait.GlobalAlloc.html#t...

You can implement your own global allocator, use an existing crate, or (in the case of user applications) use the system's. That is for the stdlib level, not even core...
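
As a sketch of the first option, here is a `GlobalAlloc` implementation that forwards to the system allocator but refuses requests once a byte budget is spent. The type name `CappedAlloc` and the budget mechanism are made up for illustration; `GlobalAlloc`, `Layout` and `System` are the real standard library items. The sketch does not register the allocator, so it can be exercised directly.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical allocator: wraps the system allocator with a byte budget.
pub struct CappedAlloc {
    remaining: AtomicUsize,
}

impl CappedAlloc {
    pub const fn new(budget: usize) -> Self {
        CappedAlloc { remaining: AtomicUsize::new(budget) }
    }

    /// Bytes still available under the budget.
    pub fn remaining(&self) -> usize {
        self.remaining.load(Ordering::SeqCst)
    }
}

unsafe impl GlobalAlloc for CappedAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Atomically reserve the bytes; fail if the budget would go negative.
        let reserved = self.remaining.fetch_update(
            Ordering::SeqCst,
            Ordering::SeqCst,
            |left| left.checked_sub(layout.size()),
        );
        if reserved.is_err() {
            return std::ptr::null_mut(); // null is how GlobalAlloc reports failure
        }
        // SAFETY: the caller upholds the `GlobalAlloc::alloc` contract.
        let ptr = unsafe { System.alloc(layout) };
        if ptr.is_null() {
            // The system allocator failed; hand the reservation back.
            self.remaining.fetch_add(layout.size(), Ordering::SeqCst);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: `ptr` was returned by `alloc` above with this layout.
        unsafe { System.dealloc(ptr, layout) };
        self.remaining.fetch_add(layout.size(), Ordering::SeqCst);
    }
}

// To install it for a whole program (not done in this sketch):
// #[global_allocator]
// static GLOBAL: CappedAlloc = CappedAlloc::new(64 * 1024 * 1024);
```

Returning null is all the allocator can do; how that failure reaches the program depends on the API that called it.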


Nope, the GP is still correct. If your custom allocator fails, anything created with Box::new or similar will panic.


I see the author and I share a naming convention for our miscellaneous scripts, and `do.sh` appears prominently in both our work.

I subsequently branched out to e.g. `do-backups.sh`, where despite the `do-` being superfluous, I quite like the aesthetic.

Slightly more on topic: I really enjoy blog series like these, with plenty of detail on esoteric topics I really have no idea about, or on the face of it much interest in. They're a fun way to increase the breadth of topics I have a superficial knowledge of.


For someone with a standard CS undergraduate education, how much effort (in hrs/week) would such an endeavour require?


Old wives' tale from ancient times, before CompuServe:

The first 20% of the effort gets you about 80% of the results. So everything seems exciting. But to get that last 20% of the results requires 80% of the total work.

There is a ton of un-fun, un-glamorous work getting a gazillion device drivers written, for example.


You can always use a hypervisor, a driver OS that re-uses Windows/Linux drivers, and the new OS in a VM using virtual drivers. That's what some L4-based setups, including commercial OKL4, did. They also let you write native drivers directly on the microkernel or in OS VM's for situations where effort was justified. I'm surprised more haven't done this.

Rump kernels are the closest trend.


> I'm surprised more haven't done this.

Because this is generally not secure. The driver OS will have access to hardware that can bypass the memory restrictions set upon it by the microkernel.

There is sometimes special hardware to address this, but it is too complex to manage from the kernel.


"Because this is generally not secure."

Most OS's aren't designed to be that secure, though. That's why I wonder why it hasn't been tried more for usability. A security-focused setup certainly has more to be concerned about. Like I advocated with Xen, a good start would be making the host OpenBSD. They should be able to get hardware that would be compatible.


Since there are missing parts: Do you have a scheduling plan for the remaining posts?


One per week.


Amazing tutorial, and really exciting for RISC-V. I hope he creates a patreon or some other donation platform, as it must take some time to write this up. I'm sure some people would be willing to send a bit his way.


Done! Thanks for the suggestion. The Patreon link is at the top of the blog.


One suggestion: Your patreon page only mentions the Aarch64 project, not the Rust on RISC-V project. That might alienate people coming from the Rust blog; they might think they landed in the wrong spot (I sure did)


Making X.....but using Rust!!!


I really really like the first sentence

> RISC-V ("risk five") and the Rust programming language both start with an R, so naturally they fit together


I expect the OS to be fully based around an r7rs-loaded Racket, Erlang and R.


When RISC-V inevitably renames itself CISC-V, then what language would naturally fit together?

And don't say cobol.



COMTRAN would be a way better choice than COBOL :-) (https://en.wikipedia.org/wiki/COMTRAN)

Common Lisp would be a decent choice, too, with

    CISC:RISC = Common Lisp:Scheme


C++?


Obviously: Clojure.


Why not CRISC-V? Or CaRISC-V? Or (eww) RaCISC-V?

It's not like current "CISC" architectures haven't taken a lot of the good aspects of RISC already too...


Well it would be "less risk", maybe even "riskless" so I guess you'd need a new language named "restless"?


V

https://vlang.io/

You could vlog about it.


Cython with a rust core?


C

Worse is better, forever.


"Because I wanted to" should be reason enough. But for those who demand a better justification, this is very specific ;-)



