Nolibc: A minimal C-library replacement shipped with the kernel

userbinator · on Jan 23, 2023

In the first attempt, the preinit loader used a negative system-call return value to carry an error-return code. This was convenient, but doing so ruins portability because POSIX-compliant programs have to retrieve the error code from the global errno variable, and only expect system calls to return -1 on error. (I personally think that this original design is a mistake that complicates everything but it's not going to change and we have to adapt to it).

The -1 for "an error occurred" has always seemed like an oddity to me too, especially when the kernel itself already uses the negative range for distinguishing between different errors. It's especially problematic in the common use-case of cleaning up after a system call before determining whether an error occurred, because instead of simply doing this...

    ret = some_syscall(...);
    ...do some cleanup...
    if(ret < 0) { ...error case... }

...it also requires that you store errno if the cleanup code may change it (and if you always do that, then it's wasted in the non-error case.) I've seen more than one instance where the authors of an application/library decided to add another wrapper on top of the POSIX ones to essentially "undo" this stupidity by making their wrapped calls return a negatively-biased errno.
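A minimal sketch of such a wrapper, assuming a POSIX libc underneath. The names `open_neg` and `probe_readable` are hypothetical and do not come from any particular library; the `errno = 0` line merely stands in for cleanup code that clobbers errno.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical wrapper: the descriptor on success, -errno on failure. */
int open_neg(const char *path, int flags)
{
    int fd;

    assert(path != NULL);
    fd = open(path, flags);
    return fd >= 0 ? fd : -errno;
}

/* Hypothetical caller: 0 if path can be opened read-only, else -errno. */
int probe_readable(const char *path)
{
    int ret = open_neg(path, O_RDONLY);

    /* Stand-in for cleanup code that may overwrite errno. */
    errno = 0;

    if (ret < 0)
        return ret;   /* the error code survived the cleanup */
    close(ret);
    return 0;
}
```

Because the error code travels in the return value, the caller never has to save errno before cleaning up.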

agapon · on Jan 23, 2023

Surely, POSIX / libc / the notion of system calls predate what you consider to be "the kernel". There are many kernels and OSes, and there have been even more.

arcticbull · on Jan 23, 2023

Indeed, having a magical thread-local errno value is an oddity as well.

GoblinSlayer · on Jan 23, 2023

getcwd and mmap become a bit funky that way.
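One reading of this comment: calls that hand back a pointer or an address cannot simply be tested with `ret < 0`. The Linux kernel handles this by reserving the top 4095 values of the address range for error codes (`MAX_ERRNO` in the kernel sources). A rough sketch of decoding such a raw return value follows; the helper names `raw_is_error` and `raw_to_errno` are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Same limit the Linux kernel uses for error values encoded in pointers. */
#define MAX_ERRNO 4095UL

/* Nonzero if a raw syscall return value falls in the range [-4095, -1]. */
int raw_is_error(unsigned long raw)
{
    return raw >= (unsigned long)-MAX_ERRNO;
}

/* The positive errno encoded in raw, or 0 if raw is a valid result. */
int raw_to_errno(unsigned long raw)
{
    return raw_is_error(raw) ? (int)-(long)raw : 0;
}
```

A large but valid address is not mistaken for an error, at the cost of never being able to map the last page of the address space.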

rwmj · on Jan 22, 2023

He briefly mentions dietlibc ("not evolving anymore") and uClibc. I think he'd be better off contributing to those projects (or musl). You might start off thinking you only need system calls, but at some point you'll want to print something, and even a basic 'printf' will be very handy.

FWIW I have built a program that needs a tiny initramfs[1] and we've found that dietlibc and musl worked really well producing very tiny binaries. glibc is terrible - it links huge amounts of code into even the smallest program.

[1] https://github.com/libguestfs/supermin/blob/86fd6f3e86ab99d5...

synergy20 · on Jan 22, 2023

Why dietlibc and uClibc instead of the modern musl?

jchw · on Jan 23, 2023

Agreed, I think musl is the obvious choice actually: it may be newer (at least I think it's newer) but it's pretty solid, and its use in Alpine Linux has probably helped it mature and gain ecosystem support, especially as Alpine gained popularity for Docker/OCI images due to its size efficiency.

rwmj · on Jan 22, 2023

He doesn't mention musl in the article, but in my experience musl would also be an excellent choice.

wtarreau · on Jan 23, 2023

Hi!

Article author here, I can respond to a few questions. I didn't know about musl back then, and it could definitely have oriented my choices differently. However, after I started to use macro-based syscalls, I found there were very convenient benefits in not having to compile a libc for some cases. Not only are static functions optimized away when unused, but you can also use any toolchain: you don't have to fiddle with wrappers for dietlibc/uClibc nor include them with your toolchain.

For regular-sized projects I would encourage anyone to use musl, of course. But for tiny stuff like in the kernel, which relies on a bare-metal compiler, a handful of syscalls and very limited stdlib functions, nolibc is convenient.

MuffinFlavored · on Jan 22, 2023

https://www.kernel.org/doc/html/latest/RCU/whatisRCU.html

Had never heard of "rcutorture" before

or dracut

https://en.wikipedia.org/wiki/Dracut_(software)

kitd · on Jan 22, 2023

I'm open to correction but I believe this is used in the Cosmopolitan library.

eesmith · on Jan 22, 2023

Seems unlikely. My spot check of the two vfprintf implementations shows no flow from one to the other, and shows that part of the Cosmopolitan code has an older lineage than nolibc.

The nolibc source has many references to copyright held by "Willy Tarreau", under LGPL-2.1 OR MIT license, with a copyright date starting in 2017.

The string "Tarreau" does not exist in the Cosmopolitan library, so that's a strong negative there. Let's look closer.

The file organization is quite different. And so is the implementation. So that's another negative.

Compare the vfprintf in nolibc at https://elixir.bootlin.com/linux/v6.2-rc4/source/tools/inclu... (a 'minimal vfprintf()') with the one in cosmopolitan starting at https://github.com/jart/cosmopolitan/blob/master/libc/stdio/....

Right away we can see nolibc places many functions in the same file while Cosmopolitan uses a one-function-per-filename organization.

Cosmopolitan's vfprintf locks the file (which nolibc doesn't need to do), then calls vfprintf_unlocked, which calls __fmt at https://github.com/jart/cosmopolitan/blob/master/libc/fmt/fm... , which is the actual implementation. It looks very different from nolibc's.

Okay, so perhaps that's the way now but not at the beginning?

We can also go back to Cosmopolitan's original implementation and see how vfprintf goes through https://github.com/jart/cosmopolitan/blob/c91b3c50068224929c... to call "palandprintf", which https://github.com/jart/cosmopolitan/blob/c91b3c50068224929c... says is copyright "Marco Paland" from 2014-2019.

That printf is a few years older than the start of nolibc. It is available from https://github.com/mpaland/printf , and is part of https://github.com/embeddedartistry/libc , a "libc targeted for embedded systems usage".

Thus, multiple factors seem to agree that nolibc code is not used in the Cosmopolitan library.

wtarreau · on Jan 23, 2023

I can confirm that the code looks totally different. Anyway, there's no reason someone writing a libc would waste their time with pieces of code from other provenance. It takes less time to write a minimalist printf than to adapt an existing one to your exact needs, types, validity domains etc., and it'll be easier to extend yours than someone else's. You can write it however you want, it will always end up with a loop around a switch/case :-)
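To illustrate the "loop around a switch/case" shape, here is a rough sketch of a tiny formatter. It is not nolibc's code; the names `mini_vsnprintf` and `mini_snprintf` are hypothetical, and only `%c`, `%s`, `%d` and `%%` are handled.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Store c if room remains (one byte is kept for the NUL); always count it. */
static void put(char *buf, size_t size, size_t *len, char c)
{
    if (*len + 1 < size)
        buf[*len] = c;
    (*len)++;
}

/* Returns the length the full output would have, like vsnprintf(). */
int mini_vsnprintf(char *buf, size_t size, const char *fmt, va_list ap)
{
    size_t len = 0;

    assert(fmt != NULL);
    for (; *fmt; fmt++) {
        if (*fmt != '%') {
            put(buf, size, &len, *fmt);
            continue;
        }
        switch (*++fmt) {
        case 'c':
            put(buf, size, &len, (char)va_arg(ap, int));
            break;
        case 's': {
            const char *s = va_arg(ap, const char *);
            if (!s)
                s = "(null)";
            while (*s)
                put(buf, size, &len, *s++);
            break;
        }
        case 'd': {
            int v = va_arg(ap, int);
            /* Negate in unsigned arithmetic so INT_MIN does not overflow. */
            unsigned int u = v < 0 ? 0u - (unsigned int)v : (unsigned int)v;
            char tmp[12];
            int n = 0;

            if (v < 0)
                put(buf, size, &len, '-');
            do {
                tmp[n++] = (char)('0' + u % 10);
                u /= 10;
            } while (u);
            while (n)
                put(buf, size, &len, tmp[--n]);
            break;
        }
        case '%':
            put(buf, size, &len, '%');
            break;
        case '\0':
            fmt--;   /* lone trailing '%': step back so the loop ends */
            break;
        default:     /* unknown conversion: emit it verbatim */
            put(buf, size, &len, '%');
            put(buf, size, &len, *fmt);
            break;
        }
    }
    if (size)
        buf[len < size ? len : size - 1] = '\0';
    return (int)len;
}

int mini_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int ret;

    va_start(ap, fmt);
    ret = mini_vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return ret;
}
```

Adding a conversion means adding one more `case`, which is the kind of easy extension the comment describes.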