I remember how polymorphic viruses first appeared back in the MS-DOS days, in an attempt to avoid detection by antivirus software (which was useful and essential back then).
Now the tables have turned, and legitimate software has to become somewhat polymorphic itself to thwart attacks by malware.
Yes, the basic idea is not that new. For years I have stored every Go-based application I use as a small (a few kB) source-code checkout only, with no binary at all. At run time a wrapper compiles a randomized, individual, one-time temporary binary via garble [0].
The compiler (go) is part of a static, read-only (compressed, in-memory) root FS. Builds happen on an air-gapped build server, touching only signed/verified/reviewed code from offline git mirror snapshots. Go needs no shared libraries; everything is static. The resulting runtime-only binaries are completely unique/randomized and dependency-free, straight from (signed) source code.
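A minimal sketch of such a wrapper (the paths and project name are invented; per garble's docs, -seed=random requests a fresh obfuscation seed for each build):

    #!/bin/sh
    set -eu
    SRC=/ro/src/myapp                      # read-only, verified source checkout
    TMPBIN=$(mktemp /tmp/myapp.XXXXXXXX)
    trap 'rm -f "$TMPBIN"' EXIT            # the one-time binary never persists
    cd "$SRC"
    garble -seed=random build -o "$TMPBIN" .
    "$TMPBIN" "$@"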
Does anyone know of a real-world case where fine-grained ASLR (like the OpenBSD relinking approach) successfully mitigated or significantly hindered an exploit that ordinary ASLR did not?
I'm curious because years ago academia strongly pushed the FG-ASLR story, then OpenBSD did kernel relinking, but I haven't heard any industry report on how effective it actually is.
How would you measure that? How would you notice a failed attempt at finding a hole, or stack smashing, etc., if it did not even result in a crash or any other dysfunction?
I suppose if someone found a technique in the wild that defeats regular ASLR (even if only sometimes), they could then test that same technique against fine-grained ASLR and evaluate if the FG ASLR was more effective at preventing exploitation.
Any classic buffer overflow / stack smash can defeat ASLR; it just might take a long time to get lucky guessing addresses. Couldn’t we Monte Carlo this?
Maybe take a known vulnerable exec, create a fuzzing attacker and run it both ways seeing how long it takes to get lucky a few times. The more secure version should take longer.
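A crude harness along those lines might look like this (the vulnerable server binaries and the fixed-address exploit script are placeholders, not real tools):

    # Run the same address-guessing exploit against a conventionally linked
    # build and a relinked one, and count attempts until it lands.
    for target in ./vuln-server.plain ./vuln-server.relinked; do
        attempts=1
        until ./exploit --fixed-gadget-guess "$target" >/dev/null 2>&1; do
            attempts=$((attempts + 1))
        done
        echo "$target: first success after $attempts attempts"
    done
    # Repeat many times and compare the distributions; the fine-grained
    # version should need far more attempts on average.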
There are whitehats constantly looking at this sort of thing. It's why we can say that KASLR is a huge waste of time - because it's both theoretically bogus and we have actual exploit devs saying that they don't care about it.
My guess is you look for a security POC that reads something like "We were unable to build the ROP chain on openbsd due to the gadget locations being unknowable."
The traditional antidote to ASLR is "known code reuse", tricks like ret2libc, ret2plt, etc.
FG-ASLR helps because, even when you know where .text is, there are now N possible randomized locations for the piece of code your exploit leverages, so if you pick one and exploit M machines that way, only M/N of the exploits will succeed (where you got lucky).
Ultimately it is obfuscation, but with enough entropy it is very effective. It can't fix or prevent the underlying vulnerability, but it makes it more work to turn an exploit into code execution consistently enough for it to be useful.
In order to enable relinking, they had to keep around all original .o and .a files, as well as a Makefile. Not a problem for OpenBSD but pretty unusual for binary Linux distros.
I wonder if it is possible to make a relinker which only requires binary output -- so it could be easily incorporated into existing systems.
One way I can think of is to keep relocation/original object information in the debug sections, so that one can reconstruct the original object files and re-link them. I am guessing this will not work with LTO, though... Or maybe we can just make a bunch of debug sections and store the input object/library files verbatim -- this will at least double the binary size, but will allow for easier relinking (a sketch of that idea follows).
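Something like the "store the inputs verbatim" idea can be sketched with plain binutils; the section name here is invented, and real tooling would need more bookkeeping:

    # Bundle the original objects and attach them to the installed binary
    # as an extra, non-loaded section.
    ar rcs inputs.a *.o
    objcopy --add-section .relink.inputs=inputs.a \
            --set-section-flags .relink.inputs=noload,readonly \
            prog prog.relinkable
    # A relinker can later pull the archive back out and run the linker again:
    objcopy -O binary --only-section=.relink.inputs prog.relinkable inputs.a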
Please don't throw out the baby with the bathwater.
Fully reproducible builds provide great assurance against supply chain attacks. But 100% reproducibility is in some cases a bit too much. What matters is whether the artifact can be easily proven to be functionally identical to the canonical one.
So I am 100% for a fully predictable sshd random-relink kit, producing unpredictable sshd binaries, but only as long as there is an instruction how to check that the sshd binary that allegedly came from it indeed could have come from it, and was not quietly replaced by some malicious entity.
> So I am 100% for a fully predictable sshd random-relink kit, producing unpredictable sshd binaries, but only as long as there is an instruction how to check that the sshd binary that allegedly came from it indeed could have come from it, and was not quietly replaced by some malicious entity.
You can easily verify the integrity of the object files that are used in the random relinking - they are included in the binary distribution, and are necessary to perform the relinking.
The static vs. dynamic linking debate is still going on, and a very strong argument against static linking has always been that upgrading vulnerable libraries becomes difficult. But think of it: package managers already hold the metadata about what links against what; object files can be distributed just as easily as shared objects; the last necessary step is to move the actual linking step from the runtime loader to the package manager.
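A hypothetical sketch of what that could look like as a post-install step (the package name, paths, and library layout are invented):

    # Package ships only object files; the final (static) link happens on
    # the target machine, so every install or upgrade produces a fresh layout.
    pkgdir=/usr/local/lib/pkg-objs/foo
    cc -static -o /usr/local/bin/foo \
        $(ls "$pkgdir"/*.o | sort -R) /usr/local/lib/pkg-objs/libbar/libbar.a
    # Upgrading a vulnerable libbar then just means replacing its archive and
    # relinking the dependents the package manager already knows about.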
On OpenBSD all static system binaries are compiled as static PIE, so they already benefit from ASLR. The issue, IIUC, is that ASLR only randomizes entire ELF sections relative to each other. In any executable or library, whether static or dynamic, the code is placed into one giant .text section, so the relative offsets remain static. In a dynamic executable all library dependencies are loaded separately, so at least each section of each library gets a unique base address. A leak that exposes the address of a function only leaks the address of other functions in that library, not every function in the process. But in a static executable all those libraries are also placed into the same .text section as the main program code, so a leak of any function address leaks all function addresses.
In theory all functions, or more realistically groups of functions spanning page-size increments, could be dynamically located. The obvious way to achieve that would be to have multiple .text sections within a main executable or library. But off-hand I don't know if that's actually supported by ELF, or if so whether the standard tool chains and environments could easily support it.
The ELF spec certainly allows for multiple .text sections, and one can also use totally custom sections with the correct attributes.
Any linker that could not handle multiple identically named sections is simply buggy. That said, it is normal for a linker to prefer to output only a single section of each name, but it is not difficult to get a linker to output multiple .text-equivalent sections, especially if you give them distinct names.
However, sections are not really what you want. PT_LOAD segments are, since those represent the regions that get mmap'd contiguously. One can certainly put different executable sections into different segments.
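A quick way to see that distinction on an ELF system (readelf is from binutils; exact output varies by OS and binary):

    readelf -S --wide /bin/ls | grep '\.text'   # code *section(s)*
    readelf -l --wide /bin/ls | grep 'LOAD'     # PT_LOAD *segments* that get mmap'd
    # Typically all code lands in a single executable LOAD segment; randomizing
    # pieces independently would mean emitting several executable LOAD segments.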
I'm not 100% certain about how it works on OpenBSD, but on Linux, neither the kernel loader nor the loader embedded in the dynamic linker randomizes the segments independently. The problem is that for dynamically linked code, the .text needs to be able to reference the GOT and PLT via relative addresses, so those segments must be loaded at a known distance relative to the code.
For simple static PIE executables this should not be needed [1]. However, if you start introducing multiple chunks of code loaded at random addresses, then you need to reintroduce similar concepts, as you cannot reference code in those other randomly placed chunks with a relative address.
Assuming things are at all similar on OpenBSD, to do what you are proposing you would need to mark groups of segments that must be loaded relative to each other, while allowing the groups themselves to be randomized with respect to each other. For code in one group to access globals or functions from another, the linker would generate a GOT and PLT per group, similar to how dynamic linking works, but with simplifications, since you know all the code that will be present and don't need to worry about interposing, etc. In theory each GOT could get away with having as few as one entry per other segment group. [2]
Of course you would need code to initialize these GOT values. Realistically the static ELF loader would need to be augmented to provide the program with information about where it placed each segment group. Then the static PIE libc could include code that reads these offsets and uses them to initialize the GOTs. If you use the one-entry-per-segment-group approach and place the GOTs at, say, the very start of each segment group, with the entries in segment group order, this makes for really simple initialization code. Of course, a more complicated relocation engine, like a hyper-stripped-down dynamic linker, would also be possible.
Footnotes:
[1]: Apparently on Linux even static PIE executables need some amount of runtime relocation handling (I'm not really sure what for or why).
[2]: This is because the linker would know the exact offsets of functions and variables within each segment group, so the code can simply load the other segment group's base pointer into a register using relative addressing, and do the load/store/jump with that register plus the already-known displacement into that segment group.
> You can easily verify the integrity of the object files that are used in the random relinking - they are included in the binary distribution, and are necessary to perform the relinking.
I don't understand the full logic here. Yes I can authenticate the object files. But how would you discern, after a possible intrusion, an "sshd" binary that is indeed a random combination of these objects, from a trojaned "sshd" binary?
Limiting the scope of the damage that root can cause is an open problem, orthogonal to verifiable builds. OpenBSD has some basic checks in place (securelevel), but you should still assume that a compromised host is, well, compromised.
The weak link in reproducibility is that you currently have no trivial way of recreating the same random order of the linked object files.
Currently the random relinking is implemented literally through a call to "| sort -R" (-R for random order) on the list of object files, passed as arguments to the linker. I suppose if sort -R took a seed argument that was saved somewhere safe (chmod 400), the linking order could still be reproduced and the resulting executable checksummed against the state of the system.
Actually, saving the hashes of the objects into the executable itself, into a new section, would be enough. Then one would need to locate this section, confirm that the hashes there form a permutation of the canonical ones, relink the canonical objects in the same order, and check whether the resulting executable is the same byte-for-byte.
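A rough sketch of that check, assuming a hypothetical .relink.order section that simply lists the object files in link order (the section name, file layout, and link command are invented):

    # Pull the recorded link order out of the installed binary.
    objcopy -O binary --only-section=.relink.order /usr/sbin/sshd order.txt
    # Relink the canonical objects in the recorded order (plus whatever
    # libraries and flags the canonical build uses).
    cc -o sshd.check $(cat order.txt)
    # Compare against the installed binary with the order section stripped;
    # a real tool would also have to normalize section-header differences.
    objcopy --remove-section=.relink.order /usr/sbin/sshd sshd.stripped
    cmp sshd.check sshd.stripped && echo "matches canonical objects"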
If you save the link order, then you’ve provided the attacker with a map of the link order used, which defeats the whole point of randomization. No? I must be missing something.
The section with the link order stays only on disk, i.e. not loaded into RAM, and is therefore useless to the one who tries to exploit sshd. Especially because the sshd binary is readable only by root now.
I guess the holy grail would be to combine this with hot patching (https://en.wikipedia.org/wiki/Patch_(computing)#HOT-PATCHING) and relink the kernel every now and then while it is running (currently, a system under attack would have to be rebooted periodically to get a fresh layout, and that’s undesirable). That would face ‘a few’ technical hurdles, though.
Yeah I was just thinking this; I've got like years of uptime on my OpenBSD server--don't know how much boot time relinking is helping me. But for like, desktops and laptops, it's fine and a great feature IMO (you probably wade through a lot more muck on a personal machine)
If you have years of uptime on an openbsd machine you are not keeping it up to date.
I have to admit I am guilty of this as well, but any maintained openbsd setup should have an uptime of no more than 6 months, and a well-maintained openbsd setup will have less than that as security patches are applied.
Having said that, one of the things I like about openbsd is that if you want to go dark and have an ultra-stable system (no updates ever), all the pieces are there for you (you will want to have the source; I would also make sure I have the ports tree for that release and a copy of the ports dist files).
This is true; my VPS has some kind of problem updating an FDE machine and I've procrastinated doing something about it for years. The answer is probably putting everything on tarsnap and reinstalling.
What impact will this have on anti-tampering software that looks for changes in executable checksums? Tripwire and OSSEC come to mind and both can report their findings to a centralized server. Do package manager integrity tests still work? I assume anyone here using BSD in a PCI environment have already figured something out. Some people also feed checksums into Splunk.
Very good point. As per Job Snijders' own words [0], "the sshd binary becomes unique on every openbsd machine". So checksum-checking systems will indeed trip every time.
Ah, so this will have some features of ASLR missing. Specifically, you can't do it on a read-only root, and it doesn't randomise the stack location as far as I can tell?
I think I've got a clearer picture now. So openbsd has ASLR, which affects data, code/library, and stack positions. Then this solution works on top of it by reordering symbols within the code.
One thing I'm still not sure about is whether the kernel could theoretically do the same reordering at load time using relocatable symbols.
The assembler lays out the code within each section and generally it's not changed after that (except for targets that do linker relaxation). However, with -ffunction-sections the compiler puts each function in its own section, which can then be independently relocated.
If each function is in its own section, then all function calls would need to be indirected through the PLT/GOT, even function calls within the same translation unit? Ouch.
No - the linker is there to resolve references among sections, and it can do so without PLT/GOT indirection when doing a static link (e.g. pulling objects out of static archives).
There may be a code-size cost on some architectures: since the call destination can be relocated far from the call site, the assembler needs to make sure it allocates enough space to reach the call target, instead of using a small PC-relative relocation.
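A small way to see that in practice, assuming gcc or clang plus binutils (file and function names are made up):

    # demo.c:
    #   static int helper(int x) { return x * 2; }
    #   int api(int x)           { return helper(x) + 1; }
    cc -c -ffunction-sections demo.c
    readelf -S --wide demo.o | grep '\.text'
    # Expect separate .text.helper and .text.api sections; the helper() call
    # is later resolved by the linker with an ordinary relocation, no PLT/GOT.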
The kernel needs a bit more information than that, since chunks of code can refer to each other, and if you rearrange them you would break those references, since they're typically emitted as relative offsets.
Dynamic re-linking is cool, but it can result in less-than-optimal executables.
Sometimes it can be beneficial to optimize the link so that the code the main thread spends most of its time in stays in cache. Obviously this only really matters for CPU-intensive programs.
If you squint hard, then this is a custom dynamic loader for .o files with rudimentary ASLR (where all your entropy comes from the permutation of the .o files) that happens to cache to disk.