Hacker News
ldd arbitrary code execution (2009) (catonmat.net)
217 points by xk3 on July 23, 2023 | 85 comments



This is not new info, but it's good to tell others.

My "Program Librarues HOWTO" says this:

“Beware: do not run ldd on a program you don't trust. As is clearly stated in the ldd(1) manual, ldd works (in certain cases) by setting a special environment variable (for ELF objects, LD_TRACE_LOADED_OBJECTS) and then executing the program. It may be possible for an untrusted program to force the ldd user to run arbitrary code (instead of simply showing the ldd information). So, for safety's sake, don't use ldd on programs you don't trust to execute.”

https://dwheeler.com/program-library/Program-Library-HOWTO/x...

I believe that doc dates from 2000. This info wasn't new then either; it was specifically documented in ldd's man page.


It's not general knowledge among people who only sometimes drop down to root (such as me). Although it is mentioned in an early paragraph of the man page, it's easy to miss. It could use an attention-drawing "IMPORTANT SECURITY NOTICE" there.


That's the wrong way to solve this. You have to assume people don't read the manual, especially for security issues.

The correct solution is that `ldd` should be safe by default, and require a `--allow-unsafe-execution` flag if it needs to actually execute code.

I think this is pretty well known these days but probably not well known enough!


Too backwards incompatible maybe, although you could add a check on whether the output is a tty, like ls does; I don't know how often ldd is used in scripts.


Can’t resist mentioning the funny, but perhaps less than adequately prominent, note to “beware of gift horses” on the Plan 9 manual page for bundle(1) [pack files into a self-extracting shell script, more or less equivalent to shar(1) on a normal Unix].

https://9p.io/magic/man2html/1/bundle


None of the previous discussions on HN mentioned using readelf instead of ldd.

Whenever I want to quickly check for dynamically linked libraries in a program, I use

   readelf -d program
The use case I have is that I want to know what dependencies the program needs from the host system.

If I want to know the dependencies of a particular shared library, I can use

   readelf -d library


Note lddtree from pax-utils requires python and the elftools module, even though one can use readelf and objdump instead of python. https://raw.githubusercontent.com/ncopa/lddtree/main/lddtree...

If I wanted a recursive listing I would use libtree, a 56k .c file.

https://raw.githubusercontent.com/haampie/libtree/master/lib...


(author of libtree): I believe that downloading, compiling and running libtree is faster than the startup overhead of pax-utils lddtree ;) maybe C is the better scripting language?

    $ alias latest_libtree='curl -Lfs https://raw.githubusercontent.com/haampie/libtree/master/libtree.c | cc -o /tmp/libtree -x c - -std=c99 -D_FILE_OFFSET_BITS=64; /tmp/libtree'

    $ time latest_libtree /usr/bin/vim

    real 0m0.179s
    user 0m0.119s
    sys 0m0.019s
    /usr/bin/vim 
    ├── libpython3.11.so.1.0 [ld.so.conf]
    │   ├── libexpat.so.1 [ld.so.conf]
    │   └── libz.so.1 [ld.so.conf]
    ├── libgpm.so.2 [ld.so.conf]
    ├── libacl.so.1 [ld.so.conf]
    ├── libsodium.so.23 [ld.so.conf]
    ├── libselinux.so.1 [ld.so.conf]
    │   └── libpcre2-8.so.0 [ld.so.conf]
    └── libtinfo.so.6 [ld.so.conf]


That's great, but unlike ldd, readelf won't walk the list of dependencies recursively.


never fear, recursive and concat unix piping to the rescue
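A hypothetical POSIX-sh sketch of that idea: read each DT_NEEDED entry with readelf, locate it via ldconfig's cache (assuming ldconfig is on PATH), and recurse, all without ever executing the target binary. The `walk` function name and the `/bin/ls` default are just for illustration.

```shell
#!/bin/sh
# Recursively resolve DT_NEEDED entries without executing the binary.
seen=""
walk() {
    for dep in $(readelf -d "$1" 2>/dev/null \
                 | awk '/NEEDED/ { gsub(/[][]/, "", $NF); print $NF }'); do
        case " $seen " in *" $dep "*) continue ;; esac   # skip already-seen libs
        seen="$seen $dep"
        # find the library on disk via the loader cache
        path=$(ldconfig -p | awk -v d="$dep" '$1 == d { print $NF; exit }')
        echo "$dep => ${path:-not found}"
        [ -n "$path" ] && walk "$path"
    done
    return 0
}
walk "${1:-/bin/ls}"
```

Note this ignores rpaths, RUNPATH, and $ORIGIN entirely, which is exactly the kind of corner lddtree/libtree handle for you.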


At that point, why not just use ldd?


Because it executes the binary


If you're worried (like, really worried) about a binary, then it can easily fool readelf (by the way, it can fool ldd too), e.g. by arranging that the data in the ELF header is never actually used.

Probably running `strace` or `perf record -e intel_pt` in a confined environment is the best option to see what a binary really does, short of using IDA/disasm/gdb - but then a binary can detect the use of strace too, and behave differently :).

It's turtles all the way down.


How exactly would it fool readelf, which works on entirely different principles?


E.g. by changing PT_INTERP to something other than the usual (or dispensing with it, so the binary becomes a static one while still displaying DT_NEEDED), or by exploiting bugs in ld.so, so the values shown by readelf become meaningless.

I'm also not entirely sure that readelf/binutils has been reviewed from a security perspective; it's not out of the question that readelf shows things which are then interpreted differently by either the kernel (loading a basic static binary or the ld.so interpreter) or by ld.so itself.
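For reference, the interpreter a binary requests can be inspected statically (using /bin/ls here purely as a handy example):

```shell
# dump the .interp section, which names the requested loader
readelf -p .interp /bin/ls

# or look at the program headers for the same information
readelf -l /bin/ls | grep -A2 INTERP
```

Neither command executes the binary; both just read the ELF data, which, per the discussion above, a crafted binary is free to lie about.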


One particular case where a program won't be handled by ld-linux.so is when it specifies a loader other than the system's default in its .interp ELF section. That's the whole idea in executing arbitrary code with ldd -- load the executable via a different loader that does not honor the LD_TRACE_LOADED_OBJECTS environment variable but instead executes the program.

The way dynamic linking is done on UNIX-like systems has always seemed odd and a bit of an afterthought to me, and that explains why --- the actual loader in the OS kernel itself knows nearly nothing about dynamic linking. Instead it "redirects" execution to a loader (which is itself not that different from any other static binary) to do that.
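The mechanism is easy to see for yourself with glibc's loader: with the variable set, a well-behaved ld.so prints the dependencies and exits instead of running the program (a malicious loader, as discussed, can simply ignore it):

```shell
# With glibc's ld.so this prints the shared libraries; ls never actually runs.
LD_TRACE_LOADED_OBJECTS=1 /bin/ls
```

This is essentially all ldd does under the hood, which is why handing it a binary with a hostile .interp is dangerous.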


I don’t know, I don’t hate the ELF interpreter bit; I actually don’t see why we need to have an ELF parser in the kernel at all—or rather I do, but the answer is likely “setuid”, which, bleh.

The emulating static linking semantics badly bit, though, I do hate, because it just seems so unnecessary and self-inflicted. The entirety of the symbol versioning mess could be reduced to build-time symbol aliases if only we agreed to bind symbols to imported dynamic libraries at build time and not at runtime. (MacOS’s “two-level namespaces” are a hack that tries to retrofit the sane behaviour on top of that, but of course it can’t free you from the underlying mess completely.)

And if we are comparing Unix and its descendants with Windows, isn’t it the case that dynamic linking is, in fact, an afterthought? Windows has had dynamic linking since literally 1.0, because without that you simply can’t fit multiple graphical apps into memory on the kind of machine it was supposed to target. I’d say that early 16-bit Windows essentially is a fancy dynamic linker tied to an overlay manager, plus a cooperative actor/object-like thing, and only then some graphics and input routines on top of all that. At least as far as the application programmer’s mental model is concerned.

Whereas Unix got dynamic linking well into the workstation era, and I remember somebody saying it was an attempt to reduce the memory overhead of Xlib; the classic Unix answer to runtime extensibility would be to spawn a separate process.


> The way dynamic linking is done on UNIX-like systems has always seemed odd and a bit of an afterthought to me, and that explains why --- the actual loader in the OS kernel itself knows nearly nothing about dynamic linking. Instead it "redirects" execution to a loader (which is itself not that different from any other static binary) to do that.

This makes it much easier to evolve the system because kernel-land programming is harder than user-land programming. The trade-off is that the user-land run-time loader has its own bootstrapping complexity to deal with.


On Windows, the loader for usermode runs in usermode too; but every executable, by design, doesn't get to choose what to use for its loader.


> but every executable, by design, doesn't get to choose what to use for its loader.

Sure it does: it can load whatever it wants however it wants once main() gets to run.


Well, dynamic linking, like most things in Unix, is an afterthought by definition, because the first Unix system did not have it.

That doesn't seem to be what you mean in context, though, and I'm not sure what you do mean. You don't like it, or don't think it was designed well, because dynamic linking is done in userspace?


The fact that an executable needs to explicitly specify the "interpreter", instead of that being something under control of the OS loader, doesn't seem like a good way of doing it; and then the implementation of ldd discussed here just takes it in an even weirder direction.


> The fact that an executable needs to explicitly specify the "interpreter", instead of that being something under control of the OS loader, doesn't seem like a good way of doing it;

The OS loader does control the permissions of what can be read and executed. And the user can execute a file. Why would it be better if you had to execute the loader explicitly to run a program? The same security problem would apply if you convinced somebody else to execute your malicious code. Seems like pretty harmless syntactic convenience.

The issue of ldd executing the program I guess is a thing that could be tightened to avoid foot shooting. Lots of unix programs traditionally were happy to give you lots of rope though.


What distinction are you making? Do you think it should be part of the kernel? I ask because ld.so (the runtime linker/loader) is part of the OS on many systems.


I mean not explicitly hardcoding the path to the interpreter inside every binary. If the dynamic linker is part of the OS, then surely the OS should already know where it is?


> If the dynamic linker is part of the OS, then surely the OS should already know where it is?

It's both part of the OS and the language's compiler tool chain the program was written in. For the vast majority of Linux executables that's glibc and the ld-linux.so that glibc ships.

And you don't need to hard-code the path inside your executable. Static PIE objects, for example, are dynamic ELF objects without an interpreter header. The system loader is capable of launching them just fine.


I’m still not sure what principle you’ve got that’s being violated here? Given that one has chosen to implement part of the runtime linker/loader in userland, what’s wrong with fitting that into a more generic facility (supporting interpreters at arbitrary paths) and putting this specific implementation into a committed path?


How much do you actually know about this? Have you read the ELF spec or written anything with libelf, hacked on a loader, a linker? Considered the actual problems with having this hard coded vs alternatives?

I ask because ELF and unix dynamic linking is a pretty well thought out set of specifications and processes and it's a huge body of work. It can certainly be criticized, but calling it an afterthought because of .interp doesn't seem very charitable, just trying to understand if that's an informed opinion and if so it would be interesting to hear more.


It fits perfectly in the Unix philosophy.


Sucky working code is better than code that doesn't exist, yes.

But isn't it silly that every Linux system needs to place this exact file at this exact path or else no program for another system will possibly be able to run on it?


The name of the file is just part of the committed interface that the binary consumes. You could just as well suggest it's silly that every Linux system needs to agree on the behaviour of each system call number, or no program for another system will possibly be able to run on it.


"But isn't it silly that every Linux system needs a kernel and an initrd image or else it won't boot?"


Translation, worse is better philosophy.


Not really. Would you rather that the dynamic linker-loader be part of the kernel? Do you think that the kernel can somehow prohibit user-land dynamic linking?


I would rather have it as Windows and AIX do, or even mainframes, or bygone systems of a similar age like the Xerox PARC or ETHZ ones; yes, they are in the kernel.

In any case, modern security considerations have already shown that dynamic loading of foreign code into process memory isn't that much of a good idea after all, unless there are hardware restrictions regarding performance and memory use.


Well, I don't want dynamic linking and loading of user code to be done in the kernel -- that only increases the attack surface on the kernel.

> In any case, modern security considerations have already shown that dynamic loading of foreign code into process memory isn't that much of a good idea after all, [...]

As with many things it all depends on best practices and how you use the technology. Dynamic linking can be a big performance win. It can also be a performance loss. Dynamic linking can also be a big win in terms of code size, which even today is a good thing to have because container images can get bloated and you can need to have lots and lots of them, and it all adds up. None of that means you should be willing to dlopen() untrusted code, say.

And again, you can always implement your own dynamic linking and loading regardless of what such facilities (if any) the host OS provides.


9front does it better. No dynamic linking. At all.


User-land code can always add dynamic linking.


(the various implementations of) lddtree also don't have this "feature" as they all use readelf or similar (EDIT: and unlike readelf -d, will recurse through dependencies like ldd does)

I considered that rather well known though...


I just learned today! Thanks.


Article is dated 2009.

Previous discussions:

- [2009] https://news.ycombinator.com/item?id=902958

- [2015] https://news.ycombinator.com/item?id=9629667

- [2022] https://news.ycombinator.com/item?id=30033807

Also, the formatting inside the code block is broken.


Is the mentioned issue in the article still true today?


Yes. So is the (IMHO) best workaround of using lddtree (pax-utils) instead. Available on pretty much all distros, though probably not installed by default.


    objdump -x /path/to/bin | grep NEEDED


Doesn't tell you if/where the library is found on your system.


    for i in $(objdump -p /path/to/bin | grep NEEDED | awk '{print $2}'); do ldconfig -p | awk -v var="$i" '$1==var{print $0}'; done

edit:

even simpler,

    for i in $(objdump -p /path/to/bin | grep NEEDED | awk '{print $2}'); do ldconfig -p | grep $i; done


Cool, but I think I'll stick to lddtree ;)

(… it also recurses down on the dependencies, presumably you can extend your shell script to do that too but at some point why not just use the existing tools?)


Not in glibc at least[1], and I reckon many other implementations don't have this problem either.

[1]: https://manpages.debian.org/bookworm/manpages/ldd.1.en.html#...


Am I the only one who read this and thought 'wow, if I can get LD_TRACE_LOADED_OBJECTS to be set I can block anything from being executed'?

That seems like a scary powerful environment variable.


You can also set LD_PRELOAD to the path of a shared library whose constructor segfaults, or set LD_LIBRARY_PATH to the path of a directory containing a libc that doesn't work.


Not really. Realistically you can set PATH to an empty string, and block execution of everything that isn't invoked using a hardcoded path.


Yeah, I thought about that, but this would kill even directly invoked executables, right?


Yes, this is certainly more “far-reaching” than other env vars. But if you can set env vars, you have the ability to wreak havoc even if ld-linux didn't check for this one (PATH obviously being the prime example, but unsetting HOME will almost certainly break some software written by novices that doesn't fall back to the home directory specified in /etc/passwd).

And I could also imagine that tools that allow setting arbitrary env vars might filter common troublemakers like PATH, but almost certainly not niche env vars that change the behaviour of ld-linux.


I believe I saw a simple shell script based on nm or binutils which could do a safe ldd on any executable, including executables for different platforms.

Update: Here seems to be one example: https://gist.github.com/jerome-pouiller/c403786c1394f53f44a3...


It's readelf-based, but it will only list libs which are declared in the ELF of the executable. It won't list libraries loaded at startup with dlopen(3), unlike ldd. I'm also not sure whether this script recurses, but I suspect not. ldd will print libraries loaded recursively.


> It won't list libraries loaded at startup with dlopen(3)

Well, that'd require actually executing the startup code.


This seems to be based on readelf.


Another fun `ldd` fact: some executables can be run even when ldd says their libraries cannot be located.

This is thanks to $ORIGIN rpaths.

1. Create an executable in `./bin/exe` with say `$ORIGIN/../lib` rpath, and link it to some library `./lib/libfoo.so`

2. Create a symlink one dir up: `ln -s ./bin/exe exe`

3. `ldd exe` will tell you `libfoo.so` is not found

4. `./exe` runs just fine.

This is because the kernel resolves the symlink path, and the dynamic loader receives that, and uses it for $ORIGIN.

`ldd` on the other hand will interpolate $ORIGIN with the symlink location, not its target.

This only "works" for the executable, not for dependencies of dependencies: if a library is located as a symlink in a different directory from its target, $ORIGIN is relative to the symlink, and it's likely to break.

This can be problematic if you wanna put symlinks in say `/usr/local/lib`.


recommended workaround: use lddtree (part of pax-utils)

The output is actually more informative too:

  $ lddtree `which ssh`
  /usr/bin/ssh (interpreter => /lib64/ld-linux-x86-64.so.2)
      libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1
          libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0
      libgssapi_krb5.so.2 => /lib/x86_64-linux-gnu/libgssapi_krb5.so.2
          libkrb5.so.3 => /lib/x86_64-linux-gnu/libkrb5.so.3
              libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1
              libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2
          libk5crypto.so.3 => /lib/x86_64-linux-gnu/libk5crypto.so.3
          libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2
          libkrb5support.so.0 => /lib/x86_64-linux-gnu/libkrb5support.so.0
      libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3
      libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1
      libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
vs.

  $ ldd `which ssh`    
          linux-vdso.so.1 (0x00007fff7972e000)
          libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007f1a0030c000)
          libgssapi_krb5.so.2 => /lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x00007f1a002ba000)
          libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x00007f19ffe00000)
          libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f1a0029b000)
          libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f19ffc1e000)
          libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007f19ffb84000)
          /lib64/ld-linux-x86-64.so.2 (0x00007f1a00499000)
          libkrb5.so.3 => /lib/x86_64-linux-gnu/libkrb5.so.3 (0x00007f19ffaaa000)
          libk5crypto.so.3 => /lib/x86_64-linux-gnu/libk5crypto.so.3 (0x00007f19ffa7d000)
          libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x00007f1a00293000)
          libkrb5support.so.0 => /lib/x86_64-linux-gnu/libkrb5support.so.0 (0x00007f1a00285000)
          libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x00007f19ffa76000)
          libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007f19ffa65000)

(lddtree opens the files as plain binaries and reads the ELF linking data, it does not execute any code from the files being inspected.)


These days I tend to use $() over backticks, anyone wanna fight me on it?


For me, it's `{} Directly giving `cmd for free.


I get that it's important that we call out exploitable situations like this, but it's also kind of moot. There are so many ways to exploit modern systems you can just rustle around in a bag of exploits and one will work. Known executable, unknown, doesn't matter. Not even using a VM will keep you safe. Not even RBAC.

What's easier and more reliable is to work in terms of risks. It's fine if you get exploited, as long as the access you have to sensitive systems is limited, and those sensitive systems have backups, and you can't delete those backups, you can re-deploy systems from scratch using automation if you get compromised, automatically rotate credentials, etc.

Lower the overall risk by setting everything up so the worst case scenario isn't that bad. Then you don't have to worry so much about an "unknown executable" because even if it gets exploited the attacker can't cause too much damage.


This ("so many ways to exploit modern systems") is not actually true.

Yes, plenty of memory-unsafety vulnerabilities exist, but modern mitigations like stack cookies, ASLR, (and sometimes) sandboxing and PAC make it unlikely that e.g. a buffer overflow is exploitable without other factors such as an information leak from your machine back to the attacker. (This might be the case on publicly-accessible servers, but probably not on your laptop.)

The vulnerability being discussed here is unusually dangerous because it's more like command injection, and mitigations aren't going to help.


ASLR, PAC, etc are trivial to defeat for an experienced black hat / red team.

I'm not even talking about memory safety. That's just one class of exploits. There are so many more to choose from.

This vuln isn't that dangerous. It requires a special circumstance and trust. Other vulns don't require those things.


The number one method of security, at least for power users, is user behavior. Look at URLs you might visit critically. Don't run random-ass code. Keep an eye out for being taken advantage of.


> Not even using a VM will keep you safe.

Do you browse the Internet? Ever visited an unknown website? Whatever the browser uses to run JS and wasm code on the web ought to be enough.


Browsers do in fact get exploited; I would have called that an example of it not being a solved problem.


They get exploited (and then the exploit is patched), but we're still using a browser to talk about this, right?


People used IE6; I wouldn't call that an endorsement of it as a secure system, only useful enough to justify the risk.


You're joking, right?

Browsers haven't ever been safe. There are competitions every year to find new 0days that break out of browser protections, and every year multiple are found. And those are the vulns they'll tell you about.


Maybe combining the linker and the loader into a single program is a bad idea. I guess they kinda do similar job, if you look from the kernel point of view...


besides being slower, splitting them into two would mean they’d need to agree on an ABI and its evolution. Having them ship in the same binary skips over those concerns


Okay, I've just checked, and apparently the ld and ld.so are two completely different executables: the loader, ld.so, is 236K large and statically linked while ld is 1.7M large and dynamically linked. And now that I think of it, I believe that alternative linkers such as gold, or mold, or lld, don't come with alternative loaders so... they have to manually follow the GNU's ld.so ABI and evolution?


The terminology is a bit confusing: the ordinary linker used at compile time is gold, or mold, or lld, that outputs an ELF executable. Then, the dynamic linker, or dynamic loader, is provided by the libc and usually called ld-whatever.so. It takes the loaded ELF program as input when you start it, and processes relocations and whatnot before yielding control to the program's entry point.

The interface between the compile-time linker and the dynamic linker is mostly defined by the ELF file format, System V ABI, and processor-specific ABI: many of the details can be found in the elf(5) man page. So in principle, there isn't really a bespoke "ld.so ABI". However, in practice, this can break down a bit on the consumer's side; there was a ruckus a while back when glibc started only including the undocumented DT_GNU_HASH and not the documented DT_HASH for its shared libraries, under the premise that only libc should be concerned with interpreting ELF headers.

At least from the producer's side, the story is better, since dynamic linkers will generally at least accept the standard headers, and ignore any nonstandard headers they don't understand.
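The headers in question are easy to inspect yourself, e.g. to see which hash sections your libc actually carries (the library path is discovered via ldconfig's cache, so this sketch assumes a glibc-style system with ldconfig on PATH):

```shell
# locate libc via the loader cache, then list its hash sections
libc=$(ldconfig -p | awk '$1 == "libc.so.6" { print $NF; exit }')
readelf -S "$libc" | grep -i hash
```

On a current glibc you'd expect to see .gnu.hash and, depending on how the library was linked, possibly no classic .hash at all, which is the incompatibility described above.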



That’s incredible. Is there any reason not to fix ldd to just fail if the program is using a different loader?


If you take a look at the ldd shell script, current versions no longer run the raw executable, always executing like the author's 3rd command example.


Different than what loader?


The GNU one.


Because you may want ldd to work when the program is using a different loader. Not all nonstandard loaders are malicious.

If you want to avoid this risk, use something else, like readelf or lddtree.


This attitude is why this issue exists in the first place.


ldd's implementation relies on the assumption that the loader respects the LD_TRACE_LOADED_OBJECTS environment variable. Do all non-malicious loaders need to respect this environment variable and implement GNU ld.so's behavior?


Wouldn't have thought of 'ldd', would have gone straight to objdump.


uh oh


Never run IDD without also running QD, which enables a state that prevents damage or death, aka. "God Mode".



