I've been using unp (https://manpages.ubuntu.com/manpages/focal/man1/unp.1.html), which is just a wrapper around standard CLI tools for unpacking things (tar, xz, unzip, etc.).
It seems pretty dated by now; good to see some replacements!
If you’re looking for a general-purpose extractor for known, common archive formats, bsdtar is really nice these days; it’s libarchive-based and does way more than just tarballs (extracts zip, rar, and 7z as well as all the common compression formats on top of tar plus a bunch of others).
Not really in the same class of tools as unblob, but handy to have around regardless.
apack/aunpack from the atool suite for me [0]. Funny how many solutions exist for this problem. Though I think Unblob is aiming more for binwalk's niche [1].
The atool suite is great but only supports well-formatted files. The idea with unblob is to precisely identify valid chunks of data within arbitrary files, carve them out, then decompress/extract/convert them. You would not believe how many embedded device vendors' custom formats are just a loose aggregation of almost-standard archive and compression formats packed into a single file :D
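A minimal Python sketch of the carve-and-validate idea, using gzip as the example format. This is not unblob's code: it looks for gzip magic bytes, confirms each candidate is a real stream by actually decompressing it with zlib, and uses the decompressor's `unused_data` to find exactly where the stream ends, so trailing junk is never included in the carved chunk.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"  # ID1, ID2, CM=8 (deflate)


def carve_gzip_streams(blob: bytes) -> list[tuple[int, int]]:
    """Return (start, end) offsets of complete, valid gzip members found in blob."""
    chunks = []
    pos = 0
    while True:
        start = blob.find(GZIP_MAGIC, pos)
        if start == -1:
            break
        # wbits=16+MAX_WBITS tells zlib to expect a gzip header and trailer.
        d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
        try:
            d.decompress(blob[start:])
        except zlib.error:
            # The magic matched, but what follows is not a valid stream: skip it.
            pos = start + 1
            continue
        if not d.eof:
            # Truncated stream: reject it rather than carve a partial chunk.
            pos = start + 1
            continue
        # Everything zlib did not consume lies after the end of this member.
        end = len(blob) - len(d.unused_data)
        chunks.append((start, end))
        pos = end  # resume scanning right after the carved chunk
    return chunks
```

Note that this decompresses every candidate fully in memory, which is fine for a sketch but not for very large inputs.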
Yes, we developed a framework, "unblob core", which can easily be extended [1] to any use case. For example, a separate Python package could contain format specifications for "forensic analysis" or "game assets" (these two came to mind because we have already received requests for them).
It's in Python and is able to deserialize Unity archives, treating them as a serialization format rather than a simple archive format. Feel free to email me if you want to integrate something like this or you have questions :)
The main difference is that binwalk goes through a file linearly, searching for patterns like magic bytes, and tries to extract everything it finds.
The problems with this:
- very noisy, finding a lot of false positives (license code, format inside another format, etc).
- very slow, because it tries to extract irrelevant things
- imprecise, because it finds patterns in the middle of a file that aren't actually relevant at the first level of extraction
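To make the noise problem concrete, here is a deliberately naive signature scanner in Python. It sketches the general linear-scan approach described above and is not binwalk's actual implementation; the signature table is a small hand-picked assumption (zip, gzip and little-endian squashfs magic bytes).

```python
# Hand-picked signature table for illustration: magic bytes -> format name.
SIGNATURES = {
    b"PK\x03\x04": "zip",
    b"\x1f\x8b\x08": "gzip",
    b"hsqs": "squashfs",
}


def naive_scan(data: bytes) -> list[tuple[int, str]]:
    """Report every offset where any signature occurs, with no validation at all."""
    hits = []
    for magic, name in SIGNATURES.items():
        pos = data.find(magic)
        while pos != -1:
            hits.append((pos, name))
            pos = data.find(magic, pos + 1)
    return sorted(hits)
```

For example, `naive_scan(b"Copyright notice: hsqs appears here by accident.")` returns `[(18, "squashfs")]`, even though there is no filesystem there.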
unblob solves these problems by being smarter about file formats: it recognizes them by their specification, for example by unpacking format header structs and carving out files based on the information in the header (size, offset).
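A sketch of header-driven carving in Python. The header layout (`BLOB` magic, a version field, a payload size) is entirely hypothetical and only stands in for a real format's header struct. The point is that chunk boundaries come from the parsed size field rather than from wherever the next magic happens to appear, and a header whose declared size runs past the end of the data is rejected.

```python
import struct
from typing import Optional

# Hypothetical header layout, for illustration only:
#   4-byte magic, uint16 version, uint32 payload size (all little-endian, no padding).
HEADER = struct.Struct("<4sHI")
MAGIC = b"BLOB"


def carve_by_header(data: bytes, offset: int) -> Optional[tuple[int, int]]:
    """Parse the header at offset; return (start, end) of the whole chunk, or None."""
    if offset + HEADER.size > len(data):
        return None  # not even room for a header
    magic, version, payload_size = HEADER.unpack_from(data, offset)
    if magic != MAGIC or version not in (1, 2):
        return None  # header does not match the (hypothetical) specification
    end = offset + HEADER.size + payload_size
    if end > len(data):
        return None  # declared size runs past the end of the data: reject
    return (offset, end)
```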
See a simple example for NTFS [1].
We also went to great lengths to prevent unnecessary work by skipping formats nested inside other formats [2].
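One way to implement that kind of skipping, sketched in Python as an assumption rather than unblob's actual logic: collect the (start, end) offsets of every valid chunk, then keep only the outermost ones. Anything nested would be found again when the outer chunk is extracted and scanned at the next level.

```python
def drop_nested(chunks: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Keep only outermost (start, end) chunks; drop chunks fully inside another."""
    # Sort by start ascending, then by end descending, so that for equal starts
    # the larger chunk is seen first.
    ordered = sorted(chunks, key=lambda c: (c[0], -c[1]))
    kept: list[tuple[int, int]] = []
    for start, end in ordered:
        # Kept ends only ever increase, so a chunk contained in any kept chunk
        # is also contained in the last one kept.
        if kept and end <= kept[-1][1]:
            continue  # nested: leave it for the next extraction level
        kept.append((start, end))
    return kept
```

For example, `drop_nested([(0, 100), (10, 20), (100, 150)])` returns `[(0, 100), (100, 150)]`. Partially overlapping chunks are kept as-is here; deciding what to do with those is a separate question.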
We use hyperscan [3] instead of grepping for byte sequences in Python, which is orders of magnitude faster. Because of this, it can also handle files larger than 4 GB, which binwalk cannot.
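hyperscan's advantage is matching many patterns in a single pass over the data. The Python sketch below imitates that idea with the standard `re` module, which is only a stand-in and nowhere near hyperscan's speed: all signatures are compiled into one pattern, a zero-width lookahead lets `finditer()` report a hit at every offset where any signature starts, and `mmap` lets a file be scanned without reading it into memory first. The signature list is a hand-picked assumption.

```python
import mmap
import re

# All signatures in one compiled pattern (stand-in for a hyperscan database).
# The lookahead makes each match zero-width, so overlapping hits are not skipped.
COMBINED = re.compile(
    rb"(?=(?P<zip>PK\x03\x04)|(?P<gzip>\x1f\x8b\x08)|(?P<squashfs>hsqs))"
)


def scan(data) -> list[tuple[int, str]]:
    """Single pass over a bytes-like object; returns (offset, format name) pairs."""
    return [(m.start(), m.lastgroup) for m in COMBINED.finditer(data)]


def scan_file(path: str) -> list[tuple[int, str]]:
    """Scan a (non-empty) file through mmap instead of loading it all at once."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        return scan(mm)
```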
It's been used in production for a year now, and it's far more precise and faster than binwalk. We also get fewer false positives, and even when unblob fails to extract everything, we still get meaningful information out of firmware images where binwalk previously failed with no output.
Indeed, it's a smarter alternative to binwalk, which we started because binwalk was not a good fit for us :) We should probably include a comparison somewhere in the docs.
Adding a comparison, or at least the clear differentiators, to the documentation would be very helpful.
As someone who uses `binwalk` extensively in a professional setting, with tooling built around `binwalk`, it would be useful to see (a) how `unblob` would integrate and (b) if it could be a replacement or supplemental.
This looks awesome! Looking forward to trying it over binwalk!
It'd be great if you could get it building on aarch64 and non-Linux systems. I tried adding the flake to my nix dotfiles on Mac, but quickly realized you only support x86_64-linux for now.
Very cool - something I don't expect to need very often but likely critical when I do. I can't help but feel there was a missed opportunity to name it 'unpackman'
Does anyone know of something similar for text file formats? In particular something that makes it easy to work with legacy fixed width record file formats?
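For simple fixed-width layouts, plain slicing by field width goes a long way. A minimal Python sketch; the `LAYOUT` field names and widths are hypothetical and would come from the legacy format's record spec.

```python
# Hypothetical record layout: (field name, width in characters), in file order.
LAYOUT = [("id", 6), ("name", 20), ("amount", 10)]


def parse_fixed_width(line: str, layout=LAYOUT) -> dict[str, str]:
    """Split one fixed-width record into named fields, trimming padding."""
    record = {}
    pos = 0
    for name, width in layout:
        record[name] = line[pos:pos + width].strip()
        pos += width
    return record
```

For example, `parse_fixed_width("000042" + "Alice".ljust(20) + "123.45".rjust(10))` returns `{"id": "000042", "name": "Alice", "amount": "123.45"}`.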
If you're talking about Intel HEX and Motorola S-Records, we developed unblob handlers for them. They're not public at the moment, but I can assure you it works.
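For anyone curious what such a handler has to check, here's an independent Python sketch of validating a single Intel HEX record (this is not the unpublished unblob handler). Each record is `:` followed by a byte count, a 16-bit big-endian address, a record type, the data bytes and a checksum chosen so that all bytes of the record sum to zero modulo 256.

```python
def parse_ihex_record(line: str) -> tuple[int, int, bytes]:
    """Parse one Intel HEX record; return (record_type, address, data)."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(line[1:])  # raises ValueError on non-hex characters
    if len(raw) < 5:
        raise ValueError("record too short")
    count = raw[0]
    # count + 1 (count) + 2 (address) + 1 (type) + 1 (checksum)
    if len(raw) != count + 5:
        raise ValueError("byte count does not match record length")
    if sum(raw) & 0xFF != 0:
        raise ValueError("checksum mismatch")
    address = int.from_bytes(raw[1:3], "big")
    record_type = raw[3]
    data = raw[4:4 + count]
    return record_type, address, data
```

For example, `parse_ihex_record(":00000001FF")` returns `(1, 0, b"")`, the end-of-file record.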
I'm writing a CLI photo/video manager and have written all of these parsers except PNG. It's fun, but it would be better to replace it all with a mature lib. I think I've seen a Python lib that parses everything, but I can't remember the name.
No, sorry for being unclear, I mean if the archive contains a toplevel folder e.g. binutils-1.0.0, that I can rename it to binutils before extracting (so I don't end up with the binutils-1.0.0 on my filesystem), and preferably in such a way that I don't have to know the toplevel folder name before extracting.
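If you end up scripting it, Python's standard `tarfile` module can do this without knowing the folder name in advance. A minimal sketch, assuming a tar archive (with any compression `tarfile` auto-detects) that has exactly one top-level entry and trusted contents; `extract_renaming_top` is a hypothetical helper name.

```python
import tarfile


def extract_renaming_top(archive: str, dest: str, new_top: str) -> None:
    """Extract archive into dest, renaming its single top-level folder to new_top."""
    with tarfile.open(archive) as tar:
        members = tar.getmembers()
        tops = {m.name.split("/", 1)[0] for m in members}
        if len(tops) != 1:
            raise ValueError(f"expected one top-level entry, found {sorted(tops)}")
        old_top = tops.pop()
        for m in members:
            # "binutils-1.0.0/README" -> "binutils/README"
            m.name = new_top + m.name[len(old_top):]
            # Hard links refer to other member names, so rewrite those too.
            if m.islnk() and m.linkname.split("/", 1)[0] == old_top:
                m.linkname = new_top + m.linkname[len(old_top):]
        # Assumes a trusted archive: no path validation is done here.
        tar.extractall(dest, members=members)
```

Untrusted archives need path validation before extraction; newer Python versions offer `filter="data"` on `extractall` for this.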
There are multiple reasons:
1. It already supports more formats [1] than atool.
2. It's written in Python and easily extendable by anyone interested in supporting their own formats. [2]
3. It's probably faster because of hyperscan.
4. More reasons :) [3]
If you're interested in something similar that can put things back together after you've modified them, check out OFRAK:
https://github.com/redballoonsecurity/ofrak
It's designed with embedded systems in mind, but has support for all kinds of other stuff, too. It also has some very advanced binary patching capabilities.
I work on it as part of my day job.