
What's sort of interesting is that there's no overwhelming reason why someone couldn't come up with a piece of software that auto-detects a binary format and translates it into something readable in a GUI.

I mean, we know the binary layout of the common types, so (outside of the time it would take to build such a utility) I've never understood why I can't find something that says, "Oh yeah, that binary string contains three doubles, separated by a NULL character, with an int, followed by a UTF-8 compatible string."

Such a tool would be incredibly useful for reverse engineering proprietary formats, and yet I don't know of a good one, so if it exists it's at least obscure enough for it to have escaped my knowledge for well over a decade.



There is a command-line program called "file" that attempts to determine the file type (format). It checks the input against a series of known formats and returns the first one that matches. I have found it useful when reverse engineering proprietary formats.
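For anyone who hasn't looked at how that kind of matching works, here is a minimal sketch of the first-match idea in Python. The signature table is a tiny hypothetical one I made up for illustration (the magic bytes themselves are the well-known ones), and `identify` is my own name; the real tool uses a far larger rule database with much richer tests.

```python
# Hypothetical, tiny signature table: (offset, magic bytes, description).
# The real "file" utility has a much larger and more expressive database.
SIGNATURES = [
    (0, b"\x89PNG\r\n\x1a\n", "PNG image"),
    (0, b"%PDF-", "PDF document"),
    (0, b"PK\x03\x04", "ZIP archive"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (0, b"\x7fELF", "ELF executable"),
]


def identify(buf):
    """Return the description of the first signature that matches buf."""
    for offset, magic, description in SIGNATURES:
        if buf[offset:offset + len(magic)] == magic:
            return description
    # Nothing matched, so fall back to a generic label.
    return "data"
```

This only works for formats someone has already catalogued, which is exactly the limitation with unknown proprietary blobs.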


Yeah, but that's for known formats.

If I had a buffer of 512 bytes and piped it through some CLI, it would be great if it could tell me how many ints, chars, floats, doubles, compressed bits of data, CRC32s, UTF-8 strings, etc. it contained, but there are few utilities out there that will do that.
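The one part of that wish list that is easy to approximate is the strings. A rough sketch, assuming we only care about runs of printable ASCII of at least `min_len` bytes (the function name and the threshold are my own choices, and this says nothing about ints or floats):

```python
import re


def ascii_runs(buf, min_len=4):
    """Return (offset, text) for each run of printable ASCII in buf."""
    # Bytes 0x20..0x7e are the printable ASCII range.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(buf)]
```

Short runs are dropped because random binary data contains plenty of accidental two- or three-character "strings".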


I'm curious how you'd propose doing that.

If I give you a buffer of 5 bytes:

[0x68 0x65 0x6c 0x6c 0x6f]

there are a ton of ways to interpret that.

    - The ASCII string "hello"
    - 5 single-byte integers
    - 2 two-byte integers with 0x6c as a delimiter
    - 1 four-byte integer followed by the char "o"
    - 1 32-bit float and one single-byte integer

etc. Or are you hoping for something that will provide you with all the possible combinations? That would produce pages of output for any decently-sized binary blob.
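To make the ambiguity concrete, here is a small Python sketch that decodes those same five bytes each of the ways listed above. Little-endian byte order is an assumption on my part, and the dictionary keys are just labels I picked; big-endian would give yet another set of answers.

```python
import struct

buf = bytes([0x68, 0x65, 0x6C, 0x6C, 0x6F])


def interpretations(buf):
    """Return several equally valid readings of a 5-byte buffer."""
    return {
        "ascii": buf.decode("ascii"),
        "five_uint8": list(buf),
        # Two little-endian 16-bit ints with the middle byte as a delimiter.
        "two_uint16_and_delimiter": (
            struct.unpack("<H", buf[0:2])[0],
            buf[2],
            struct.unpack("<H", buf[3:5])[0],
        ),
        # One little-endian 32-bit int followed by a trailing character.
        "uint32_and_char": (struct.unpack("<I", buf[:4])[0], chr(buf[4])),
        # One little-endian 32-bit float followed by a single byte.
        "float32_and_uint8": (struct.unpack("<f", buf[:4])[0], buf[4]),
    }


for name, value in interpretations(buf).items():
    print(f"{name}: {value!r}")
```

Every one of these decodes without error, so nothing in the bytes themselves tells you which reading is the intended one.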


I'm sort of looking for something that will attempt to narrow down possibilities. The way I'd do it is by providing some visualizations based on the user selecting what data types and lengths they're looking for.

So for instance, if I know I'm looking at triangle data, I can guess that it's probably compressed, ask the app to decompress the data based on some common compression types, look at that data and guess that I'm looking at some floats or doubles.
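A rough sketch of that workflow in Python, using only standard-library codecs. The helper names, the choice of codecs, and the default little-endian float32 layout are all assumptions for illustration; a real tool would also have to hunt for compressed streams that start at some offset inside the blob.

```python
import bz2
import gzip
import lzma
import struct
import zlib

# Common compression formats to try, in no particular order.
DECOMPRESSORS = {
    "zlib": zlib.decompress,
    "gzip": gzip.decompress,
    "bz2": bz2.decompress,
    "lzma": lzma.decompress,
}


def try_decompress(blob):
    """Return {codec name: decompressed bytes} for every codec that succeeds."""
    results = {}
    for name, decompress in DECOMPRESSORS.items():
        try:
            results[name] = decompress(blob)
        except Exception:
            # Wrong format for this codec; move on to the next guess.
            continue
    return results


def as_floats(data, fmt="<f"):
    """Reinterpret data as consecutive values of fmt, ignoring leftover bytes."""
    size = struct.calcsize(fmt)
    usable = len(data) - len(data) % size
    return [value[0] for value in struct.iter_unpack(fmt, data[:usable])]
```

If the decompressed values look like garbage as floats, calling `as_floats(data, "<d")` is the "maybe they're doubles" guess.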

Maybe I'm wrong, so then I can ask the app to search for other data types at that point.

To me, that would be a tremendous help over my experience with existing hex editors.

Edit: It's not fair for me to say there aren't tools that do exactly this, but to be more precise, a decent user experience is lacking in most cases.


Your post reminded me of the presentation on cantor.dust:

  https://sites.google.com/site/xxcantorxdustxx/

  https://www.youtube.com/watch?v=4bM3Gut1hIk - Christopher Domas, "The future of RE: Dynamic Binary Visualization"
    (very interesting presentation)

Looks like Battelle has even recently open-sourced a plugin for Ghidra:

  https://github.com/Battelle/cantordust



