
Here's an overly long reply, sorry :)

For custom pretty-printers, the long-term plan is to make the watch expression language rich enough that you can just write one-liners in the watches window to pretty-print your struct. E.g. `for entry in my_hashmap.entries_ptr.[my_hashmap.num_entries] { if entry.has_value { yield struct {key: &entry.key, value: &entry.value}; } }`. Then allow loading a collection of such printers from a file; I guess each pretty-printer would have a regex of type names for which to use it (e.g. `std:.*:unordered_(multi)?(set|map)`). There are not very many containers in standard libraries (like, 10-20?), and hopefully most of their pretty-printers can be trivial one-liners, so they would be easy enough to add and maintain that incompatibility with other debuggers wouldn't be a big concern. Currently nnd doesn't have anything like that (e.g. there are no loops in the watch expression language), I don't have a good design for the language yet, not sure if I'll ever get around to it.
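To make the "collection of printers keyed by a type-name regex" idea concrete, here is a minimal Python sketch. Nothing in it is nnd's actual API: `register`, `find_printer`, `show`, and the dict-based struct layout are all made up for illustration, and the hashmap layout is the toy `my_hashmap` from the one-liner above, not how any real `std::unordered_map` is laid out.

```python
import re

# Hypothetical registry: each entry pairs a type-name regex with a function
# that turns a raw struct (modelled here as a plain dict) into a simpler value.
PRINTERS = []

def register(type_regex, transform):
    PRINTERS.append((re.compile(type_regex), transform))

def find_printer(type_name):
    """Return the first transform whose regex matches the start of type_name."""
    for pattern, transform in PRINTERS:
        if pattern.match(type_name):
            return transform
    return None

def hashmap_entries(value):
    # Mirrors the one-liner from the comment: keep only the occupied slots.
    entries = value["entries"][: value["num_entries"]]
    return [(e["key"], e["value"]) for e in entries if e["has_value"]]

# The regex is the one from the comment; the layout it is applied to is a toy.
register(r"std:.*:unordered_(multi)?(set|map)", hashmap_entries)

def show(type_name, value):
    """Apply the matching printer, or fall back to the raw value."""
    transform = find_printer(type_name)
    return transform(value) if transform else value
```

The fallback in `show` is the point: types without a matching printer just go through the normal non-customizable path.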

(Btw, "pretty-printers" is not a good name for what I'm talking about; rather, it transforms a value into another value, e.g. an std::vector into a slice, or an unordered_map into an array of pairs, which is then printed using a normal non-customizable printer. The transformed value ~fully replaces the original value, so you can e.g. do array indexing on std::vector as if it was a slice: `v[42]`. This seems like a better way to do it than a literal pretty-printer that outputs a string.)

What kind of cooperation from library authors would help with container recognition... The current recognizers just look for fields begin/end (pointers) or data/len (pointer and number), etc. (see src/pretty.rs, though it's not very good code). So just use those names for fields and it should work :) I'm not sure any more formal/bureaucratic contract is needed. But it would be easy for the recognizer to also check e.g. typedefs inside the struct (I guess most languages have something that translates to typedefs in debug info? at least C++ and Rust do). E.g. maybe a convention would say that if `typedef int THIS_IS_A_VECTOR` is present inside the struct then the struct should be shown as a vector even if it has additional unrecognized fields apart from begin/end/[capacity]; or `typedef int THIS_IS_NOT_A_CONTAINER` would make the debugger show the struct plainly even if it has begin+end and nothing else. That's just off the top of my head, I haven't thought in the direction of adding markup to the code.
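A rough Python sketch of how such a recognizer could combine field names with the marker typedefs. This is not what src/pretty.rs does; the `recognize` function is hypothetical, and the rule that extra fields block recognition when there is no marker is my assumption about what "strict" would mean here.

```python
def recognize(field_names, typedefs=()):
    """Classify a struct by its field names and optional marker typedefs.

    Returns "vector", "slice", or None (show the struct plainly).
    """
    fields = set(field_names)
    markers = set(typedefs)

    # Explicit opt-out wins over everything else.
    if "THIS_IS_NOT_A_CONTAINER" in markers:
        return None

    has_range = {"begin", "end"} <= fields
    has_slice = {"data", "len"} <= fields

    # Explicit opt-in: extra unrecognized fields are tolerated.
    if "THIS_IS_A_VECTOR" in markers and has_range:
        return "vector"

    # No marker (assumed rule): only the expected fields may be present.
    if has_range and fields <= {"begin", "end", "capacity"}:
        return "vector"
    if has_slice and fields == {"data", "len"}:
        return "slice"
    return None
```

For example, `recognize(["begin", "end", "allocator"])` is rejected, but the same fields plus the `THIS_IS_A_VECTOR` marker are accepted.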

A maintained collection of recognizers (in some new declarative language?) for containers in various versions of various libraries sure sounds nice at least in theory (then maybe I wouldn't've needed to do all the terrible things that I did in `src/pretty.rs`). But I don't want to maintain such a thing myself, and don't have useful thoughts on how to go about doing it. Except maybe this: nnd got a lot of mileage from very loose duck-typed matching; it doesn't just look for fields "begin" and "end", it also (1) strips field names to remove common suffixes and prefixes: "_M_begin_", "__begin_", "c_begin" are all matched as "begin", (2) unwraps struct if it has just one field: `foo._M_t._M_head_impl._M_whatever_other_nonsense._M_actual_data` becomes just `foo._M_actual_data`; this transformation alone is enough to remove the need for any custom pretty-printer for std::unique_ptr - it just unwraps into a plain pointer automatically. Tricks like this cut down the number of different recognizers required by a large factor, but maybe would occasionally produce false positives ("pretty-print" something that's not a container).
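The two tricks above are small enough to sketch in a few lines of Python. The exact stripping rules below (underscores plus an `M_` or `c_` prefix) are a guess chosen to cover the three examples in the comment, not nnd's real rules, and structs are modelled as plain dicts.

```python
import re

def normalize(name):
    """Strip common decoration so that differently-styled names compare equal."""
    name = name.strip("_")                # "_M_begin_" -> "M_begin"
    name = re.sub(r"^(M|c)_", "", name)   # "M_begin"   -> "begin"
    return name.strip("_")

def unwrap(value):
    """Peel off wrapper structs that have exactly one field."""
    while isinstance(value, dict) and len(value) == 1:
        value = next(iter(value.values()))
    return value

def looks_like_vector(struct):
    """Duck-typed match: any struct with normalized fields begin and end."""
    names = {normalize(n) for n in struct}
    return {"begin", "end"} <= names
```

With this, a unique_ptr-like nest such as `{"_M_t": {"_M_head_impl": {"_M_actual_data": 4096}}}` unwraps to the bare pointer value, and `{"__begin_": ..., "__end_": ..., "__cap_": ...}` matches as a vector. The looseness is also where the false positives would come from: any struct with fields that normalize to begin/end gets matched.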

(Dump of thoughts about the expression language, probably not very readable: The maximally ambitious version of the language would have something like: (1) compile to bytecode or machine code for fast conditional breakpoints, (2) be able to inject the expression bytecode+interpreter (or machine code) into the debuggee for super fast conditional breakpoints, and maybe for debuggee function calls along the way, (3) have two address spaces: debuggee memory and script memory, with pointers tagged with address space id either at runtime or at compile time, ideally both (at compile time for good typechecking and error messages, at runtime for being able to do something like `let elem = if container.empty {&dummy_element} else {container.start}`; or maybe the latter is not important in practice, and the address space id should just be part of the pointer type? idk; I guess the correct way to do it is to write lots of pretty-printers for real containers in an imaginary language and see what comes up), (4) some kind of template functions for pretty-printing, (5) templates not only by type, but also maybe by address space, by whether the value's address is known (e.g. a debuggee variable may live on the stack at one point in the program and in a register at another), by variable locations if they're compiled into the bytecode (e.g. same as in the previous pair of parentheses), (6) use the same type system for the scripting language and the debugged program's types, but without RAII etc (e.g. the script would be able to create an std::vector and assign its fields, but it would be a "dead" version of the struct, with no constructor and destructor), (7) but there's at least one simplification: the script is always short-lived, so script memory allocations can just use an arena and never deallocate, so the language doesn't need RAII, GC, or even defer, just malloc.
The design space of languages with multiple address spaces and tagged pointers doesn't seem very explored, at least by me (should look for prior art), so it'll take a bunch of thinking and rewriting. Probably the maximally ambitious version is too complex, and it's better to choose some simpler set of requirements, but it's not clear which one. If you somehow understood any of that and have thoughts, lmk :) )
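To illustrate just the runtime-tagging half of point (3), plus the never-free arena from point (7), here is a toy Python model. All names (`Ptr`, `Memory`, `first_elem`) are invented for this sketch, and it ignores the compile-time typing question entirely.

```python
from dataclasses import dataclass

DEBUGGEE, SCRIPT = "debuggee", "script"

@dataclass(frozen=True)
class Ptr:
    space: str  # runtime tag: which address space the address belongs to
    addr: int

class Memory:
    def __init__(self, debuggee_bytes):
        self.spaces = {DEBUGGEE: bytearray(debuggee_bytes), SCRIPT: bytearray()}

    def alloc(self, data):
        """Arena allocation in script memory: append, never free."""
        arena = self.spaces[SCRIPT]
        ptr = Ptr(SCRIPT, len(arena))
        arena += data  # in-place extend of the script arena
        return ptr

    def read_u32(self, ptr):
        """Dereference a pointer in whichever space its tag names."""
        raw = self.spaces[ptr.space][ptr.addr : ptr.addr + 4]
        if len(raw) != 4:
            raise ValueError(f"out-of-bounds read at {ptr}")
        return int.from_bytes(raw, "little")

# Debuggee memory holds two u32 values: 7 at address 0 and 8 at address 4.
mem = Memory((7).to_bytes(4, "little") + (8).to_bytes(4, "little"))
dummy = mem.alloc((0).to_bytes(4, "little"))

def first_elem(empty, start):
    # Same shape as `if container.empty {&dummy_element} else {container.start}`:
    # the two branches return pointers into different address spaces.
    return dummy if empty else start
```

Because the tag travels with the pointer, `mem.read_u32(first_elem(...))` works for either branch without the caller knowing which space it got.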



I don't know if you have looked at LLDB but when it evaluates (non-trivial) expressions it does actually compile and link code into the inferior's address space. One of the major selling points when it came out was that you could write "real code compiled by a real compiler (LLVM)" rather than whatever ad-hoc thing that GDB knows how to do. In theory this gave better support out of the box for things that can't be represented with pointer dereferences or whatever most debuggers support for their data visualization. The downside is that LLDB is extremely slow, and it still fails a lot when dealing with templated types because it will claim (whether honestly or not) that the specialization it wants is not present. And it doesn't look at your source code to generate a new one, which would be an excellent showcase of the LLVM stack, but I guess a bridge too far for a debugger :/

For your thing: I think you can get pretty far with what you're doing, but I do want to point out that just the standard types will probably work for Rust but in C++ every nontrivial project has its own standard library. Most also hide their data behind a void *impl or whatever so no debugger knows how to deal with it out of the box. I don't expect you to parse the codebase for operator[] or whatever but I think you'd ideally want a simple DSL for building pretty printers, with maybe memory reads and conditionals, plus some access to debug info (e.g. casts and offsetof). I don't think that would be too awful for complexity or performance.
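To show how little such a DSL would need, here is a Python stand-in that uses only memory reads, one conditional, a "cast", and offsetof to see through a `void *impl`. The `Widget`/`WidgetImpl` types, their layouts, and the `LAYOUTS` table standing in for debug info are all hypothetical.

```python
import struct

# Hypothetical debug info: type -> field -> (offset, struct format).
LAYOUTS = {
    "Widget": {"impl": (0, "<Q")},                        # void *impl
    "WidgetImpl": {"data": (0, "<Q"), "len": (8, "<Q")},  # u32 *data; u64 len
}

def offsetof(type_name, field):
    return LAYOUTS[type_name][field][0]

def read_field(memory, addr, type_name, field):
    """Read one field, treating addr as a pointer to type_name (the "cast")."""
    fmt = LAYOUTS[type_name][field][1]
    return struct.unpack_from(fmt, memory, addr + offsetof(type_name, field))[0]

def print_widget(memory, addr):
    """Turn a pimpl'd Widget into a plain list of its u32 elements."""
    impl = read_field(memory, addr, "Widget", "impl")
    if impl == 0:  # conditional: null impl means an empty widget
        return []
    data = read_field(memory, impl, "WidgetImpl", "data")
    length = read_field(memory, impl, "WidgetImpl", "len")
    return [struct.unpack_from("<I", memory, data + 4 * i)[0] for i in range(length)]
```

The printer itself is four lines of reads and one `if`, which is roughly the level of complexity the comment is arguing for.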



