Ensure you're building with DWARF5, and enable accelerator tables like .debug_na...

Ensure you're building with DWARF5, and enable accelerator tables like .debug_names, which will allow debuggers to receive a pre-prepared index of the symbol names in the program (and thus it doesn't have to parse all the DWARF on startup).

Slow stepping is a surprise; there's no OS reason for that to be slower. Possibly if your types are really large and complicated, the debugger has to fetch a lot of data to refresh its view of state each time?