I've repeatedly seen both high quality and low quality tooling act as massive force multipliers even for small teams; the question is just which direction you want to multiply. They say an army marches on its stomach—well, a tech effort marches on its tools. I interned at a trading company with, at the time, maybe 100 developers that built almost everything in-house and yet was quite a bit more successful and productive than much larger teams I've worked on since. The larger teams had incentives and structure to "focus" on what's "important" for very narrow notions of important... and so they ended up with lower-quality internal systems, a bunch of day-to-day drag on development and, both technically and organizationally, far less adaptability. (Turns out, one of the major advantages of investing in your own tools is that it builds up deep institutional knowledge about tooling that is hard-to-impossible to maintain otherwise.)
Only valuing easily-measurable work is, frankly, a modern organizational disease. It's like searching for the keys under the streetlight—and why? As a substitute for human judgement or resolving disagreements? As a way to make work more legible for executives? It leads to unforced errors and systemic problems, but we can't seem to do anything about it.
Recently I showed a development team how to use a debugging tool that automatically collects debug snapshots from production and lets you open them with one click in the IDE. It'll jump to the offending line of code and show the state of everything, not just the stack but the heap too.
My demo is to pick a random crash and diagnose the root cause while talking. The mean time to resolution is in the low single-digit minutes.
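The tool isn't named here, and the original stack was presumably not Java, but the underlying idea can be sketched: capture full process state at the moment of an unhandled exception instead of just a log line. The class and file names below are invented for illustration, and a real snapshot debugger does much more (per-frame locals, indexing, one-click IDE navigation); this is only a minimal JVM analogue.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class CrashSnapshots {
    // Install once at startup: any uncaught exception produces a heap dump
    // alongside the stack trace, so the failure can be inspected later with
    // object state intact rather than reconstructed from logs.
    public static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            error.printStackTrace();                              // the usual log line
            dumpHeap("crash-" + System.currentTimeMillis() + ".hprof");
        });
    }

    private static void dumpHeap(String path) {
        try {
            HotSpotDiagnosticMXBean mx = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    "com.sun.management:type=HotSpotDiagnostic",
                    HotSpotDiagnosticMXBean.class);
            mx.dumpHeap(path, true);                              // true = live objects only
        } catch (Exception e) {
            e.printStackTrace();                                  // never mask the original crash
        }
    }
}
```

The resulting .hprof file can then be opened in an IDE or a heap-analysis tool to inspect the exact state that produced the failure.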
I showed this to an entire team, one person at a time, solving bugs as I went. I showed the junior devs, senior devs, and their manager.
No interest. None. Just… silence.
The tools are amazing, but the typical developer's lack of motivation to learn them is even more amazing.
Playing devil's advocate here: it could just be that they already have the stack trace in logs somewhere, so how does this get you to a fix any faster? The bulk of the time to resolution is figuring out what to do anyway.
These were all bugs that were nearly impossible to solve with just a stack trace. For example, there was a "format" exception on a web page with hundreds of uses of string formatting (to generate a report). Another example was a function call complaining about a null argument. Which instance? The page had 40+ calls to the same function, each with 6 arguments.
Most of these couldn't be reproduced either. As in, you'd get a crash once a day in a page that would otherwise work successfully thousands of times.
How would you fix a problem where all you have is a stack trace from a release build? The scenario is: you can't reproduce the errors, you don't get line numbers, and you don't even get function argument values.
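To make the ambiguity concrete, here is a hypothetical Java sketch of the null-argument case; the names are invented, the original was presumably a different stack, and the "release build" constraint corresponds to compiling with debug info stripped (e.g. javac -g:none), so the trace shows method names only. The exception points into the callee, and nothing tells you which of the dozens of identical call sites, or which of the six arguments, was at fault.

```java
import java.util.Objects;

public class ReportPage {
    // Stand-in for the real helper: called 40+ times on the page, six args each.
    static void addCell(String a, String b, String c, String d, String e, String f) {
        Objects.requireNonNull(a);
        Objects.requireNonNull(b);  // the trace only says the failure is inside addCell;
        Objects.requireNonNull(c);  // with no line numbers and no argument values it
        Objects.requireNonNull(d);  // can't say which argument, or which call site below,
        Objects.requireNonNull(e);  // passed the null
        Objects.requireNonNull(f);
        // ... build the cell ...
    }

    static void render(String[] row) {
        addCell(row[0], row[1], row[2], row[3], row[4], row[5]);
        addCell(row[5], row[4], row[3], row[2], row[1], row[0]);
        // ... dozens more calls just like these ...
    }

    public static void main(String[] args) {
        render(new String[] { "a", "b", "c", "d", "e", "f" });   // works thousands of times
        render(new String[] { "a", null, "c", "d", "e", "f" });  // the rare bad input
    }
}
```

A state snapshot removes the guesswork: the argument values for the failing call are right there.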
I could solve these in minutes using this tool. Could you match that without such tooling?
I'm guessing the organizational incentives are against investigating/fixing random crashes. It might be unscheduled work, or seen as "test" or "qa" or "ops" work. Try working with management to set up a way to reward the behavior you want to encourage.