16-bit is sufficient for listening, but it leaves no headroom for production. Modern music is often recorded in 24-bit and only rendered to 16-bit as the final step.
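As a rough illustration of that final render step, here's a minimal numpy sketch, assuming a float mix with samples in [-1.0, 1.0] and simple TPDF dither (the function name and dither choice are just for the example):

    import numpy as np

    def render_to_16bit(mix):
        """Quantise a float mix (samples in [-1.0, 1.0]) to 16-bit PCM,
        adding TPDF dither so the rounding error stays noise-like."""
        rng = np.random.default_rng(0)
        lsb = 1.0 / 32768.0                              # one 16-bit step, relative to full scale
        dither = (rng.random(mix.shape) - rng.random(mix.shape)) * lsb
        samples = np.round((mix + dither) * 32767.0)
        return np.clip(samples, -32768, 32767).astype(np.int16)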
That is correct for recording with microphones; however, that is largely to avoid having to be overly precise when setting levels in order to maximise the value of those final 16 bits.
This doesn't analogise well to digital instruments, as the minimum and maximum velocity/amplitude/whatever are already known and defined. You already know the exact potential dynamic range before the first note is ever played. The concept of headroom doesn't exist. Anything you might think to define as "headroom" is either beyond the input and/or output capabilities of the device, and therefore irrelevant, or it belongs within the normal scale.
No, because digital processing adds distortion to the signal, which means that having an input beyond perceptual accuracy still matters to a production that needs to apply lengthy effects chains. It just matters less than in the case of a quiet signal recorded at a linear bit depth.
For this sort of reason, many digital effects operate at an oversampled rate (i.e. they apply an FIR filter to represent approximately the same signal at a higher sample rate like 1.5x or 2x, do the processing there, then downsample again). Contemporary DAW software has tended towards doing processing in 64-bit floating point, while source recordings typically go up to 24-bit/192kHz, which is probably "way more than good enough" even if some room to inflate the specs further exists.
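As a sketch of that upsample/process/downsample pattern, assuming scipy's polyphase resampler and a tanh saturator standing in for the non-linear effect:

    import numpy as np
    from scipy.signal import resample_poly

    def saturate_oversampled(x, factor=2):
        """Run a non-linear effect at an oversampled rate:
        upsample -> process -> downsample, with resample_poly
        supplying the interpolation/anti-aliasing FIR filters."""
        up = resample_poly(x, factor, 1)          # same signal represented at factor * fs
        shaped = np.tanh(2.0 * up)                # saturation creates harmonics that would alias at fs
        return resample_poly(shaped, 1, factor)   # low-pass and decimate back to the original rate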
For playback, 16-bit/44.1kHz stereo is still as good as ever. You're limited more by the other parts of the signal path in that case.
This concerns a velocity control signal, not a stream of audio samples.
This control signal would be mapped through the internal processing of a synthesis or sampling engine, resulting in an audio signal whose data format (analog or 16/24/32-bit digital) comes down to the physical characteristics and design of the synth.
Once it leaves the synth, a host can convert to whatever it needs to have enough headroom for further processing.
This is orthogonal to the issues remediated by oversampling (harmonic artifacts due to aliasing in non-linear processing like saturation, not distortion issues related to bit-depth).
A 16-bit velocity signal, if linearly mapped, means 96dB of velocity sensitivity, which is pretty good. And it won't be linearly mapped, because exponential velocity mappings are already prevalent in the industry.
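For what it's worth, the arithmetic behind that 96dB figure, plus one possible non-linear curve (a power law here, standing in for whichever "exponential" mapping a given instrument actually uses):

    import numpy as np

    bits = 16
    dynamic_range_db = 20 * np.log10(2 ** bits)    # ~96.3 dB for a linear 16-bit scale

    def velocity_to_gain(v, curve=3.0):
        """Map a 16-bit velocity (0..65535) to an amplitude gain.
        curve > 1 gives an exponential-ish response; 3.0 is only illustrative."""
        return (v / 65535.0) ** curve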
I'm still having a hard time believing that the drum computer having only a 16-bit representation of the velocity of the stick hitting the pad will matter to its goal of producing a velocity-scaled (probably 24-bit) sample at its output port.
There's lots of far more interesting information that'd be more valuable to represent than just a raw 24-bit velocity: the angle and location of the hit, how much pressure (not velocity) was being applied, how the stick moved while it was in contact with the skin, the material of the stick, etc., if you truly want to capture the dynamics of drumming.
Did you interpret the word “largely” to mean “entirely” by accident?
This post appears to be a generic diatribe on the facts of sound mixing. It does not appear to be a response to anything I wrote, which was not about sound mixing at all. My post was a rejection of the analogy being drawn between MIDI data streams and sound recording.
Good luck finding any analog gear, period, that can saturate 24 bits without noise in the LSBs; it just doesn't exist. That's not the point of 24-bit audio anyway: the point is not needing to worry about clipping. That's why knocking it down to 16-bit for playback is, practically speaking, fine.