Locating on an echo at one or more receivers rather than on a perhaps quieter, more direct path. Grouping the wrong impulses together when computing a location. Computing the wrong start time for one or more impulses.
If you want to read up on all the ways this kind of thing can go wrong, the technical term here is "source localization."
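To make the echo failure mode concrete, here is a rough sketch of time-difference-of-arrival localization using a brute-force grid search. Everything in it is an assumption for illustration: the four-sensor square layout, the 343 m/s speed of sound, the 0.1 s echo delay, and the function names `arrival_times` and `locate`. It is not how ShotSpotter actually computes a location.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed constant

def arrival_times(source, sensors, t0=0.0):
    """Ideal direct-path arrival time at each sensor for an impulse at t0."""
    return [t0 + math.dist(source, s) / SPEED_OF_SOUND for s in sensors]

def locate(sensors, times, extent=1000.0, step=5.0):
    """Grid search for the point whose predicted time differences
    (relative to sensor 0) best match the measured ones."""
    best, best_err = None, float("inf")
    n = int(extent / step)
    for i in range(n + 1):
        for j in range(n + 1):
            p = (i * step, j * step)
            d = [math.dist(p, s) / SPEED_OF_SOUND for s in sensors]
            err = sum(
                ((d[k] - d[0]) - (times[k] - times[0])) ** 2
                for k in range(1, len(sensors))
            )
            if err < best_err:
                best, best_err = p, err
    return best

# Hypothetical layout: four sensors at the corners of a 1 km square.
sensors = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0), (1000.0, 1000.0)]
source = (400.0, 300.0)

clean = arrival_times(source, sensors)
clean_est = locate(sensors, clean)

# Sensor 1 triggers on an echo that arrives 0.1 s after the direct path.
echoed = list(clean)
echoed[1] += 0.1
echo_est = locate(sensors, echoed)

print("clean estimate:", clean_est)
print("echo estimate: ", echo_est, "error (m):", math.dist(echo_est, source))
```

With clean timings the search lands on the true grid point. With one sensor locking onto the echo, the best-fit point moves away from the source even though the other three timings are perfect.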
The military has no problem detecting gunfire direction and estimating range from a single microphone array on a mast. ShotSpotter is a distributed network of multiple sensors across a wide area. If they can't make it work after 15+ years, then they are hopeless.
Boomerang is colocated with the target. Multipath is less likely to be an issue because there is likely to be line of sight to the source. Selecting the right impulses is easier because the spacing of the microphones provides some tight timing constraints. Inconsistencies in timing the impulses are unlikely to lead to location error, because any effects of the environment are likely to be consistent across the pulses. Finally, the location accuracy requirements are far different. ShotSpotter needs to put a dot on a map accurate enough to find shell casings or a victim in a wide geographic area. Boomerang just needs to give a good bearing from the receivers to return fire. The range doesn’t have to be all that precise, and the mathematics suggests it isn’t.
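The bearing-versus-range point can be checked numerically. This is a minimal sketch, assuming a hypothetical four-microphone array with a 0.5 m baseline and a 343 m/s speed of sound. The geometry and the names `MICS`, `tdoas`, and `max_diff` are made up for illustration and are not Boomerang's actual design. It compares how much the inter-microphone time differences change when the source doubles its range versus when it shifts bearing by 2 degrees.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed constant

# Hypothetical compact array: four mics on a 0.25 m radius circle.
MICS = [(0.25, 0.0), (0.0, 0.25), (-0.25, 0.0), (0.0, -0.25)]

def tdoas(bearing_deg, range_m):
    """Arrival-time differences (relative to mic 0) for a source at the
    given bearing and range from the array center."""
    b = math.radians(bearing_deg)
    src = (range_m * math.cos(b), range_m * math.sin(b))
    t = [math.dist(src, m) / SPEED_OF_SOUND for m in MICS]
    return [tk - t[0] for tk in t[1:]]

def max_diff(a, b):
    """Largest absolute change in any time difference between two cases."""
    return max(abs(x - y) for x, y in zip(a, b))

ref = tdoas(30.0, 200.0)
range_effect = max_diff(ref, tdoas(30.0, 400.0))    # double the range
bearing_effect = max_diff(ref, tdoas(32.0, 200.0))  # shift bearing by 2 degrees

print(f"range 200 m -> 400 m changes timing by {range_effect * 1e6:.3f} us")
print(f"bearing 30 -> 32 deg changes timing by {bearing_effect * 1e6:.3f} us")
```

In this setup a small bearing change moves the timings far more than doubling the range does, which is why a compact array using arrival times alone can give a good direction while range stays loosely constrained.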
These are two very different approaches to solve different problems.