If you want that many units, I'd precompute the paths with Floyd-Warshall instead of doing any pathfinding on the GPU. You'd probably still need the GPU just to handle billions of units in the first place, though, and you'd need several gigabytes of RAM to hold them, unless the entire game is happening in summary statistics.
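
For what it's worth, here's a rough sketch of the precompute idea in Python. It assumes the map is a small graph of waypoints with non-negative edge costs; the function names and the `(u, v, weight)` edge format are made up for illustration. The tables are built once over map nodes (O(n^3) time, O(n^2) memory), so they only pay off if the node count is modest, but after that every unit just looks up its next hop no matter how many units there are.

```python
import math


def floyd_warshall(n, edges):
    """All-pairs shortest paths over nodes 0..n-1.

    edges: iterable of directed (u, v, weight) tuples, weights assumed >= 0.
    Returns (dist, nxt), where nxt[i][j] is the first hop on a shortest
    path from i to j, or None if j is unreachable from i.
    """
    dist = [[math.inf] * n for _ in range(n)]
    nxt = [[None] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
        nxt[i][i] = i
    for u, v, w in edges:
        if w < dist[u][v]:  # keep the cheapest of any parallel edges
            dist[u][v] = w
            nxt[u][v] = v
    for k in range(n):
        for i in range(n):
            d_ik = dist[i][k]
            if d_ik == math.inf:
                continue  # i cannot reach k, so k cannot help i
            for j in range(n):
                cand = d_ik + dist[k][j]
                if cand < dist[i][j]:
                    dist[i][j] = cand
                    nxt[i][j] = nxt[i][k]
    return dist, nxt


def path(nxt, src, dst):
    """Rebuild the node sequence from src to dst using the next-hop table."""
    if nxt[src][dst] is None:
        return []
    out = [src]
    while src != dst:
        src = nxt[src][dst]
        out.append(src)
    return out


# Tiny hypothetical map: four waypoints, directed edges.
dist, nxt = floyd_warshall(4, [(0, 1, 1), (1, 2, 2), (0, 2, 5), (2, 3, 1)])
print(dist[0][3], path(nxt, 0, 3))
```

At runtime a unit standing on node `i` and heading for node `j` only needs `nxt[i][j]`, which is a single table read per step.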