Color comes from the initial neural network step. Since skin color is somewhat predictable from facial features (e.g., nose width), it should be able to do reasonably well.
Really? With what accuracy? This is the kind of assumption that will get research groups into very deep water...
Just imagine the kind of CCTV usage being discussed elsewhere in this thread. Now suppose the neural network happens to have learned an incorrect bias regarding skin colour...
You're absolutely right to be concerned about this stuff, but be aware that it is generally acknowledged as a problem and that the "ethics of machine learning" is quite an interesting and active research topic.
Image synthesis can't be used for up-rezzing CCTV imagery: the output is a fabrication, and the researchers themselves have said so. Imagined bad use cases shouldn't be treated as real ones. ;) If an investigator used this to track down criminals, they would be the ones getting into deep water and making unwarranted assumptions.
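To make the "fabrication" point concrete: colorization/up-rezzing is an ill-posed inverse problem, since many distinct colors collapse to the same grayscale value, so the network has to invent the missing information. A toy sketch (my own illustration, not from the paper, using the standard Rec. 601 luma weights):

```python
# Grayscale-to-color is many-to-one: distinct colors share the same
# luma value, so any colorizer must guess the original color.
def luma(r, g, b):
    # Rec. 601 luma approximation, rounded to an integer gray level.
    return round(0.299 * r + 0.587 * g + 0.114 * b)

print(luma(100, 100, 100))  # neutral gray  -> 100
print(luma(0, 150, 105))    # teal-ish color -> 100 (same gray value!)
```

Both inputs produce the identical gray pixel, so no algorithm can recover which one the camera actually saw; it can only produce a plausible guess.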