An easier-to-read research article that's narrower in scope and seemingly more relevant to the OP: https://research.nvidia.com/sites/default/files/pubs/2012-12... ("Unifying Primary Cache, Scratch, and Register File Memories in a Throughput Processor", 2012)