So ... the registers are dynamically allocated from a chunk of cache? Does this mean there, effectively, are no registers? Does this cache have one clock latency?
I doubt anyone will be able to answer questions this fine grained, not now (if the implementation is architecturally exposed - leaks into the ISA - and Asahi Linux group figures some of it out), or possibly not ever (if it's architecturally transparent and thus entirely micro-architectural).
> Does this mean there, effectively, are no registers?
I can only point out just for context that if by any chance you're asking whether the registers are implemented as actual hardware design "registers" - individually routed and and individually accessible small strings of flip-flops or D-latches - then the history of the question is actually "it never was registers in the first place" - architectural (ISA) registers in GPUs are implemented by a chunk of addressable ported SRAM, with an address bus, data bus, and limited number of accesses at the same time and limited b/w [1].
There is a fairly informative survey on the subject: https://www.osti.gov/servlets/purl/1332070 (A Survey of Techniques for Architecting and
Managing GPU Register File)
An easier to read research article that's narrower in subject and seemingly more relevant to the OP: https://research.nvidia.com/sites/default/files/pubs/2012-12... ("Unifying Primary Cache, Scratch, and Register File Memories in a Throughput Processor", 2012)