Hacker News new | past | comments | ask | show | jobs | submit login

So ... the registers are dynamically allocated from a chunk of cache? Does this mean there, effectively, are no registers? Does this cache have one clock latency?



I doubt anyone will be able to answer questions this fine grained, not now (if the implementation is architecturally exposed - leaks into the ISA - and Asahi Linux group figures some of it out), or possibly not ever (if it's architecturally transparent and thus entirely micro-architectural).

> Does this mean there, effectively, are no registers?

I can only point out just for context that if by any chance you're asking whether the registers are implemented as actual hardware design "registers" - individually routed and and individually accessible small strings of flip-flops or D-latches - then the history of the question is actually "it never was registers in the first place" - architectural (ISA) registers in GPUs are implemented by a chunk of addressable ported SRAM, with an address bus, data bus, and limited number of accesses at the same time and limited b/w [1].

[1] see the diagram at https://www.renesas.com/us/en/products/memory-logic/multi-po...


Oh! Well, that explains that then. Wild!


There is a fairly informative survey on the subject: https://www.osti.gov/servlets/purl/1332070 (A Survey of Techniques for Architecting and Managing GPU Register File)


An easier to read research article that's narrower in subject and seemingly more relevant to the OP: https://research.nvidia.com/sites/default/files/pubs/2012-12... ("Unifying Primary Cache, Scratch, and Register File Memories in a Throughput Processor", 2012)




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: