Dennard scaling for SRAM has certainly halted, as demonstrated by TSMC’s 3nm pro...

hajile · 2024-10-23T03:47:20 1729655240

Years ago.

DRAM uses a capacitor. Those capacitors essentially hit a hard limit at around 400MHz for our traditional materials a very long time ago. This means that if you need to sequentially read random locations from RAM, you can't do it faster than 400MHz. Our only answer here is better AI prefetchers and less-random memory patterns in our software (the penalty for not prefetching is so great that theoretically less efficient algorithms can suddenly become more efficient if they are simply more predictable).

As to capacitor sizes, we've been at the volume limit for quite a while. When the capacitor is discharged, we must amplify the charge. That gets harder as the charge gets weaker and there's a fundamental limit to how small you can go. Right now, each capacitor has somewhere in the range of a mere 40,000 electrons holding the charge. Going lower dramatically increases the complexity of trying to tell the signal from the noise and dealing with ever-increasing quantum effects.

Getting more capacitors closer means a smaller diameter, but keeping the same volume means making the cylinder longer. You quickly reach a point where even dramatic increases in height (something very complicated to do in silicon) give only minuscule decreases in diameter.
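
To put rough numbers on that geometry, here is a minimal sketch. It assumes an idealized cylinder whose volume V = pi * r^2 * h is held constant; real DRAM capacitors are not perfect cylinders, and the starting dimensions below are made up purely for illustration.

```python
import math

def diameter_for_height(volume_nm3: float, height_nm: float) -> float:
    """Diameter of an idealized cylinder with the given volume and height."""
    return 2 * math.sqrt(volume_nm3 / (math.pi * height_nm))

# Hypothetical starting point (illustrative only): 40 nm wide, 1000 nm tall.
BASE_DIAMETER_NM = 40.0
BASE_HEIGHT_NM = 1000.0
BASE_VOLUME_NM3 = math.pi * (BASE_DIAMETER_NM / 2) ** 2 * BASE_HEIGHT_NM

# Keep the volume fixed and see what each height multiple buys in diameter.
for factor in (1, 2, 4, 8):
    height = BASE_HEIGHT_NM * factor
    diameter = diameter_for_height(BASE_VOLUME_NM3, height)
    print(f"height x{factor}: diameter {diameter:.1f} nm")
```

At constant volume the diameter only shrinks with the square root of the height, so in this idealized model quadrupling the height merely halves the diameter.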

kstrauser · 2024-10-23T15:20:44 1729696844

What does “faster than 400MHz” mean in this context? Does that mean you can’t ask for a unit of memory from it more than 400M times a second? If so, what’s the basic unit there, a bit? A word?

I built a little CPU in undergrad but never got around to building RAM and admit it’s still kind of a black box to me.

Bonus question: When I had an Amiga, we’d buy 50 or 60ns RAM. Any idea what that number meant, or what today’s equivalent would be?

hajile · 2024-10-23T18:10:10 1729707010

The capacitors take time to charge and discharge. You can't do that faster than around 400MHz with current materials. You are correct that it means you can't access the same bit of memory more than 400M times per second. This is the same whether you are accessing 1 bit or 1M bits, because the individual capacitors that make up those bits can't be charged/discharged any faster.

When we moved from SDR to DDR1, latencies dropped from 20-25ns to about 15ns, but if you run the math, we've been at 13-17ns of latency ever since.

rasz · 2024-10-24T04:45:41 1729745141

Yep, we pretty much hit a wall during the DDR2 reign: https://en.wikipedia.org/wiki/CAS_latency#Memory_timing_exam...

anticensor · 2024-10-26T02:25:27 1729909527

No, 7.5 to 9.75ns during the DDR4 reign (DDR4-4266), according to that page.

rasz · 2024-10-26T06:31:05 1729924265

DDR2-1066 CL4 is also 7.5ns to first data.
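
For anyone who wants to check these figures: first-word latency is the CAS latency in clock cycles times the clock period, and because DDR transfers twice per clock, the clock in MHz is half the MT/s rating. A quick sketch follows; the CL16 used for DDR4-4266 is an assumption, picked only because it reproduces the 7.5ns figure quoted above.

```python
def first_word_latency_ns(data_rate_mts: float, cas_latency_cycles: float) -> float:
    """CAS latency in nanoseconds for DDR memory rated at data_rate_mts MT/s."""
    clock_mhz = data_rate_mts / 2        # DDR: two transfers per clock cycle
    cycle_time_ns = 1000.0 / clock_mhz   # period of one clock cycle
    return cas_latency_cycles * cycle_time_ns

# DDR2-1066 at CL4, as stated in the comment above.
print(f"DDR2-1066 CL4:  {first_word_latency_ns(1066, 4):.2f} ns")
# DDR4-4266 at CL16 (assumed timing, see note above).
print(f"DDR4-4266 CL16: {first_word_latency_ns(4266, 16):.2f} ns")
```

Both come out at roughly 7.5ns, even though the data rate has quadrupled.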

anticensor · 2024-10-26T09:56:03 1729936563

But DDR2-1066 is worse in second and later accesses.

rasz · 2024-10-27T22:22:22 1730067742

Subsequent accesses measure bus speed, not actual DRAM cell latency. At that point the data is already loaded into a row of sense amplifiers.

aidenn0 · 2024-10-23T05:14:42 1729660482

If that's the case, why haven't we switched to SRAM? Isn't it only about 4x the price at any given process node?

RF_Savage · 2024-10-23T05:22:53 1729660973

That 4x the price does also explain why it has not happened.

aidenn0 · 2024-10-23T19:12:54 1729710774

If it were even 20% faster than DRAM, there would be a market for it at the higher price. The post I replied to was asserting that there was a physical limit of 400MHz for DRAM entirely due to the capacitor. If SRAM could run with lower latency, memory-bound workloads would get comparably faster.

guerrilla · 2024-10-23T08:00:55 1729670455

Yeah, but why don't we have something like 500MB of SRAM per 8GB of RAM? Certain addresses could be faster, in the same way /dev/shm/ is faster.

turtles3 · 2024-10-23T08:19:10 1729671550

This is sort of the role that L3 cache plays already. Your proposal would effectively be an upgradable L4 cache. No idea if the economics of that are worth it vs bigger DRAM, so that you have less pressure on the NVMe disk.

guerrilla · 2024-10-23T09:24:44 1729675484

Cache is not general purpose as far as I know. I want to be able to do whatever I want with it.

myself248 · 2024-10-23T15:44:29 1729698269

Coreboot and some other low-level stuff uses cache-as-RAM during early steps of the boot process.

There was briefly a product called vCage loading a whole secure hypervisor into cache-as-RAM, with a goal of being secure against DRAM-remanence ("cold-boot") attacks where the DIMMs are fast-chilled to slow charge leakage and removed from the target system to dump their contents. Since the whole secure perimeter was on-die in the CPU, it could use memory encryption to treat the DRAM as untrusted.

So, yeah, you can do it. It's funky.

MichaelZuo · 2024-10-23T11:38:53 1729683533

Where’s the market demand for that?

guerrilla · 2024-10-23T16:53:53 1729702433

Where is the market demand for faster RAM? This isn't a good question.

MichaelZuo · 2024-10-27T12:26:12 1730031972

Gamers, 3D modelling professionals, IBM, HP, ORACLE, many IT departments, etc… ?

ghaff · 2024-10-23T12:31:57 1729686717

Yeah, you’re basically betting that people will put a lot of effort into trying to out-optimize the hardware and perhaps to some degree the OS. Not a good bet.

When SMP first came out we had one large customer that wanted to manually handle scheduling themselves. That didn’t last long.

guerrilla · 2024-10-25T11:19:15 1729855155

Effort? It's not like it's hard to map an SRAM chip to whatever address you want and expose it raw or as a block device. That's a 100 LOC kernel module.

crest · 2024-10-23T14:35:46 1729694146

AMD offers CPUs with over 768MiB of cache if you're willing and able to afford them.

rasz · 2024-10-24T05:03:59 1729746239

There used to be a DRAM with a built-in SRAM cache called EDRAM (Enhanced DRAM, not to be confused with eDRAM, Embedded DRAM).

• 2Kbit SRAM Cache Memory for 15ns Random Reads Within a Page

• Fast 4Mbit DRAM Array for 35ns Access to Any New Page

• Write Posting Register for 15ns Random Writes and Burst Writes Within a Page (Hit or Miss)

• 256-byte Wide DRAM to SRAM Bus for 7.3 Gigabytes/Sec Cache Fill

• On-chip Cache Hit/Miss Comparators Maintain Cache Coherency on Writes

Afaik only ever manufactured by a single vendor Ramtron https://bitsavers.computerhistory.org/components/ramtron/_da... and only ever used in two products:

- Mylex DAC960 RAID controller

- Octek HIPPO DCA II 486-33 PC motherboard https://theretroweb.com/motherboards/s/octek-hippo-dca-ii-48...

Salgat · 2024-10-23T03:29:58 1729654198

5nm can hold roughly a gigabyte of SRAM on a cpu-sized die, that's around $130/GB I believe. At some point 5nm will be cheap enough that we can start considering replacing DRAM with SRAM directly on the chip (aka L4 cache). I wonder how big of a latency and bandwidth bonus that'd be. You could even go for a larger node size without losing much capacity for half the price.

crest · 2024-10-23T14:43:41 1729694621

SRAM also requires more power than DRAM. The simple, regular structure of SRAM arrays compared to (other) logic makes it possible to get good yield rates through redundancy and error correction codes, so you could have giant monolithic dies, but information can't exceed the speed of light in a medium. There just isn't enough time for the signals to propagate to get the latency you expect of an L3 cache out of big dies containing gigabytes of SRAM that are (in relative terms) far away. Also, moving the data to perform computations without caching would be terribly wasteful given how much energy is needed just to move the data. Instead you would probably end up with something closer to the computing memory concept, mapping computation to ALUs close to the data, with an at least two-tier network (on-die, inter-die) to support reductions.
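
As a rough upper bound on how far a signal can get in one clock cycle, here is a small sketch. The 5 GHz clock and the 0.5 velocity factor are illustrative assumptions, not measured values, and real on-die wires are RC-limited and considerably slower than this bound.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre

def max_distance_per_cycle_mm(clock_ghz: float, velocity_factor: float = 1.0) -> float:
    """Upper bound on one-way signal travel in one clock cycle, in millimetres.

    velocity_factor is the fraction of c at which the signal propagates
    (1.0 = vacuum; anything lower is an assumed property of the medium).
    """
    cycle_time_s = 1.0 / (clock_ghz * 1e9)
    return SPEED_OF_LIGHT_M_PER_S * velocity_factor * cycle_time_s * 1000.0

# Assumed example values: 5 GHz core clock, signal at half the speed of light.
one_way_mm = max_distance_per_cycle_mm(5.0, velocity_factor=0.5)
print(f"one-way per cycle:    {one_way_mm:.1f} mm")
print(f"round trip per cycle: {one_way_mm / 2:.1f} mm")
```

A request has to travel out and the data has to travel back, so the reachable distance per cycle is half the one-way figure, before any time is spent in the array itself.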

Salgat · 2024-10-23T17:31:51 1729704711

Oh yeah this would definitely be something like L4 cache rather than L3 like AMD's X3D cpus. The expectation is as an alternative to DRAM (or as a supplement), kind of like what Xeon Phi did.

bgnn · 2024-10-23T07:37:54 1729669074

5nm will never be that cheap. The performance benefit would be easily 2x or more though.

Salgat · 2024-10-23T17:33:08 1729704788

Even 15 years from now?

wmf · 2024-10-22T23:50:48 1729641048

Now? Prices have been flat for 15 years and DRAM has been stuck on 10 nm for a while.

philipkglass · 2024-10-23T00:22:29 1729642949

That's overstating the flatness of prices. In 2009, the best price recorded here was 10 dollars per gigabyte:

https://jcmit.net/memoryprice.htm

Recently DDR4 RAM is available at well under $2/GB, some closer to $1/GB.
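
For scale, those data points imply a compound annual price change that can be computed as below. This is only a sketch: the 15-year span assumes the comparison runs from 2009 to 2024, and the $2/GB and $1/GB end points are the round figures mentioned above rather than prices taken from the linked table.

```python
def annual_change(start_price: float, end_price: float, years: float) -> float:
    """Compound annual rate of change between two prices (negative = decline)."""
    return (end_price / start_price) ** (1.0 / years) - 1.0

YEARS = 15  # assumed span: 2009 to 2024

for end_price in (2.0, 1.0):
    rate = annual_change(10.0, end_price, YEARS)
    print(f"$10/GB -> ${end_price:.0f}/GB over {YEARS} years: {rate:.1%} per year")
```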

jychang · 2024-10-23T02:13:25 1729649605

$1/GB? That's around the price SSDs took over from HDDs...

cubefox · 2024-10-23T11:24:57 1729682697

> Dennard scaling for SRAM has certainly halted, as demonstrated by TSMC’s 3nm process vs 5 nm.

I don't think the latter (SRAM capacity remaining the same per area?) has anything to do with Dennard scaling.

ksec · 2024-10-22T23:51:36 1729641096

Not soon, as DRAM is mostly on older nodes. But the overall cost reduction of DRAM is moving very, very slowly.