Server Hardware Super-Cycle 2022 (unum.cloud)
81 points by kristianp on Jan 18, 2022 | 29 comments



Previous discussion: https://news.ycombinator.com/item?id=29484567

My previous comment:

Not just 2022: after being stuck with a 14nm node and an old uArch from Intel, the Super Cycle will go on till 2025+ once you factor in all the upgrades from previous hardware and new capacity. PCI-E 6.0 was at 0.9 in October and should have been 1.0 by now; I guess a small delay. We are looking at PCI-E 7.0 by 2023/2024. Intel is working their ass off to get back on track to their somewhat original roadmap. Lots of competition from AMD and, more importantly, ARM. We finally have node improvements after being stuck with Intel 14nm on servers for 4+ years. TSMC 3nm in 2023 and 2nm by 2025 (expect a year later on servers). DDR5 and DDR6 as well. 1 PB of SSD in 1U with the Ruler form factor. And 800Gbps networking possible.

The biggest problem with all of these is that they are not getting cheaper per unit. DRAM price per GB hasn't dropped at all in the past decade; the price floor is pretty much the same at $2/GB, and much higher for high-capacity and ECC memory (price excluding inflation, so you could argue it is cheaper). NAND won't get much cheaper even with higher stacking. So all in all, you may be putting double the storage, DRAM or networking in a server, but it will cost double as well. The only thing that is getting slightly cheaper is CPU cost per core. Edit: And that is why this decade may be all about software optimisation. There are many inefficiencies lying around in our software stacks, while we are hitting peaks everywhere in our hardware roadmap. I would be surprised if we could even get a 1TB-memory, 128-core server at half of today's price by 2030.


With the current prices and the lack of availability, the best 2022 server hardware for me will be 2018 server hardware.

The one thing that's relatively new-ish that I'd like to play with is persistent memory. But I'm curious about it for embedded applications where power is not assured. For servers? I don't understand the use case, because datacenters are where we have the most consistent power. Perhaps it'd be more interesting if I ran workloads on large datasets...


Transactional workloads (databases, pubsubs, workflow coordinators) benefit the most from persistent memory because write latency to storage dominates performance (a transaction must be committed to permanent storage before the client can be informed of success, and if there is contention between transactions they must wait and proceed serially). Persistent memory is about 100 times faster (300ns vs. 30us latency) than NAND and is writeable in-place at byte granularity vs. multi-KB erasable blocks.
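
As a rough illustration of why the write path gets so much shorter, here is a minimal sketch assuming PMDK's libpmem and a DAX-mounted pmem filesystem (the path and sizes are made up): the commit record is stored in place and made durable with a cache-line flush instead of a block write plus fsync.

    /* Sketch: durable commit record on persistent memory via PMDK's libpmem.
     * Build: gcc -O2 commit.c -lpmem   (the mount path below is illustrative) */
    #include <libpmem.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        size_t mapped_len;
        int is_pmem;
        /* Map a 1 MiB log file living on a DAX filesystem backed by pmem. */
        char *log = pmem_map_file("/mnt/pmem0/commit.log", 1 << 20,
                                  PMEM_FILE_CREATE, 0644, &mapped_len, &is_pmem);
        if (log == NULL) { perror("pmem_map_file"); return 1; }

        const char record[] = "txn 42 COMMIT";
        memcpy(log, record, sizeof record);    /* byte-granular, in-place store  */
        if (is_pmem)
            pmem_persist(log, sizeof record);  /* flush + fence: sub-microsecond */
        else
            pmem_msync(log, sizeof record);    /* fallback on ordinary storage   */

        pmem_unmap(log, mapped_len);
        return 0;
    }

On a NAND-backed WAL the equivalent step is a write() followed by an fsync(), which is where the tens of microseconds go.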


300ns is the latency of PCIe, not DRAM.

DDR4 CAS latency is 15 clocks, which is about 9 ns at the highest clock frequencies. Add to that cache latencies and memory-controller latencies and you can easily get to about 25-50 ns of latency, or more. And to overcome that you have to use stream-friendly algorithms, just as with SSDs and spinning disks.
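
Back-of-the-envelope, for anyone who wants to check the 9 ns figure (this assumes DDR4-3200 CL15; other speed grades land in the same ballpark):

    /* Rough CAS-latency arithmetic, illustrative numbers only. */
    #include <stdio.h>

    int main(void) {
        double mt_per_s   = 3200.0;          /* DDR4-3200: 3200 MT/s         */
        double io_clk_mhz = mt_per_s / 2.0;  /* double data rate -> 1600 MHz */
        double cas_clocks = 15.0;            /* CL15                         */
        double cas_ns = cas_clocks / io_clk_mhz * 1000.0;   /* = 9.375 ns    */
        printf("CAS latency: %.3f ns\n", cas_ns);
        return 0;
    }

That is just the DRAM array's contribution; the controller and the cache hierarchy add the rest, which is where the 25-50 ns end-to-end number comes from.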

The technology of persistent memory is no different than NAND or whatever SSD tech du jour is used. This means that wear will be the same as with other SSD drives, and maybe even worse due to different ambient temperature conditions.

Persistent memory is also more expensive.

Conflicting transactions must be ordered as if they were processed serially, but the actual processing order can be different, and their commitment to disk can also be different from "commit transaction 1 and then commit transaction 2". Having these two transactions commit at once, or neither at all, is also permitted by serializable isolation, which is heavily used in any semi-decent transactional storage engine.

If you go all-in on persistent memory, you lose the cheaper multi-tier storage opportunity. You can use heavily read-optimized storage structures on the spinning disks, for example, by rearranging writes to those disks using SSDs and plain old RAM. If I remember correctly, spinning disks are currently about 5 times cheaper than SSDs, and you can use them to increase either bandwidth or storage volume.


> DDR4 CAS latency is 15 clocks, which is about 9 ns at the highest clock frequencies. Add to that cache latencies and memory-controller latencies and you can easily get to about 25-50 ns of latency, or more.

On my 3rd-gen Xeon E5 NUMA machine (two nodes), accessing the adjacent memory bank is roughly 90ns under no load. Accessing non-adjacent memory was around 120ns.
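
If anyone wants to reproduce that kind of number, a bare-bones pointer-chasing probe along these lines is enough to see the local/remote gap (a sketch using libnuma; the sizes are arbitrary, run it once per node argument and compare):

    /* Build: gcc -O2 chase.c -lnuma ; run as ./a.out 0 and ./a.out 1 */
    #include <numa.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N     (64UL * 1024 * 1024 / sizeof(size_t))  /* 64 MiB, well past LLC */
    #define ITERS (10L * 1000 * 1000)

    int main(int argc, char **argv) {
        if (numa_available() < 0) { fprintf(stderr, "no NUMA support\n"); return 1; }
        int node = (argc > 1) ? atoi(argv[1]) : 0;

        numa_run_on_node(0);                  /* always execute on node 0 */
        size_t *buf = numa_alloc_onnode(N * sizeof(size_t), node);
        if (!buf) { perror("numa_alloc_onnode"); return 1; }

        /* Sattolo shuffle: one big cycle, so the prefetcher can't help. */
        for (size_t i = 0; i < N; i++) buf[i] = i;
        for (size_t i = N - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;
            size_t tmp = buf[i]; buf[i] = buf[j]; buf[j] = tmp;
        }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        size_t idx = 0;
        for (long i = 0; i < ITERS; i++) idx = buf[idx];  /* dependent loads */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("memory on node %d: %.1f ns/load (idx=%zu)\n", node, ns / ITERS, idx);
        numa_free(buf, N * sizeof(size_t));
        return 0;
    }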

> 300ns is the latency of PCIe, not DRAM.

I'd figure that this number has improved across PCIe generations, no? Have you got any tool to recommend that can measure this kind of thing? I found https://github.com/andre-richter/pcie-lat but haven't tried it yet.


> transaction must be committed to permanent storage before the client can be informed of success

Transactions must be durable, but that doesn't mean they can't be committed to RAM initially. There are large-scale database systems you have probably heard of that commit to RAM and get their durability from committing to multiple replicas in distinct failure domains.
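
A toy sketch of that idea (not any particular database's protocol; the replica "RPC" here is just a thread with a sleep): the client-visible commit waits for a majority of acks rather than a local fsync, and durability comes from the copies living in separate failure domains.

    /* Build: gcc -O2 quorum.c -lpthread */
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    #define REPLICAS 3
    #define QUORUM   (REPLICAS / 2 + 1)

    static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
    static int acks = 0;

    static void *replica(void *arg) {
        (void)arg;
        usleep(200);                   /* stand-in for network + remote append */
        pthread_mutex_lock(&mu);
        acks++;
        pthread_cond_signal(&cv);
        pthread_mutex_unlock(&mu);
        return NULL;
    }

    int main(void) {
        pthread_t t[REPLICAS];
        for (int i = 0; i < REPLICAS; i++)
            pthread_create(&t[i], NULL, replica, NULL);

        /* The "commit": block only until a majority has acknowledged. */
        pthread_mutex_lock(&mu);
        while (acks < QUORUM)
            pthread_cond_wait(&cv, &mu);
        pthread_mutex_unlock(&mu);
        printf("committed with %d/%d acks; stragglers catch up later\n",
               QUORUM, REPLICAS);

        for (int i = 0; i < REPLICAS; i++)
            pthread_join(t[i], NULL);
        return 0;
    }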


Oracle Database supports persisting the database state to persistent memory during an OS reboot, so basically you can pick up right where you left off when the machine comes back up.


For anyone interested in how databases can benefit from PMem, I suggest watching this fantastic presentation from Oracle's Jia Shi: "Under the Hood of an Exadata Transaction – How Did We Harness the Power of Persistent Memory".

https://www.youtube.com/watch?v=ertF5ZwCHP0


I don't think Intel really knew what they were going for with Optane / persistent memory. They had an invention, it had a bunch of interesting stats, and they were hoping someone out there would figure out a use.

Databases seem like the obvious answer. Persistent memory / Optane is much faster than NAND flash, but less dense than NAND flash. It's also slower than DRAM, but more dense, so it sits in between.

Whether or not that's useful remains to be seen.


You might use such a thing in database systems for, e.g.:

1. The write-ahead log, for faster I/O than a direct flush to SSD.
2. Staging disk cache entries to persistent memory before a planned host/process restart, for faster subsequent DRAM cache warming.

The problem with 1 is that you still need to periodically flush to disk (at least once the persistent storage is full), which means it's only an improvement for bursts of activity and not for sustained throughput.

The problem with both is complexity of implementation to deliver functionality that can only be used on a (currently) relatively uncommon subset of servers. Probably not something that Postgres would rush to implement, but potentially something that MSSQL or Oracle might consider as a premium feature.

Disclaimer: I haven’t looked into the details of optane and how you interface with that storage in software.
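
For what it's worth, the interface is less exotic than it sounds: with a DAX-mounted filesystem on the pmem device you mmap a file and use ordinary loads and stores, flushing cache lines for durability. A minimal sketch (the mount path is hypothetical; needs a CPU with clwb and a kernel/filesystem that supports MAP_SYNC):

    /* Build: gcc -O2 -mclwb pmem_map.c */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <immintrin.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #ifndef MAP_SYNC                 /* newer headers define these */
    #define MAP_SYNC 0x80000
    #endif
    #ifndef MAP_SHARED_VALIDATE
    #define MAP_SHARED_VALIDATE 0x03
    #endif

    int main(void) {
        const size_t len = 4096;
        int fd = open("/mnt/pmem0/wal", O_CREAT | O_RDWR, 0644); /* hypothetical path */
        if (fd < 0 || ftruncate(fd, len) != 0) { perror("open/ftruncate"); return 1; }

        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
        if (p == MAP_FAILED) { perror("mmap (is the fs mounted with -o dax?)"); return 1; }

        strcpy(p, "log record");     /* a plain store into persistent memory    */
        _mm_clwb(p);                 /* write the cache line back...            */
        _mm_sfence();                /* ...and fence: the record is now durable */

        munmap(p, len);
        close(fd);
        return 0;
    }

PMDK wraps this (plus the fallbacks for non-pmem storage) behind a friendlier API.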


I have some affection for the single-level store architecture of the AS/400 (at least on paper; I've never coded for the platform). I wonder if persistent memory could be an enabler for single-level store operating systems and applications on the PC platform.


I don't think hardware is the limiting factor -- OS/400 runs on a pretty standard RISC platform, same POWER chips that run AIX and Linux. Has some special tagged memory extensions, but nothing too radical, and not a hard requirement for a system like it.

The hard part is developing software to use it. I don't think you can really port existing OSes and software to use a single-level store environment; it's a different language than the one we speak. This seems to have been a general problem for persistent memory: to really use it 100% you need new software designed for it.


Persistent memory has some other advantages that make it good for datacenters. Mainly it is faster than SSD and cheaper than DRAM.


> hot-pluggable PCI-E devices

How is this possible? Hot-plug connectors have uneven-length edge contacts so ground pins make contact before signal pins, or, at the very least, they initialize links in a high-Z state or have protection diodes. Did PCIe add any of these features?


"PCI Hot Plug User’s Guide 2.3" by Fujitsu for their SPARC-based servers is copyrighted 2003:

* https://www.fujitsu.com/downloads/SFTWR/manual/s_e/b23pav1h0...

Being able to hot-plug cards in/out of slots has been a thing for higher-end gear for a long time.

I'm sure the IBM Mainframe folks did it many decades ago, as they seem to have done for many things that the 'commodity' server folks are only now catching up to.


> I'm sure the IBM Mainframe folks did it many decades ago

They could do that with CPU modules a couple generations back. I don't think the newer ones can do it though, at least not a CPU at a time. It's probably possible to power off a CPU drawer (4 CPUs on the Z15, IIRC) without too much hassle.

I remember seeing a colleague of mine do it with memory and CPUs on a live Ultra Enterprise 4000. IIRC, he had to tell the OS the board was going to be pulled out before doing so, and a light would signal it was safe to proceed. He was doing that because we used one machine to process a lot of logs overnight (in 1997).


CPU/memory live replacement in IBM mainframes is done on a per-book basis, IIRC. It used to be that a single book had one or two CPU complexes (often each CPU was actually two running in lockstep), but these days the density has gone way up.


Hot-pluggability is baked right into PCI/PCI-E. I have definitely hot-plugged PCI cards on consumer motherboards that were absolutely not hot-plug certified. If you are not slamming the card in, it works just fine. Hot unplug, on the other hand... Still, if you are on a Windows machine, just disable the device in Device Manager first and it will probably work just fine.


PCI-E has been hot-pluggable since day 1; usually you have some extra bits and pieces to make it reliable, though. Depending on the server it might involve switches, special card holders in the case, or card cages which you put the card into first. And usually some status LEDs and an "attention, I want to remove this device" button.

A card cage design was used on some IBM machines, where you attach the card to a slim sled with a lever attached. This allowed you to slide the card into the running computer and then use the lever to push it into the edge connector (this also handled powering the connector up/down, IIRC).

Some servers have cases which instead implement that completely in the case itself, though usually that means more effort to do the replacement.

Then there are alternative PCI-E connectors: ExpressCard, U.2 & SATA Express (used in SSDs), OcuLink and related (cabled PCI-E).


It's been a feature for quite some time now.

https://www.youtube.com/watch?v=HkPSJc4Bi5o


Not everybody uses the PCI Express physical slot that you may be familiar with from standard PC motherboards. For example, see the OCP 3.0 card standard. They are designed logically and electrically for hot plug.

That said, hot plugging is definitely a "pets" thing and as for large scale data center computing, nobody cares. The machine fails as a unit and it's OK to take a machine offline to fix it.


Even if it worked electrically, I'd be afraid to do that on commodity home hardware while powered. The motherboard might be held in space on standoffs, and you're putting a ton of force on it to jam that edge-card connector in. If the flex makes things in sockets wiggle around, or exercises any poor solder joints, you might introduce some transient errors.

Or I'm just paranoid. A ZIF PCI-E socket would be swell for this.


How's that different from putting the card in when the machine is off?


By causing random connections on your motherboard to flex, and thus making/breaking any flaky ones, you might introduce errors into your state.

Real server hot-swap hardware doesn't put any force on the motherboard; the card goes into a caged connector.


I would find it interesting to see some old server hardware phased out and maybe purchase it for my home office/lab. But I don't really know what to look out for, except for the gen4 hardware from 2017.


Unless you have a dedicated room for your server, I'd advise you to go for workstation machines. Pretty much the same horsepower with more affordable hardware and without big noise issues. Also, in comparison to v4, Xeon E5 v3 chips are ridiculously low-priced right now (e.g. on eBay) and are one heck of a CPU too. In general there are no substantial differences between the microarchitectures of the two (Haswell vs. Broadwell).


Agreed.

2U and especially 1U servers are much, much louder than normal computers. Shrinking parts means that the fans need to spin much faster to get sufficient airflow.


Many consumer Ryzen motherboards (and all CPUs) support unbuffered ECC DRAM, which is a great thing to have in a server. I use a Ryzen 2600 in a consumer tower. It's quiet, low power, and like 1/10 the cost of real server hardware.


they started this thought:

"Aside from DRAM expansion and smarter NICs, CXL also hopes to revolutionize how we view and share the memory pools already installed in your chips. Today, every device has its own address space. Transparently unified addressing is extremely slow for randomized memory patterns and causes unpredictable latencies. "

and never really finished it. Does CXL include address mapping?



