The hotswap situation is unfortunate, and likely a result of server/consumer space differentiation among all kinds of vendors (OS, CPU, motherboard, and even the NVMe manufacturers all have to play along). But the limit of 4 is at least a real technical limitation: each of those add-in cards uses a PCIe x16 slot (either a real one or "DIMM.2"), and each drive on the card needs an x4 link from it. You could use a mux to add more, but the drives are already getting to the point of being able to saturate the links. PCIe 4.0 and 5.0 will give a lot of headroom for more drives on a system.
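Rough numbers for the "headroom" point, as a back-of-the-envelope sketch (assuming the usual ~0.985 GB/s of usable bandwidth per Gen3 lane after 128b/130b encoding, doubling each generation, and ignoring protocol overhead):

    # Approximate usable PCIe bandwidth per lane, in GB/s (back-of-the-envelope).
    GBS_PER_LANE = {"3.0": 0.985, "4.0": 1.969, "5.0": 3.938}

    def x4_drive_bandwidth(gen: str) -> float:
        """Bandwidth available to a single NVMe drive on an x4 link."""
        return 4 * GBS_PER_LANE[gen]

    for gen in GBS_PER_LANE:
        per_drive = x4_drive_bandwidth(gen)
        print(f"PCIe {gen}: ~{per_drive:.1f} GB/s per x4 drive, "
              f"~{4 * per_drive:.0f} GB/s for a 4-drive x16 card")

A Gen3 x4 link tops out around 3.9 GB/s, which current drives already brush up against; Gen4 and Gen5 roughly double that each step, which is where the headroom comes from.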
Sounds like we might need to go back to the kind of mainframe architecture that has IO offload. Split the PCIe bus into NUMA-like zones; give each zone its own (probably ARM) CPU, running its own kernel; then use "application processors" (probably x86) to command-and-control the IO zones, allocating e.g. IOMMU-subvirtualized ethernet channels to them. Control plane/data plane separation.
There's some (to me) interesting work in this area. See for example this talk[1], where they show how a RISC-V CPU with a narrow and slow PCIe link can orchestrate the direct transfer of data between two PCIe devices (say an NVMe drive and an Ethernet card), saturating the x16 link between them.
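The underlying trick, as I understand it (a hypothetical sketch, not the talk's actual code; every class and method name below is made up for illustration): the orchestrating CPU only pushes small command descriptors over its own narrow link, and points the data-pointer fields in those commands at the other device's PCIe memory (e.g. a controller memory buffer or a NIC BAR), so the bulk payload moves device-to-device and never touches the control CPU.

    # Hypothetical peer-to-peer DMA orchestration; these classes and methods
    # do not correspond to any real driver API.

    class NvmeQueue:
        def submit_read(self, lba: int, blocks: int, dest_bus_addr: int) -> None:
            """Enqueue a read whose data pointer is an arbitrary PCIe bus address."""
            ...

    class NicTxRing:
        def buffer_bus_addr(self, index: int) -> int:
            """Bus address of a NIC-resident packet buffer exposed through a BAR."""
            ...

        def send(self, index: int, length: int) -> None:
            ...

    def forward_blocks(nvme: NvmeQueue, nic: NicTxRing,
                       lba: int, blocks: int, buf_idx: int) -> None:
        # The control CPU writes a few dozen bytes of commands and doorbells;
        # the multi-KiB payload flows NVMe -> NIC directly across the switch.
        nvme.submit_read(lba, blocks, dest_bus_addr=nic.buffer_bus_addr(buf_idx))
        # ...wait for the NVMe completion (omitted), then hand the buffer to the NIC.
        nic.send(buf_idx, length=blocks * 4096)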
Sort of, but the network part of that ends up being a huge bottleneck too: with 16 drives at 5 GB/s each (the max I've seen so far), you need 80 GB/s of network bandwidth to each server. You start getting into the really expensive side of things speed-wise.
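To put that in perspective (simple arithmetic against nominal Ethernet line rates, ignoring protocol overhead):

    drives = 16
    per_drive_gbs = 5.0                            # GB/s sequential per drive
    aggregate_gbits = drives * per_drive_gbs * 8   # 80 GB/s = 640 Gbit/s

    # Nominal Ethernet line rates, no overhead accounted for.
    for name, gbit in [("100GbE", 100), ("200GbE", 200), ("400GbE", 400)]:
        print(f"{name}: {aggregate_gbits / gbit:.1f} links needed "
              f"for {aggregate_gbits:.0f} Gbit/s")

80 GB/s is 640 Gbit/s, i.e. more than six 100GbE ports or a couple of 400GbE ports per server before any overhead.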
Also, most CPUs you can buy have around 40-64 PCIe lanes, which at x4 per drive limits you to 10-16 drives at full speed (and that leaves no lanes left over for Ethernet).
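A quick lane-budget sketch (the x16 reserved for a NIC below is my own assumption, just to show how fast the budget disappears):

    def drives_at_full_speed(total_lanes: int, lanes_per_drive: int = 4,
                             reserved_lanes: int = 0) -> int:
        """How many x4 NVMe drives fit after reserving lanes for other devices."""
        return max(0, (total_lanes - reserved_lanes) // lanes_per_drive)

    for cpu_lanes in (40, 64):
        alone = drives_at_full_speed(cpu_lanes)
        with_nic = drives_at_full_speed(cpu_lanes, reserved_lanes=16)  # x16 NIC
        print(f"{cpu_lanes} lanes: {alone} drives alone, {with_nic} with an x16 NIC")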