You can find the list of 2023 project allocations from the DOE INCITE program here [1]. INCITE allocates the majority of compute hours at the Leadership Computing Facilities, which include the systems at Argonne and Oak Ridge.
I believe you're thinking of the HPC facilities at LLNL and LANL. Frontier is run under DOE ASCR as one of the Leadership Computing Facilities (LCFs), which focus mainly on open science use cases. Most of the compute hours in the LCFs are allocated via the DOE INCITE and ALCC programs.
In some ways, #2 is sad to me.
Don't get me wrong, I have worked a ton on LLVM and love it.
But at least when I was there, IBM's interprocedural middle-end (TPO) was some of the nicest, best-structured C++ compiler code I had seen in a long time.
It was well written, well commented, and well architected.
It may have changed since I left, but my understanding as of a few years ago was that IBM was replacing the front-end with Clang but continuing to use TPO (at least for now).
If by chance someone is looking for a VM management software around hyperkit, I've been working on https://github.com/bensallen/hkmgr in my spare time.
Also consider the ecosystem for these boards. Do you want your researchers dealing with the non-upstreamed support around Allwinner chips? Additionally, consider time to solution. This machine is based on BitScope's Blade: http://bitscope.com/product/blade/.
32-bit vs 64-bit is largely inconsequential when the nodes only have 1-2GB of RAM. The ecosystem I'm referring to means you can run vanilla Linux on an RPi with no extra work, google a problem with a reasonable chance of finding a solution, and so on.
From what I understand, you can see a decent improvement in speed by going 64-bit on ARM, because they used the crossover to dump a fair bit of legacy brain damage in the instruction set that was preventing certain optimizations.
On the other hand, I look at the 3000 core figure and think that it's roughly on par with high end GPUs. The clock rates aren't terribly different either. The range of applications where this beats out GPU solutions is probably fairly narrow, especially given the terrible IO bottleneck on the RPis.
For comparison, a $7,500 TITAN X has 3072 CUDA cores clocked at 1 GHz. This cluster has 3,000 CPU cores clocked at 1.2 GHz. On the TITAN card all of those cores share the same 12 GB of memory with 336.5 GB/s of memory bandwidth. On the cluster every 4 cores share 1 GB of memory with (I think) 3.6 GB/s bandwidth. Of course, communication outside of those 4 cores is restricted to 0.0125 GB/s at best.
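A quick Python back-of-the-envelope using only the figures quoted above (all of them rough, including the guessed per-node bandwidth) makes the gap concrete:

    # Figures are taken from the comment above; treat them as rough estimates.
    titan_cores = 3072
    titan_mem_bw = 336.5            # GB/s, shared by all CUDA cores
    cluster_cores_per_node = 4
    cluster_node_bw = 3.6           # GB/s per 4-core node (guessed above)
    cluster_net_bw = 0.0125         # GB/s between nodes, at best

    print(f"TITAN X per-core share:  {titan_mem_bw / titan_cores * 1000:.0f} MB/s")
    print(f"Cluster per-core share:  {cluster_node_bw / cluster_cores_per_node * 1000:.0f} MB/s within a node")
    print(f"Cluster off-node path:   {cluster_net_bw * 1000:.1f} MB/s")

Within a 4-core node the Pis actually look respectable; the problem is that anything crossing the network drops to 12.5 MB/s, well below even the per-core share of the TITAN's memory bandwidth.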
For one, why are _researchers_ using largely obsolete technology? For another, many high-performance computing tasks perform significantly faster on 64-bit (e.g., lmdb).
Performance isn't actually the objective of the Pi cluster; the people using it have a real supercomputer next door. It's a testbed so they can validate programs before transferring them to the expensive supercomputer.
I would imagine going from a 10-node to a 100-node system is more complicated overall than going from 32-bit to 64-bit. Sure, the instructions change, but that should basically all be abstracted away by the toolchain. However, job management, allocation, data logistics, queues, cache invalidation, bottlenecks, etc., are all key issues that compound non-linearly with scale.
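As a toy illustration of the non-linear part (pairwise interactions standing in for contention, invalidation traffic, and stragglers; not a real model):

    def node_pairs(n):
        # Distinct point-to-point pairs in an n-node system.
        return n * (n - 1) // 2

    for n in (10, 100, 1000):
        print(f"{n:>5} nodes -> {node_pairs(n):>7} pairs to coordinate")
    # 10 -> 45, 100 -> 4950, 1000 -> 499500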
Chemists still use thin-layer chromatography, a technique 100 years old, day-in-and-out, in every lab in the world, even when HPLC and NMR exist. Why? It's cheap, fast, and works well enough.
Another neat use case for this hardware is scale testing of systems management tooling, e.g. provisioning and configuration management. HPC centers are looking at the possibility of having to manage 100k+ nodes in a single cluster.
- Gigabit Ethernet MAC on-board; Pine64's SOPINE and the Raspberry Pi Compute Module both require per-node networking components to be on the carrier board.
- Basic headers used for connecting to a carrier board
- Carrier board "only" needs an embedded switch and 5V power. However, I've not yet come across an embedded switch that has a non-blocking ratio of 1GbE ports to 10/25GbE uplinks (see the sketch after this list).
- Carrier board should probably be mini-ITX or other standard form factor to fit in existing chassis. Form factor and embedded switch options are going to limit the number of nodes per carrier board.
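As a rough sketch of the non-blocking arithmetic from the list above (the port counts are made up purely for illustration):

    def is_nonblocking(node_ports, node_gbps, uplinks, uplink_gbps):
        # Non-blocking means the uplinks carry at least the sum of the
        # node-facing port bandwidth.
        return uplinks * uplink_gbps >= node_ports * node_gbps

    print(is_nonblocking(24, 1, 2, 10))   # False: 20 Gb/s uplink for 24 Gb/s of 1GbE ports
    print(is_nonblocking(24, 1, 1, 25))   # True: a single 25GbE uplink covers 24 x 1GbE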
I'm just looking at the FriendlyARM website at some of the other NanoPis, which I'd not heard of before, and I'm quite tempted by the Gbps Ethernet. How do you find the software/OS support? Are they pretty reliable, with longish uptimes?
i have some nanopi neo(1)s and they're rock solid - my rule of thumb with arm boards is that support is good enough if it has an armbian _debian_ (vs just armbian ubuntu) release
Cheers, I might look at getting one then, as I had problems with my Pi's network connectivity being slow compared to a laptop's using the same Ethernet-over-powerline adapter.
no idea about PoE on either board but if i/o is a concern you might want to take a look at the $25 rock64 with gbe and usb3 - i/o on my neo1 maxes out somewhere just below 20mb/s but my rock64 has no problem doing 80mb/s with some ancient spinny disks and some armbian devs have done ~200mb/s with ssds iirc
os support on the rock64 is not quite there yet, but given the features@pricepoint i expect it to catch up
I've used some of their NanoPI M1 and other similar boards by other vendors (Orange PI PC and Zero) with great success, although not in any cluster configuration (mainly SDR and other "normal" stuff). The power/price ratio is generally more favorable compared to the RasPI boards, and there is some good information on the Armbian forums on how to get fine CPU throttling, which would probably be of interest in such a scenario.
I've also done three of these builds so far. Used the SC847A chassis with direct iPass cable access (i.e. no port multipliers), 4 drives per cable. Downside is you need 9x SFF-8087 connectors on controllers, and 9 iPass cables to somehow route. Don't get the SM iPass cables, TrendNet makes better ones. Upside is you have dedicated SAS2 bandwidth from the drive all the way through the controller and the PCIe bus. Likely overkill. Also a tip: you can mount 4x internal 2.5in or 2x 3.5in drives. SM has the part numbers for the brackets on the chassis' product page. Don't put anything you'd remotely want to hot swap in these brackets, they will be buried under the motherboard tray.
I've used 4x LSI Logic 9211-8i controllers plus the onboard controller of the SM X8DTH-6F. Both the onboard and the 9211-8i use the LSI 2008 chipset. I have Solaris and ZFS sitting on top of these, so I don't have hardware RAID.
This is actually very solid hardware so far. I had a PSU fail and that's it. Let me know if you have any questions.
The Canon PIXMA MX870 lets you scan and print over wifi. On a Mac you can even use Image Capture. It shows up under "Shared". I'm sure other Canon scanners have this feature as well.
[1] https://www.doeleadershipcomputing.org/wp-content/uploads/20...