Processor: 32-bit PowerPC 450 at 850 MHz. The supercomputer has 294,912 processor cores. Instead of pushing the clock speed higher, they are quite correctly exploiting parallelism. It's a net win for performance and energy efficiency. I guess the desktops of the future will tend to follow a similar trend.
Well, yes and no. A games developer will do whatever they can to get realistic fluid simulation, for example. People already use more compute power for fog and smoke in games than was used on the weather 20 years ago...
Remember history. Anything people could do on the "mainframe", they eventually wanted to do on the desktop.
I'd like to believe that desktop software will move towards parallelism - maybe not to the same extent as HPC, but in a similar direction, in the sense that it will certainly move away from a strictly single-core approach. It won't be easy; e.g., none of today's web browsers fully exploits all the cores of a box. But it will be fun to watch.
Off topic: I've been in your fan club :) since Dec. 1999, thanks to your "Hack the planet" blog http://wmf.editthispage.com.
The cost of finding all the single-thread bottlenecks and parallelizing them is immense -- akin to the Manhattan Project -- and the payoff would be that Intel and AMD could sell different (perhaps lower power) processors than they do today. Why bother? We have processors that work perfectly well for desktop software.
The problem with GPUs is that I/O latency is very high compared to your average supercomputer. You can do a crazy amount of computation locally on one card, but for problems that aren't "embarrassingly parallel", i.e. those that require a lot of low-latency inter-node communication, you'll immediately be limited by latency.
If nVidia or AMD release GPU based stream processors with onboard or daughterboard-based interconnects directly accessible from the code running on the GPU, THAT's when they'll start eating into CPU market share.
If you're buying a supercomputer, you'll want to make sure to spend at least 50% on the interconnect or you're in for a big disappointment.
Supercomputers are already focused on "embarrassingly parallel" problems. Otherwise 300,000 cores aren't going to do much for you anyway. However, I agree that interconnect speed would be a major issue for many supercomputer workloads. Still, I suspect that if you had access to a $10+ million supercomputer built from a million GPU cores, plenty of people would love to work with such a beast.
No, these are not just racks and racks of individual machines. It presents the programmer with a single system image - it "looks" like one huge expanse of memory.
We have a Blue Gene at Argonne, and it's not SSI. It is, however, not designed for embarrassingly parallel workloads: you use libraries like MPI to run tightly coupled message-passing applications (which are very sensitive to latency). You can, and people have, run many-task-type applications too.
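For anyone who hasn't written this kind of code, here's a minimal sketch of what "tightly coupled message passing" looks like in practice: a two-rank MPI ping-pong in C, which is also the classic way to measure interconnect latency. The loop count and message contents are just placeholders, not anything specific to Blue Gene.

```c
/* ping_pong.c - minimal MPI ping-pong, the classic latency microbenchmark.
 * Typical build/run: mpicc ping_pong.c -o ping_pong && mpirun -np 2 ./ping_pong
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int rounds = 1000;   /* placeholder iteration count */
    int token = 0;
    double t0 = MPI_Wtime();

    for (int i = 0; i < rounds; i++) {
        if (rank == 0) {
            /* Each blocking Send/Recv pair stalls until the message has
             * crossed the interconnect - this is where latency bites. */
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
    }

    if (rank == 0)
        printf("average round trip: %g us\n", (MPI_Wtime() - t0) / rounds * 1e6);

    MPI_Finalize();
    return 0;
}
```

Every iteration pays a full network round trip, which is why codes like this are so sensitive to the interconnect.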
The basic speed-of-light limitation means that accessing distant nodes is going to have high latency even if there is reasonable bandwidth (signals cover at best about 30 cm per nanosecond, so every metre of cable between nodes costs a few nanoseconds each way before any switching overhead). Ignoring that is a bad idea from an efficiency standpoint. And, unlike PC programming, the cost of the machine makes people far more focused on optimizing their code for the architecture than on abstracting the architecture to help the developer out.
It takes care of it to some extent, but you still have to be aware of it as the programmer. MPI and the associated infrastructure are set up so that they'll pick the right nodes to keep the network topology and your code's communication topology well matched. But you have to do your best as a programmer to hide the latency by spending that time doing other things.
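To make "hide the latency by doing other things" concrete, here's a rough sketch of the usual non-blocking MPI pattern in C: post the sends/receives early, do local work on data you already have, and only wait when the remote data is actually needed. The ring neighbours, buffer sizes, and variable names here are purely illustrative, not from any particular application.

```c
/* overlap.c - sketch of hiding interconnect latency with non-blocking MPI. */
#include <mpi.h>
#include <stdio.h>

#define N 1024   /* illustrative amount of purely local work */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank - 1 + size) % size;   /* illustrative ring neighbours */
    int right = (rank + 1) % size;

    double halo_out = (double)rank, halo_in = 0.0;
    double interior[N];
    for (int i = 0; i < N; i++) interior[i] = i;

    /* 1. Start the communication early... */
    MPI_Request reqs[2];
    MPI_Irecv(&halo_in,  1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&halo_out, 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* 2. ...do useful local work while the message is in flight... */
    double local_sum = 0.0;
    for (int i = 0; i < N; i++) local_sum += interior[i];

    /* 3. ...and only block once the remote data is actually needed. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    printf("rank %d: local_sum=%g halo_in=%g\n", rank, local_sum, halo_in);

    MPI_Finalize();
    return 0;
}
```

The more local work you can fit between steps 1 and 3, the less the interconnect latency shows up in your run time.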