Processor: 32-bit PowerPC 450 at 850 MHz. The supercomputer has 294,912 processor cores. Instead of pushing the clock speed higher, they are quite correctly exploiting parallelism. It's a net win for performance and energy efficiency. I guess the desktops of the future will tend to follow a similar trend.
Well, yes and no. A games developer will do whatever they can to get realistic fluid simulation, for example. People already use more compute power for fog and smoke in games than was used on the weather 20 years ago...
Remember history. Anything people could do on the "mainframe", they eventually wanted to do on the desktop.
I'd like to believe that desktop software will move towards parallelism - maybe not to the same extent as HPC, but in a similar direction, in the sense that it will certainly move away from a strictly single-core approach. It won't be easy; e.g., none of today's web browsers fully exploits all the cores of a box. But it will be fun to watch.
Off topic: I've been in your fan club :) since Dec. 1999, thanks to your "Hack the planet" blog http://wmf.editthispage.com.
The cost of finding all the single-thread bottlenecks and parallelizing them is immense -- akin to the Manhattan Project -- and the payoff would be that Intel and AMD could sell different (perhaps lower power) processors than they do today. Why bother? We have processors that work perfectly well for desktop software.
The problem with GPUs is that I/O latency is very high compared to your average supercomputer. You can do a crazy amount of computation locally on one card, but for problems that aren't "embarrassingly parallel", i.e. those that require a lot of low-latency inter-node communication, you'll immediately be limited by latency.
If nVidia or AMD release GPU based stream processors with onboard or daughterboard-based interconnects directly accessible from the code running on the GPU, THAT's when they'll start eating into CPU market share.
If you're buying a supercomputer, you'll want to make sure to spend at least 50% on the interconnect or you're in for a big disappointment.
Supercomputers are already focused on "embarrassingly parallel" problems. Otherwise 300,000 cores aren't going to do much for you anyway. However, I agree that interconnect speed would be a major issue for many supercomputer workloads. Still, I suspect that if you had access to a $10+ million supercomputer built from a million GPU cores, plenty of people would love to work with such a beast.
No, these are not just racks and racks of individual machines. It presents the programmer with a single system image - it "looks" like one huge expanse of memory.
We have a Blue Gene at Argonne, and it's not SSI. It is, however, not designed for embarrassingly parallel workloads: you use libraries like MPI to run tightly coupled message-passing applications (which are very sensitive to latency). You can, and people have, run many-task-type applications too.
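For anyone who hasn't written this kind of code, here's a minimal sketch of what "tightly coupled message passing" looks like in practice: a two-rank MPI ping-pong in C, which is also the classic way to measure interconnect latency. The loop count and message contents are just placeholders, not anything specific to Blue Gene.

```c
/* ping_pong.c - minimal MPI ping-pong, the classic latency microbenchmark.
 * Typical build/run: mpicc ping_pong.c -o ping_pong && mpirun -np 2 ./ping_pong
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int rounds = 1000;   /* placeholder iteration count */
    int token = 0;
    double t0 = MPI_Wtime();

    for (int i = 0; i < rounds; i++) {
        if (rank == 0) {
            /* Each blocking Send/Recv pair stalls until the message has
             * crossed the interconnect - this is where latency bites. */
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
    }

    if (rank == 0)
        printf("average round trip: %g us\n", (MPI_Wtime() - t0) / rounds * 1e6);

    MPI_Finalize();
    return 0;
}
```

Every iteration pays a full network round trip, which is why codes like this are so sensitive to the interconnect.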
The basic speed-of-light limitation means that accessing distant nodes is going to have high latency even if there is reasonable bandwidth (signals cover at best about 30 cm per nanosecond, so every metre of cable between nodes costs a few nanoseconds each way before any switching overhead). Ignoring that is a bad idea from an efficiency standpoint. And, unlike PC programming, the cost of the machine makes people far more focused on optimizing their code for the architecture than on abstracting the architecture to help the developer out.
It takes care of it to some extent, but you still have to be aware of it as the programmer. MPI and the associated infrastructure are set up so that they'll pick the right nodes to keep the network topology and your code's communication topology well matched. But you have to do your best as a programmer to hide the latency by spending that time doing other things.
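To make "hide the latency by doing other things" concrete, here's a rough sketch of the usual non-blocking MPI pattern in C: post the sends/receives early, do local work on data you already have, and only wait when the remote data is actually needed. The ring neighbours, buffer sizes, and variable names here are purely illustrative, not from any particular application.

```c
/* overlap.c - sketch of hiding interconnect latency with non-blocking MPI. */
#include <mpi.h>
#include <stdio.h>

#define N 1024   /* illustrative amount of purely local work */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank - 1 + size) % size;   /* illustrative ring neighbours */
    int right = (rank + 1) % size;

    double halo_out = (double)rank, halo_in = 0.0;
    double interior[N];
    for (int i = 0; i < N; i++) interior[i] = i;

    /* 1. Start the communication early... */
    MPI_Request reqs[2];
    MPI_Irecv(&halo_in,  1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&halo_out, 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* 2. ...do useful local work while the message is in flight... */
    double local_sum = 0.0;
    for (int i = 0; i < N; i++) local_sum += interior[i];

    /* 3. ...and only block once the remote data is actually needed. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    printf("rank %d: local_sum=%g halo_in=%g\n", rank, local_sum, halo_in);

    MPI_Finalize();
    return 0;
}
```

The more local work you can fit between steps 1 and 3, the less the interconnect latency shows up in your run time.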