
The Phi was an interesting computer. AVX512 on 60 cores back in 2015 was pretty nuts. CUDA wasn't quite as good as it is today (there have been HUGE advancements in CUDA recently).

These days we have full-fat EPYC or Threadripper chips to use, and even then it's only 256-bit vector units. CUDA is also way better, and NVidia has advanced dramatically, proving that CUDA is easier to code for than people once thought. (Back in 2015, it was still "common knowledge" that CUDA was too hard for normal programmers.)
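
To make that width difference concrete, here's a toy sketch in plain C++ intrinsics (the axpy kernel, names, and compiler flags are mine, just for illustration): a 512-bit FMA touches 8 doubles per instruction, a 256-bit one touches 4.

    // Toy illustration of vector width: AVX-512 (Xeon Phi KNL / Skylake-SP era)
    // vs. AVX2 (256-bit units, e.g. EPYC Rome). Build with -mavx512f for the
    // first path and -mavx2 -mfma for the second.
    #include <immintrin.h>
    #include <cstddef>

    // y[i] += a * x[i], 8 doubles per iteration with 512-bit registers.
    void axpy_avx512(double a, const double* x, double* y, std::size_t n) {
        __m512d va = _mm512_set1_pd(a);
        for (std::size_t i = 0; i + 8 <= n; i += 8) {
            __m512d vy = _mm512_loadu_pd(y + i);
            __m512d vx = _mm512_loadu_pd(x + i);
            _mm512_storeu_pd(y + i, _mm512_fmadd_pd(va, vx, vy));
        }
    }

    // Same kernel with 256-bit registers: 4 doubles per iteration.
    void axpy_avx2(double a, const double* x, double* y, std::size_t n) {
        __m256d va = _mm256_set1_pd(a);
        for (std::size_t i = 0; i + 4 <= n; i += 4) {
            __m256d vy = _mm256_loadu_pd(y + i);
            __m256d vx = _mm256_loadu_pd(x + i);
            _mm256_storeu_pd(y + i, _mm256_fmadd_pd(va, vx, vy));
        }
    }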

Intel's Xeon Phi was a normal CPU. It could run normal Linux, and it scaled just like a GPU (each PCIe x16 slot added another ~60 Xeon Phi cores to your box).

It was a commercial failure, but I wouldn't say it was worthless. NVidia just ended up making a superior product by making CUDA easier and easier to use.



I was using CUDA heavily in 2015, and I also looked at the first/second gen of the Xeon Phi at the time. I thought it was much harder to program for than CUDA was back then (and that gap has certainly widened since). I recall things like a weird ring topology between cores that you may have had to pay attention to, the memory hierarchy (you deal with something similar in CUDA, but I remember the Phi's being more NUMA-like), and the fact that transfers to and from the host CPU were harder and more synchronous than in CUDA.
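
To make the transfer comparison concrete, here's roughly what the easy path looks like on the CUDA side -- pinned host memory plus cudaMemcpyAsync on a stream, so copies can overlap with kernels. A sketch only; the buffer names and sizes are made up:

    // Host-side CUDA runtime pattern: page-locked buffers + async copies on a
    // stream, so transfers overlap with other work instead of blocking the CPU.
    #include <cuda_runtime.h>
    #include <cstddef>
    #include <cstdio>

    int main() {
        const std::size_t bytes = (1 << 20) * sizeof(float);
        float *h_buf = nullptr, *d_buf = nullptr;

        // Pinned (page-locked) host memory is what makes the copy truly async.
        cudaMallocHost((void**)&h_buf, bytes);
        cudaMalloc((void**)&d_buf, bytes);

        cudaStream_t stream;
        cudaStreamCreate(&stream);

        // Enqueue the copy; the CPU thread returns immediately and can enqueue
        // kernels on the same stream or do unrelated work.
        cudaMemcpyAsync(d_buf, h_buf, bytes, cudaMemcpyHostToDevice, stream);
        // ... kernel launches on `stream` would go here ...
        cudaMemcpyAsync(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost, stream);

        cudaStreamSynchronize(stream);  // block only when the result is needed

        cudaStreamDestroy(stream);
        cudaFree(d_buf);
        cudaFreeHost(h_buf);
        printf("done\n");
        return 0;
    }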

It was definitely a really cool hardware architecture, but the software ecosystem just wasn't there.


Xeon Phi was supposed to be easy to program for, because it ran Linux (albeit an embedded version, it was still straight-up Linux).

Turns out, performance-critical code is hard to write whether or not you have Linux, and I'm not entirely sure Linux made things easier at all. I guess it's cool that you could run GDB, have filesystems, and all that stuff, but was that really needed?

---------

CUDA shows that you can just run bare-metal code and have the host manage a huge amount of the issues (even cudaMalloc is globally synchronizing and dumb as a doornail: probably host-managed, if I were to guess).
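
For what it's worth, the practical upshot of that global sync is the standard advice: allocate big buffers once up front and reuse them, and if you truly must allocate in the hot path, newer toolkits (11.2+) have the stream-ordered cudaMallocAsync/cudaFreeAsync. Rough sketch; the sizes and loop are made up:

    // cudaMalloc synchronizes the whole device, so keep it out of hot loops.
    // When a per-iteration temporary is unavoidable, the stream-ordered
    // allocator (CUDA 11.2+) sidesteps the device-wide stall.
    #include <cuda_runtime.h>
    #include <cstddef>

    void process_batches(int num_batches, std::size_t batch_bytes) {
        cudaStream_t stream;
        cudaStreamCreate(&stream);

        // Allocate once, outside the loop, and reuse it for every batch.
        void* d_scratch = nullptr;
        cudaMalloc(&d_scratch, batch_bytes);

        for (int b = 0; b < num_batches; ++b) {
            // ... copy batch b in, launch kernels on `stream`, copy results out ...

            // Temporary that really must be allocated per batch: stream-ordered,
            // so it doesn't synchronize the rest of the device.
            void* d_tmp = nullptr;
            cudaMallocAsync(&d_tmp, batch_bytes, stream);
            // ... use d_tmp in kernels enqueued on `stream` ...
            cudaFreeAsync(d_tmp, stream);
        }

        cudaStreamSynchronize(stream);
        cudaFree(d_scratch);
        cudaStreamDestroy(stream);
    }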


That's right -- I always wished they'd made a Phi with PCIe connections out to other peripherals. Imagine a Phi host that could connect to a GPU and offload the things the GPU was better at.


That looks like 28 cores, and I think the Phi went to 72 cores (or 288 threads with its 4-way hyperthreading). Of course, the Phi was clocked much lower. The AMD is definitely more comparable.


Well... they did. That's basically called a Xeon 8180. :-)

Or alternatively, an AMD EPYC (64 cores / 128 PCIe lanes).


Now I'm remembering... They had the first-gen Phi as a coprocessor in a PCIe slot, which effectively gave it the same issues as a GPU. But the second gen (Knights Landing) made the Phi the host processor while removing almost all ability to attach external devices. It had potential, I think, but it was a weird transition from v1 to v2.


I actually found the Coprocessor more interesting.

Yeah, NVidia CUDA makes a better coprocessor for deep learning and matrix multiplication. But a CPU-based coprocessor for adding extra cores to a system seems like it'd be better for some class of problems.

SIMD compute is great and all, but I kind of prefer to see different solutions in the computer world. I guess the classic 8-socket Xeon 8180 setup is more straightforward (though expensive).

--------

A Xeon Phi on its own motherboard is just competing with regular ol' Xeons. Granted, at a cheaper price... but it's too similar to normal CPUs.

The Xeon Phi was probably trying to do too many unique things at once. It used HMC-based memory instead of GDDR5X or HBM (or DDR4). It was a CPU in a GPU form factor. It was a GPU (ish) running its own OS. It was just really weird. I keep looking at the thing on paper and wondering what problem it'd be best at solving. All sorts of weird decisions; nothing else was ever really built quite like it.
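
The HMC-derived memory (MCDRAM on Knights Landing) is a good example of that weirdness leaking into software: in flat mode you either pinned your job to it with numactl or allocated into it explicitly, e.g. through the memkind library's hbwmalloc interface. A sketch of the latter, assuming memkind is installed; the buffer and sizes are made up:

    // Explicitly placing a bandwidth-hungry buffer in KNL's on-package MCDRAM
    // ("flat" mode) via memkind's hbwmalloc interface. Link with -lmemkind.
    // Falls back to ordinary malloc when no high-bandwidth memory is present.
    #include <hbwmalloc.h>
    #include <cstddef>
    #include <cstdlib>
    #include <cstdio>

    int main() {
        const std::size_t n = 1 << 24;  // ~128 MB of doubles
        const bool have_hbw = (hbw_check_available() == 0);

        double* buf = have_hbw
            ? static_cast<double*>(hbw_malloc(n * sizeof(double)))
            : static_cast<double*>(std::malloc(n * sizeof(double)));
        if (!buf) return 1;

        // ... bandwidth-bound work on buf ...

        if (have_hbw) hbw_free(buf);
        else          std::free(buf);
        printf("done\n");
        return 0;
    }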


Agreed! That's why I was bummed when the second gen became a host system. It didn't fit my use case well.



