
>>On Intel's part, it seems kinda... late?

Not exactly. AVX-512 is already used on the Xeon Phi, which is Intel's floating-point co-processor for deep learning/OpenCL tasks. But that being said, GPUs blow the Xeon Phi out of the water in terms of FLOPS, local RAM, and RAM bandwidth.



Isn't Phi better at dealing with branch-heavy code? GPUs absolutely kill on straightforward compute, but they seem to flail helplessly if the logic is highly conditional.

I've read that in some cases GPUs evaluate both branches of a condition and discard the unwanted results. A CPU doesn't do this.


>>Isn't Phi better at dealing with branch-heavy code?

This may be true. Generally, code that is parallelizable enough to be run on a GPU isn't going to be branch heavy. There are no benchmarks to support this, but I would believe it is true. GPUs typically have 1 branching unit per 64-128 threads, while the Phi has 1 branching unit per 2 threads.

The real difference is that GPUs use SIMT, while the Phi uses MIMD. Each Phi thread is a POSIX thread; it can do its own thing, live its own life. On a GPU you just specify how parallel your work-group, thread block, warp, or wavefront is (the name depends on the platform: OpenCL, CUDA, Nvidia DX12, and AMD DX12 respectively).
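
To make the MIMD point concrete, here's a minimal sketch in plain C++ with std::thread (which maps onto POSIX threads on Linux; the two worker functions are invented for illustration). Each thread has its own program counter and can follow completely unrelated control flow, which is the freedom a Phi thread gives you; the lanes of a GPU warp share one instruction stream and can't do this.

    #include <cstdio>
    #include <numeric>
    #include <thread>
    #include <vector>

    // Two workers with completely unrelated control flow.
    void sum_worker(const std::vector<int>& data) {
        long total = std::accumulate(data.begin(), data.end(), 0L);
        std::printf("sum = %ld\n", total);
    }

    void search_worker(const std::vector<int>& data, int needle) {
        for (std::size_t i = 0; i < data.size(); ++i) {
            if (data[i] == needle) {            // data-dependent early exit
                std::printf("found %d at index %zu\n", needle, i);
                return;
            }
        }
        std::printf("%d not found\n", needle);
    }

    int main() {
        std::vector<int> data(1 << 16);
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = int(i % 1000);

        // MIMD: each thread runs its own code path, independent of the other.
        std::thread a(sum_worker, std::cref(data));
        std::thread b(search_worker, std::cref(data), 42);
        a.join();
        b.join();
    }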

>>I've read that in some cases GPUs evaluate both branches of a condition and discard the unwanted results. A CPU doesn't do this.

Intel CPUs have done this since SSE4.


If you mean branch prediction, then yes, it will end up inadvertently executing the wrong instructions and discarding them, but it halts execution of the wrong branch at the first opportunity and reschedules to correct its mistake.

A GPU will execute both branches to completion.
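
Roughly like this toy scalar model (plain C++, not real GPU code; the warp width of 8 and the branch bodies are made up). Within one warp/wavefront all lanes step through the same instructions, so both sides of a divergent if are executed and a per-lane active mask decides which result each lane keeps:

    #include <array>
    #include <cstdio>

    constexpr int WARP = 8;

    int main() {
        std::array<int, WARP> x   = { 3, -1, 4, -1, 5, -9, 2, -6 };
        std::array<int, WARP> out = {};

        // Per-lane predicate, analogous to the hardware's active mask.
        std::array<bool, WARP> mask = {};
        for (int lane = 0; lane < WARP; ++lane)
            mask[lane] = (x[lane] > 0);

        // "Then" side: executed for every lane, kept only where the mask is set.
        for (int lane = 0; lane < WARP; ++lane) {
            int taken = x[lane] * 2;
            if (mask[lane]) out[lane] = taken;
        }

        // "Else" side: also executed for every lane, kept where the mask is clear.
        for (int lane = 0; lane < WARP; ++lane) {
            int not_taken = -x[lane];
            if (!mask[lane]) out[lane] = not_taken;
        }

        for (int lane = 0; lane < WARP; ++lane)
            std::printf("lane %d: %d -> %d\n", lane, x[lane], out[lane]);
    }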


No, he means the SSE predication instructions.

The comparison instructions let you build a mask vector, and instructions like blendv use that mask vector to only impact a subset of elements.
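
For concreteness, a minimal sketch of that pattern with SSE4.1 intrinsics (the input values and the per-element operation are invented; compile with -msse4.1). The comparison produces an all-ones/all-zeros mask per lane, both results are computed unconditionally, and blendv selects per lane:

    #include <immintrin.h>   // SSE4.1 intrinsics
    #include <cstdio>

    int main() {
        alignas(16) int x[4] = { 3, -7, 0, 12 };
        alignas(16) int out[4];

        __m128i v    = _mm_load_si128(reinterpret_cast<const __m128i*>(x));
        __m128i zero = _mm_setzero_si128();

        // Mask lanes are all-ones where x[i] > 0, all-zeros otherwise.
        __m128i mask = _mm_cmpgt_epi32(v, zero);

        // Compute BOTH sides unconditionally, as predicated code does.
        __m128i if_true  = _mm_add_epi32(v, _mm_set1_epi32(1));   // x + 1
        __m128i if_false = _mm_sub_epi32(v, _mm_set1_epi32(1));   // x - 1

        // blendv keeps if_true where the mask is set, if_false elsewhere.
        __m128i r = _mm_blendv_epi8(if_false, if_true, mask);

        _mm_store_si128(reinterpret_cast<__m128i*>(out), r);
        for (int i = 0; i < 4; ++i)
            std::printf("%d -> %d\n", x[i], out[i]);
    }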

It's been a common feature on many RISC and VLIW CPUs over the years. It is in no way unique to GPUs.


Ah, thanks for clarifying.


> GPUs use SIMT

I know very little about GPU architectures, but I thought that the last few generations of GPUs were straightforward SIMD (i.e. all lanes run in lockstep and divergence is handled at a higher level).



