I used to believe that, until I started to get into cryptocoin mining. There were algorithms specifically designed to be GPU-resistant, and they were all ported anyway and saw significant gains. It was that experience that pushed me to learn how to program these devices.
The tooling isn't good enough yet, but there is no question in my mind that practically everything will run on a GPU-like processor in the future. The speedups are just too tremendous.
Intel knows this, which is why they purposely limit PCIe bandwidth.
>I used to believe that, until I started to get into cryptocoin mining.
99% of the apps we use are totally unlike cryptocoin mining. And the style they are written in (and, more importantly, their function) is even more GPU-resistant.
>The tooling isn't good enough yet, but there is no question in my mind that practically everything will run on a GPU-like processor in the future. The speedups are just too tremendous.
Don't hold your breath. For one, nobody's porting the thousands of apps we already have and depend on every day to GPUs.
While true in the short term, it should be noted that Intel is moving closer to the GPU model with its CPUs that have many slower cores and very wide vector units. There is value in a large core capable of fast out-of-order execution for control purposes, but data processing can be done much faster with a GPU model.
You can even implement explicit speculative execution - simply use a warp for each path and choose at the end. It is very wasteful but can often come out ahead.
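In kernel form the pattern is something like this (a rough CUDA sketch; the two path functions and the predicate are placeholders I made up, not anything from a real workload):

    // Launch with 64 threads per block: warp 0 speculatively computes one
    // path for a 32-element tile, warp 1 computes the other, and the right
    // result is committed once the predicate is known.
    __device__ float path_taken(float x)     { return x * 2.0f; }               // placeholder
    __device__ float path_not_taken(float x) { return sqrtf(fabsf(x)) + 1.0f; } // placeholder

    __global__ void speculate_both_paths(const float* in, float* out, int n) {
        __shared__ float a[32];
        __shared__ float b[32];

        int lane = threadIdx.x % 32;   // element within the 32-wide tile
        int warp = threadIdx.x / 32;   // 0 = taken path, 1 = not-taken path
        int idx  = blockIdx.x * 32 + lane;
        float x  = (idx < n) ? in[idx] : 0.0f;

        // Both warps run "their" branch unconditionally; half the work is wasted.
        if (warp == 0) a[lane] = path_taken(x);
        else           b[lane] = path_not_taken(x);
        __syncthreads();

        // Resolve the branch and keep only the winning result.
        if (warp == 0 && idx < n)
            out[idx] = (x > 0.0f) ? a[lane] : b[lane];
    }

You burn twice the ALU work, but neither warp diverges internally and the branch latency is hidden.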
No, Intel's approach is very different from GPUs, because Intel has a strong interest in not making porting to GPUs easy (also, wide vectors are far easier to do in a CPU than in a "GPU-like model").
Cryptocoin mining is embarrassingly parallel by its nature. You are trying lots of different inputs to a hash function, so you can always run an arbitrary number of them in parallel. There are various ways to reduce the GPU/FPGA/ASIC advantage, like requiring lots of RAM, but the task is still parallel if you have enough RAM. Something like a JavaScript JIT on the other hand is fundamentally hard to parallelize.
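For contrast, a toy sketch of why the hashing case parallelises so trivially (toy_hash and the target are placeholders, not any real coin's proof of work):

    // Every thread tries its own nonce; no thread depends on another.
    __device__ unsigned int toy_hash(unsigned int nonce) {
        unsigned int h = nonce * 2654435761u;   // multiplicative mixing
        h ^= h >> 16;
        h *= 2246822519u;
        h ^= h >> 13;
        return h;
    }

    __global__ void search_nonces(unsigned int start, unsigned int target,
                                  unsigned int* best_nonce) {
        unsigned int nonce = start + blockIdx.x * blockDim.x + threadIdx.x;
        if (toy_hash(nonce) < target)        // "meets difficulty" check
            atomicMin(best_nonce, nonce);    // any thread can report a hit
    }

Initialise best_nonce to UINT_MAX on the host and you can throw as many threads at this as the card will hold.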
Primecoin is a good example. If you look at the implementation, it requires generating a tightly packed bitfield for the sieve and then randomly accessing it afterwards. Lots of synchronization is required so you don't overwrite previously set bits, and the memory accesses are random, so it's suboptimal for the GPU's memory subsystem.
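Something like this, roughly (a simplified sketch of the marking step, not Primecoin's actual sieve code):

    // Marking composites in a packed bitfield needs atomics, otherwise two
    // threads touching the same 32-bit word can lose each other's bits.
    __global__ void mark_multiples(unsigned int* sieve_bits,
                                   unsigned long long sieve_size,
                                   unsigned int prime) {
        // Each thread marks a different multiple of `prime`.
        unsigned long long k   = 2ull + blockIdx.x * blockDim.x + threadIdx.x;
        unsigned long long idx = k * prime;
        if (idx >= sieve_size) return;

        unsigned int word = (unsigned int)(idx / 32);
        unsigned int bit  = (unsigned int)(idx % 32);
        // Scattered, essentially random writes: poor coalescing, and the
        // atomic serialises threads that land on the same word.
        atomicOr(&sieve_bits[word], 1u << bit);
    }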
It took under a year for a GPU miner to come out. Having optimized it for Intel in assembly, I was convinced it wasn't possible for a GPU to beat it - and yet it happened.
It turns out that even when used inefficiently, a thousand cores can brute force their way through anything.
But you don't have thousands of cores. You have a moderate number of cores (64 in the latest Vega 64 GPU) with a high number of ALUs (and threads) per core. When the GPU executes an instruction for a thread, it checks whether the other threads are executing the same instruction and then utilises all ALUs at once. This is great for machine learning and HPC, where you often just have a large matrix or array of the same datatype to process, but most of the time this isn't the case.
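Concretely, a data-dependent branch like this (made-up example) costs a divergent warp roughly the sum of both sides, because the lanes that didn't take a path still sit through it masked off:

    __global__ void divergent_branch(const int* data, int* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        if (data[i] % 2 == 0)
            out[i] = data[i] * 3;   // odd-data lanes idle while this runs
        else
            out[i] = data[i] + 7;   // even-data lanes idle while this runs
    }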
> a thousand cores can brute force their way through anything
This is so fundamentally wrong that your complete lack of understanding of the underlying math is obvious. Let's use 1024-bit RSA as the key to bruteforce. If we used the entire universe as a computer, i.e. had every single atom in the observable universe enumerate 1 possibility every millisecond, it would take ~6 * 10^211 years to go through them all. In comparison, the universe is less than ~14 * 10^9 years old.
And this is for 1024-bit; today we use 2048-bit or larger.
Except you're not bruteforcing through all 2^1024 possibilities; you're factoring a number, which is much easier, and that's why RSA-768 is broken and RSA-1024 is deprecated.
Do you know how to bruteforce a prime factorization? Because I'm pretty sure from your comment that you don't. The calculation is based on enumerating all prime pairs of 512-bit length or less each (the length of the individual primes may be longer, but for napkin math it's a very good approximation).
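Roughly, the napkin math goes like this (my reconstruction with approximate constants, so it only aims to land in the same ballpark as the figure above):

    \begin{align*}
    \pi(2^{512}) &\approx \frac{2^{512}}{\ln 2^{512}} \approx 3.8 \times 10^{151} && \text{(primes of at most 512 bits)}\\
    N &\approx \tfrac{1}{2}\,\pi(2^{512})^{2} \approx 7 \times 10^{302} && \text{(unordered prime pairs)}\\
    R &\approx 10^{80}\ \text{atoms} \times 10^{3}\ \text{guesses/s} = 10^{83}\ \text{guesses/s}\\
    T &\approx N / R \approx 7 \times 10^{219}\ \text{s} \approx 2 \times 10^{212}\ \text{years}
    \end{align*}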
That is bruteforcing. That faster and better methods exist is irrelevant to a discussion about bruteforcing, but I do mention in the very next message in that thread that they exist.
XPM has nothing to do with finding primes used in RSA; I'm not sure where that came from. Its PoW is finding Cunningham chains of smaller primes. That said, I wouldn't be surprised if it could be adapted to find larger primes on a GPU significantly faster than on a CPU.
It was simply a response to your statement about bruteforcing anything.
There are ways to factor a product of two primes that are much faster than bruteforcing, and yes, they work on a GPU. But that has nothing to do with bruteforcing; it's clever math.
The statement was stating an absolute. Algorithmic complexity matters more than the speed of the computing hardware. Obviously, more power means you can compute more, but not everything.
That's a very uncharitable reading of what I wrote. The topic of the conversation was GPU performance vs CPU performance. Despite a GPU being less flexible, its sheer quantity of execution units more than makes up for it.
But no, I suppose it's more likely I was really saying GPUs aren't bounded by the limits of the universe.
The context is Primecoin and the ability to find primes. Prime factorization is related to that, and it would be natural to read your statement as an absolute in that context. At least I did.