I used to believe that, until I started to get into cryptocoin mining. There were algorithms specifically designed to be GPU-resistant, and they were all ported anyway and saw significant gains. It was that experience that pushed me to learn how to program these devices.
The tooling isn't good enough yet, but there is no question in my mind that practically everything will run on a GPU-like processor in the future. The speedups are just too tremendous.
Intel knows this, which is why they purposely limit PCIe bandwidth.
>I used to believe that, until I started to get into cryptocoin mining.
99% of the apps we use are totally unlike cryptocoin mining. And the style they are written in (and, more importantly, their function) is even more GPU-resistant.
>The tooling isn't good enough yet, but there is no question in my mind that practically everything will run on a GPU-like processor in the future. The speedups are just too tremendous.
Don't hold your breath. For one, nobody's porting the thousands of apps we already have and depend on every day to GPUs.
While true in the short term, it should be noted that Intel is moving closer to the GPU model with its CPUs that have many slower cores and very wide vector units. There is value in a large core capable of fast out-of-order execution for control purposes, but data processing can be done much faster with a GPU model.
You can even implement explicit speculative execution - simply use a warp for each path and choose at the end. It is very wasteful but can often come out ahead.
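In kernel form the pattern is something like this (a rough CUDA sketch; the two path functions and the predicate are placeholders I made up, not anything from a real workload):

    // Launch with 64 threads per block: warp 0 speculatively computes one
    // path for a 32-element tile, warp 1 computes the other, and the right
    // result is committed once the predicate is known.
    __device__ float path_taken(float x)     { return x * 2.0f; }               // placeholder
    __device__ float path_not_taken(float x) { return sqrtf(fabsf(x)) + 1.0f; } // placeholder

    __global__ void speculate_both_paths(const float* in, float* out, int n) {
        __shared__ float a[32];
        __shared__ float b[32];

        int lane = threadIdx.x % 32;   // element within the 32-wide tile
        int warp = threadIdx.x / 32;   // 0 = taken path, 1 = not-taken path
        int idx  = blockIdx.x * 32 + lane;
        float x  = (idx < n) ? in[idx] : 0.0f;

        // Both warps run "their" branch unconditionally; half the work is wasted.
        if (warp == 0) a[lane] = path_taken(x);
        else           b[lane] = path_not_taken(x);
        __syncthreads();

        // Resolve the branch and keep only the winning result.
        if (warp == 0 && idx < n)
            out[idx] = (x > 0.0f) ? a[lane] : b[lane];
    }

You burn twice the ALU work, but neither warp diverges internally and the branch latency is hidden.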
No, Intel's approach is very different from GPUs, because Intel has a strong interest in not making porting to GPUs easy (also, wide vectors are far easier to do in a CPU than in a "GPU-like model").
Cryptocoin mining is embarrassingly parallel by its nature. You are trying lots of different inputs to a hash function, so you can always run an arbitrary number of them in parallel. There are various ways to reduce the GPU/FPGA/ASIC advantage, like requiring lots of RAM, but the task is still parallel if you have enough RAM. Something like a JavaScript JIT on the other hand is fundamentally hard to parallelize.
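For contrast, a toy sketch of why the hashing case parallelises so trivially (toy_hash and the target are placeholders, not any real coin's proof of work):

    // Every thread tries its own nonce; no thread depends on another.
    __device__ unsigned int toy_hash(unsigned int nonce) {
        unsigned int h = nonce * 2654435761u;   // multiplicative mixing
        h ^= h >> 16;
        h *= 2246822519u;
        h ^= h >> 13;
        return h;
    }

    __global__ void search_nonces(unsigned int start, unsigned int target,
                                  unsigned int* best_nonce) {
        unsigned int nonce = start + blockIdx.x * blockDim.x + threadIdx.x;
        if (toy_hash(nonce) < target)        // "meets difficulty" check
            atomicMin(best_nonce, nonce);    // any thread can report a hit
    }

Initialise best_nonce to UINT_MAX on the host and you can throw as many threads at this as the card will hold.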
Primecoin is a good example. If you look at the implementation, it requires generating a tightly packed bitfield for the sieve and then randomly accessing it afterwards. Lots of synchronization is required so you don't overwrite previously set bits, and the memory accesses are random, so it's suboptimal for the GPU's memory subsystem.
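Something like this, roughly (a simplified sketch of the marking step, not Primecoin's actual sieve code):

    // Marking composites in a packed bitfield needs atomics, otherwise two
    // threads touching the same 32-bit word can lose each other's bits.
    __global__ void mark_multiples(unsigned int* sieve_bits,
                                   unsigned long long sieve_size,
                                   unsigned int prime) {
        // Each thread marks a different multiple of `prime`.
        unsigned long long k   = 2ull + blockIdx.x * blockDim.x + threadIdx.x;
        unsigned long long idx = k * prime;
        if (idx >= sieve_size) return;

        unsigned int word = (unsigned int)(idx / 32);
        unsigned int bit  = (unsigned int)(idx % 32);
        // Scattered, essentially random writes: poor coalescing, and the
        // atomic serialises threads that land on the same word.
        atomicOr(&sieve_bits[word], 1u << bit);
    }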
It took under a year for a GPU miner to come out. Having optimized it for Intel in assembly, I was convinced it wasn't possible for a GPU to beat it - and yet it happened.
It turns out that even when used inefficiently, a thousand cores can brute force their way through anything.
But you don't have thousands of cores. You have a moderate number of cores (64 in the latest Vega 64 GPU) with a high number of ALUs (and threads) per core. When the GPU executes an instruction for a thread, it checks whether the other threads are executing the same instruction and then utilises all ALUs at once. This is great for machine learning and HPC, where you often just have a large matrix or array of the same datatype to process, but most of the time this isn't the case.
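Concretely, a data-dependent branch like this (made-up example) costs a divergent warp roughly the sum of both sides, because the lanes that didn't take a path still sit through it masked off:

    __global__ void divergent_branch(const int* data, int* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        if (data[i] % 2 == 0)
            out[i] = data[i] * 3;   // odd-data lanes idle while this runs
        else
            out[i] = data[i] + 7;   // even-data lanes idle while this runs
    }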
> a thousand cores can brute force their way through anything
This is so fundamentally wrong that your complete lack of understanding of the underlying math is obvious. Let's use 1024-bit RSA as the key to bruteforce. If we used the entire universe as a computer, i.e. had every single atom in the observable universe enumerate 1 possibility every millisecond, it would take ~6 * 10^211 years to go through them all. In comparison, the universe is less than ~14 * 10^9 years old.
And this is for 1024-bit; today we use 2048-bit or larger.
Except you're not bruteforcing through all 2^1024 possibilities; you're factoring a number, which is much easier, and that's why RSA-768 is broken and RSA-1024 is deprecated.
Do you know how to bruteforce a prime factorization? Because I'm pretty sure from your comment that you don't. The calculation is based on enumerating all prime pairs of 512-bit length or less each (the length of the individual primes may be longer, but for napkin math it's a very good approximation).
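Roughly, the napkin math goes like this (my reconstruction with approximate constants, so it only aims to land in the same ballpark as the figure above):

    \begin{align*}
    \pi(2^{512}) &\approx \frac{2^{512}}{\ln 2^{512}} \approx 3.8 \times 10^{151} && \text{(primes of at most 512 bits)}\\
    N &\approx \tfrac{1}{2}\,\pi(2^{512})^{2} \approx 7 \times 10^{302} && \text{(unordered prime pairs)}\\
    R &\approx 10^{80}\ \text{atoms} \times 10^{3}\ \text{guesses/s} = 10^{83}\ \text{guesses/s}\\
    T &\approx N / R \approx 7 \times 10^{219}\ \text{s} \approx 2 \times 10^{212}\ \text{years}
    \end{align*}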
That is bruteforcing. That faster and better methods exist is irrelevant to a discussion about bruteforcing, but I do mention in the very next message in that thread that they exist.
XPM has nothing to do with finding primes used in RSA; I'm not sure where that came from. Its PoW is finding Cunningham chains of smaller primes. That said, I wouldn't be surprised if it could be adapted to find larger primes on a GPU significantly faster than on a CPU.
It was simply a response to your statement about bruteforcing anything.
There are ways to factor a product of two primes that are much faster than bruteforcing, and yes, they work on a GPU. But that has nothing to do with bruteforcing; it's clever math.
The statement was stating an absolute. Algorithmic complexity matters more than the speed of the computing hardware. Obviously, more power means you can compute more, but not everything.
That's a very uncharitable reading of what I wrote. The topic of the conversation was GPU performance vs CPU performance. Despite a GPU being less flexible, its sheer quantity of execution units more than makes up for it.
But no, I suppose it's more likely I was really saying GPUs aren't bounded by the limits of the universe.
The context is Primecoin and the ability to find primes. Prime factorization is related to that, and it would be natural to read your statement as an absolute in that context. At least I did.