You can't use 10 TFLOPS of compute on a GPU if you can't even feed it data quickly enough. The state of the art for throughput is Nvidia's NVLink, and you're capped at a theoretical max of 160GB/sec. Given how trivial most analytics workloads are (computing ratios, reductions like sums, means, and variances, etc.), there's simply no way you're going to max out the compute available on a GPU.
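To put rough numbers on that, here is a back-of-the-envelope sketch using the 160GB/sec and 10 TFLOPS figures above. The float32 input and the one-add-per-value cost of a sum are my assumptions, and the function names are made up for illustration.

```python
# Back-of-the-envelope: how much of the GPU's compute can a
# bandwidth-bound reduction actually use?
LINK_BYTES_PER_SEC = 160e9  # theoretical NVLink figure quoted above
PEAK_FLOPS = 10e12          # 10 TFLOPS figure quoted above
BYTES_PER_VALUE = 4         # assumption: float32 input
FLOPS_PER_VALUE = 1         # assumption: a sum costs one add per value


def utilization(link_bytes_per_sec, peak_flops, bytes_per_value, flops_per_value):
    """Fraction of peak compute used when the link is the bottleneck."""
    values_per_sec = link_bytes_per_sec / bytes_per_value
    achieved_flops = values_per_sec * flops_per_value
    return achieved_flops / peak_flops


def break_even_flops_per_value(link_bytes_per_sec, peak_flops, bytes_per_value):
    """Work per value needed before compute, not the link, becomes the limit."""
    return peak_flops * bytes_per_value / link_bytes_per_sec


u = utilization(LINK_BYTES_PER_SEC, PEAK_FLOPS, BYTES_PER_VALUE, FLOPS_PER_VALUE)
need = break_even_flops_per_value(LINK_BYTES_PER_SEC, PEAK_FLOPS, BYTES_PER_VALUE)
print(f"utilization: {u:.2%}, break-even: {need:.0f} FLOPs per value")
```

Under those assumptions a plain sum uses well under 1% of peak, and you'd need on the order of hundreds of floating-point operations per value before the link stops being the limit.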
Searching sorted arrays is actually very common in these workloads. Why? Analytics workloads typically operate on timestamped data stored in a sorted fashion where you have perfect or near perfect temporal and spatial locality. Thus even joins tend to be cheap.
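A minimal sketch of what "searching a sorted array" looks like for a time-window query, using Python's standard `bisect` module. The column names and sample data are invented for illustration, not taken from any particular system.

```python
from bisect import bisect_left, bisect_right


def time_range(timestamps, start, end):
    """Return (lo, hi) such that timestamps[lo:hi] covers start <= ts <= end.

    Assumes `timestamps` is sorted ascending, so two binary searches
    replace a full scan.
    """
    lo = bisect_left(timestamps, start)
    hi = bisect_right(timestamps, end)
    return lo, hi


# Hypothetical columnar data: one sorted timestamp column, one value column.
timestamps = [100, 105, 105, 110, 120, 130]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

lo, hi = time_range(timestamps, 105, 120)
window_sum = sum(values[lo:hi])  # reduction over a contiguous slice
print(lo, hi, window_sum)
```

The matching rows form one contiguous slice, which is where the locality comes from: the reduction just streams through adjacent memory.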
With Skylake and AMD Epyc closing in on 300GB/sec, and with much better cost efficiency per GB of memory than GPU memory, the case for GPUs in this application seems dubious.
I will grant you that GPUs have a place in more complex operations like sorts and joins with table scans. They also blow past CPUs when it comes to expensive computations on a dataset (where prefetching can mask latencies nicely).
A good example of a dense sort + join GPU workload would be looking for "Cliques" of Twitter or Facebook users. A Clique of 3 would be three users, A, B, and C, where A follows B, B follows C, and C follows A.
You'd perform this analysis with two joins: the follower-followee table is joined with itself twice, so three copies of the table take part.
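Here is a rough CPU-side sketch of that self-join in plain Python, just to show the shape of the query. The function name and the sample edges are hypothetical, and a real system would run this as hash or sort-merge joins over far larger tables.

```python
from collections import defaultdict


def directed_triangles(follows):
    """Find (A, B, C) where A follows B, B follows C, and C follows A.

    `follows` is an iterable of (follower, followee) pairs. Self-follows
    are dropped, and each cycle is reported once, rotated so that the
    smallest user comes first.
    """
    edges = {(a, b) for a, b in follows if a != b}

    # Index the table by follower so the first join is a hash lookup.
    by_follower = defaultdict(set)
    for a, b in edges:
        by_follower[a].add(b)

    result = set()
    for a, b in edges:                     # copy 1: A follows B
        for c in by_follower.get(b, ()):   # join 1 with copy 2: B follows C
            if (c, a) in edges:            # join 2 with copy 3: C follows A
                if a == min(a, b, c):      # keep one of the three rotations
                    result.add((a, b, c))
    return result


follows = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "alice"),
    ("carol", "dave"),
    ("dave", "alice"),
]
triangles = directed_triangles(follows)
print(triangles)
```

The intermediate result of the first join (all two-hop paths) can be far larger than the input table, which is what makes this kind of query so much heavier than a reduction over sorted data.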
----------
So it really depends on your workload. But I can imagine that someone analyzing these kinds of tougher join operations would welcome GPUs to accelerate the task.