What can each one of those 2.5 million "logic elements" do? Last time I used an FPGA, they were mostly made up of 4-bit LUTs.
How many NOT operations can this do per cycle (and per second)? I realise FPGAs aren't the most suited for this, but the raw number is useful when thinking about how much better the FPGA is compared to a GPU for simple ops.
The 2.5 million number quoted in the article is "System Logic Cells", not Logic Elements. As near as I can tell (I haven't kept pace with Xilinx since their 7 series), a "System Logic Cell" is a fabricated metric arrived at by taking the number of LUTs in the device and multiplying by ~2. In other words, there is no such thing as a System Logic Cell; it's just a translucent number.
Anyway, the FPGAs being used here are, I believe, based on a 6-LUT (6 input, 2 output). So you'd get about 1.25 million 6-LUTs to work with, and some combination of MUXes, flip-flops, distributed RAM, block RAM, DSP blocks, etc.
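To make the "6 input, 2 output" point concrete, here is a rough Python sketch of how one such LUT could hold two independent NOT gates. It assumes the usual fracturable arrangement, where the second output reads only the lower half of the truth table and the top input is tied high; the names `lut6_2`, `build_dual_not_init` and `dual_not` are made up for illustration, and this is a simplified model rather than a vendor simulation primitive.

```python
def lut6_2(init: int, inputs: int) -> tuple[int, int]:
    """Simplified model of a 6-input, 2-output LUT (assumed behaviour).

    init   -- 64-bit truth table
    inputs -- 6-bit value, bit 0 = I0 ... bit 5 = I5
    Returns (o5, o6): o6 indexes all 64 entries, o5 only the lower 32.
    """
    o6 = (init >> (inputs & 0x3F)) & 1
    o5 = (init >> (inputs & 0x1F)) & 1
    return o5, o6


def build_dual_not_init() -> int:
    """Truth table giving O5 = NOT I0 and O6 = NOT I1 when I5 is tied high."""
    init = 0
    for addr in range(32):
        i0 = addr & 1
        i1 = (addr >> 1) & 1
        if not i0:
            init |= 1 << addr         # lower half drives O5
        if not i1:
            init |= 1 << (32 + addr)  # upper half drives O6 when I5 = 1
    return init


DUAL_NOT_INIT = build_dual_not_init()


def dual_not(a: int, b: int) -> tuple[int, int]:
    """Invert two independent bits with a single modelled LUT."""
    inputs = (1 << 5) | (b << 1) | a  # I5 tied high, I1 = b, I0 = a
    return lut6_2(DUAL_NOT_INIT, inputs)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, dual_not(a, b))
```

The two outputs share the lower five inputs, which is why two small functions fit but two arbitrary 6-input functions would not.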
Supposing Xilinx isn't doing any trickery and you really can use all those LUTs freely, then you'd be able to cram ~2.5 million binary NOTs into the thing (2 NOTs per LUT, since they're two-output LUTs). So 2.5 million NOTs per cycle. I don't know what speed it'd run at for such a simple operation. For reference, their mid-range 7 series FPGAs were able to do 32-bit additions plus a little extra logic at ~450 MHz, consuming 16 LUTs per adder.
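For the per-second figure, the arithmetic can be sketched as below. The clock rate is the unknown: the 450 MHz default is just the 7 series adder figure reused as a placeholder, not a measured speed for this part, and the function name and parameters are made up for illustration.

```python
def not_throughput(system_logic_cells: float,
                   cells_per_lut: float = 2.0,   # assumed marketing multiplier (~2)
                   nots_per_lut: int = 2,        # two outputs per 6-LUT
                   clock_hz: float = 450e6):     # placeholder clock, not a measurement
    """Return (NOTs per cycle, NOTs per second) under the stated assumptions."""
    luts = system_logic_cells / cells_per_lut
    per_cycle = luts * nots_per_lut
    return per_cycle, per_cycle * clock_hz


if __name__ == "__main__":
    per_cycle, per_second = not_throughput(2.5e6)
    print(f"{per_cycle:.3g} NOTs/cycle, {per_second:.3g} NOTs/s")
```

Swap in whatever clock you think the fabric would actually close timing at; the result scales linearly with it.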
The metrics have gotten pretty opaque since the old days when an FPGA was a "sea of LUTs" all alike; modern ones include a ton of (semi-)fixed function hardware like multiply-accumulate blocks and embedded dual-port RAM. Even the LUTs themselves can be reprogrammed into small RAM blocks or shift registers, so counting "logic elements" is mostly a marketing exercise.
While yes, the architectures have become more "CISC-like", they aren't particularly convoluted or opaque. It's pretty easy to describe the architectures and come up with numbers for them. Xilinx could literally just say "1 million 6-to-2 LUTs" and that would be entirely transparent and helpful.
So it's not so much changes in architecture that have given rise to the translucency of these numbers. It's a measuring contest between Xilinx and IntelFPGA, who believe you need to present bigger numbers in marketing material to win engineers. I can't speak for other FPGA engineers, but personally it just frustrates me and wastes my time. I don't ever take those numbers at face value, and I wouldn't hire anyone who did. Xilinx is the worst offender here. At least IntelFPGA will often quote their parts both in transparent terms (# of ALMs) and useful comparisons (# of equivalent LEs). I've never seen them pull a completely made-up "System Logic Cell" out of thin air.