Very interesting. I'd still like to see the JVM pick up the FPGA as a possible compile target, that way people could run apps that seamlessly used the FPGA where appropriate. I have mentioned this to Intel, who are promoting this technology (and also have a team that contributes to the JVM), but so far no one is stating publicly that they are working on such a thing.
An Intel VP mentioned it at JavaOne. He said they would provide FPGA support for OpenJDK. One central use case he mentioned would be big data & machine learning on Spark.
It was a very pleasant surprise! The JVM world usually does not have a great interface to the heterogeneous world, and I think this would yield tremendous benefits. FPGA-accelerated matrix multiplication, sorting, and graph operations all sound very appealing.
And then, as you mentioned, there is the possibility of JITting things: HTTP header parsing ends up on the FPGA, which routes them to a message queue an actor can read. Or FPGA-based actors. Does that make sense?
----
I have been unable to follow this development at all, however. Do you have any news about this project? I've been looking for a blog, a GitHub repo, or a mailing list, but can't find any.
Intel already has a compression library as a proof of concept that shows a large benefit. The JVM compiler knows A) how many bytecode instructions each method contains and B) how CPU-hot it is. Beyond just a compression library, the compiler could identify very hot and very small methods, test them on the FPGA in parallel with normal execution, measure the performance difference, and switch to the FPGA if it was beneficial (which may be for <1% of methods). I believe the JVM already has much of the infrastructure to do such parallel method tests.
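To make that concrete, here is a purely hypothetical sketch of the selection-and-measurement idea in plain Java; the profiling fields and the Impl interface are stand-ins I've invented, not real JVM or FPGA APIs:

    // Purely hypothetical sketch: none of these types exist in the JVM; they
    // stand in for JIT profiling data and an imagined FPGA offload hook.
    interface Impl { byte[] run(byte[] input); }

    final class OffloadCandidate {
        final String method;
        final int bytecodeSize;     // "how many instructions the method contains"
        final long invocationCount; // "how CPU-hot it is"

        OffloadCandidate(String method, int bytecodeSize, long invocationCount) {
            this.method = method;
            this.bytecodeSize = bytecodeSize;
            this.invocationCount = invocationCount;
        }

        // Only very small, very hot methods are even worth trying on the FPGA.
        boolean worthTrying(int maxBytecodeSize, long minInvocations) {
            return bytecodeSize <= maxBytecodeSize && invocationCount >= minInvocations;
        }
    }

    final class OffloadExperiment {
        // Run the same input through both versions and keep whichever is faster.
        static Impl pickFaster(Impl cpu, Impl fpga, byte[] sampleInput) {
            return time(fpga, sampleInput) < time(cpu, sampleInput) ? fpga : cpu;
        }

        private static long time(Impl impl, byte[] input) {
            long start = System.nanoTime();
            impl.run(input);
            return System.nanoTime() - start;
        }
    }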
That compression library offloads compression to a dedicated FPGA implementation of the compression algorithm, not a translated version of the same code that runs on the CPU.
Approaches do exist to automatically translate some types of code to run on an FPGA, though.
Seems as though you could use a tracing JIT approach to offload hot loops to an FPGA very generically, though it'd have to be a really long-running loop to outweigh the substantial overhead of synthesizing the logic gates.
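Back-of-envelope, the break-even condition is roughly (remaining iterations) × (time saved per iteration) > (synthesis + bitstream-load time). A toy calculation with made-up numbers:

    // Rough break-even estimate for offloading a hot loop: the one-time cost of
    // synthesizing/loading the design has to be amortized over the remaining
    // iterations. Every figure here is an invented placeholder.
    final class OffloadBreakEven {
        static boolean worthOffloading(long remainingIterations,
                                       double cpuNanosPerIteration,
                                       double fpgaNanosPerIteration,
                                       double synthesisAndLoadNanos) {
            double savedPerIteration = cpuNanosPerIteration - fpgaNanosPerIteration;
            if (savedPerIteration <= 0) {
                return false; // the FPGA version isn't faster per iteration at all
            }
            return remainingIterations * savedPerIteration > synthesisAndLoadNanos;
        }

        public static void main(String[] args) {
            // Example: saving 50 ns per iteration against minutes of synthesis
            // time only pays off after billions of iterations.
            double synthesisNanos = 5.0 * 60 * 1_000_000_000L; // assume ~5 minutes
            System.out.println(worthOffloading(10_000_000_000L, 100, 50, synthesisNanos)); // true
            System.out.println(worthOffloading(1_000_000L, 100, 50, synthesisNanos));      // false
        }
    }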
You can't just find hotspots; you have to find hotspots whose data set is sufficiently disjoint between the FPGA and the CPU. With the right architecture (such as Intel's Xeon+FPGA package), you can have an incredibly fast interconnect, but it's still not the speed of the CPU's register file, so you can't hand off data with that granularity. You can get more than enough bandwidth, but the latency would crater your performance. You want to stream larger amounts of data at a time, or let the FPGA directly access the data.
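A toy model of why latency is the killer: if every word pays the link's round-trip latency, the transfer time is hopeless; pay the latency once and stream a buffer, and bandwidth takes over. All the numbers below are illustrative guesses, not measurements of any real interconnect:

    // Toy latency-vs-bandwidth comparison for handing data to an accelerator.
    // The latency and bandwidth figures are illustrative guesses only.
    final class TransferCost {
        static double nanosWordByWord(long words, double linkLatencyNanos) {
            // Every word pays the round-trip latency: latency dominates completely.
            return words * linkLatencyNanos;
        }

        static double nanosStreamed(long words, double linkLatencyNanos, double bytesPerNano) {
            // One latency hit, then the transfer is bandwidth-bound.
            return linkLatencyNanos + (words * 8) / bytesPerNano;
        }

        public static void main(String[] args) {
            long words = 1_000_000;   // 1M 64-bit words = 8 MB
            double latency = 500;     // assume ~500 ns per round trip
            double bandwidth = 16;    // assume ~16 GB/s, i.e. 16 bytes/ns
            System.out.printf("word-by-word: %.1f ms%n", nanosWordByWord(words, latency) / 1e6);
            System.out.printf("streamed:     %.3f ms%n", nanosStreamed(words, latency, bandwidth) / 1e6);
        }
    }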
For instance, AES-NI accelerates encryption on a CPU by adding an instruction to process a step of the encryption algorithm. Compression or encryption offloading to an FPGA streams a buffer (or multiple buffers) to the FPGA. Entirely different approach. (GPU offloading has similar properties; you don't offload data to a GPU word-by-word either.)
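For a sense of the granularity involved: even the plain-JDK compression API already works buffer-at-a-time, which is exactly the kind of call boundary an FPGA (or GPU) offload would hide behind. This is ordinary CPU-only code, shown only for the shape of the interface:

    import java.io.ByteArrayOutputStream;
    import java.util.zip.Deflater;

    // Compress an entire buffer in one call. The caller hands over a large
    // buffer at once, which is the granularity at which an accelerator could
    // transparently take over the work.
    final class BufferCompression {
        static byte[] compress(byte[] input) {
            Deflater deflater = new Deflater();
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[64 * 1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
            deflater.end();
            return out.toByteArray();
        }
    }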
But even if you find such hotspots, that still isn't the hardest part. You then have to generate an FPGA design that can beat optimized CPU code without hand generation. That's one of the holy grails of FPGA tool designers.
Right now, the state of the art there is writing code for a generic accelerator architecture (e.g. OpenCL, not C) and generating offloaded code with reasonable efficiency (beating the CPU, though not hitting the limits of the FPGA hardware).
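On the JVM side, Aparapi is an example of that model: you write the kernel in a restricted subset of Java and the runtime translates its bytecode to OpenCL for whatever device it finds, falling back to a thread pool otherwise. A rough sketch (the package name has changed between Aparapi versions, so treat the imports as approximate):

    import com.aparapi.Kernel;
    import com.aparapi.Range;

    public class VectorAdd {
        public static void main(String[] args) {
            final int n = 1_000_000;
            final float[] a = new float[n];
            final float[] b = new float[n];
            final float[] sum = new float[n];
            for (int i = 0; i < n; i++) { a[i] = i; b[i] = 2 * i; }

            // The run() body is restricted Java; Aparapi translates its bytecode
            // to an OpenCL kernel, or falls back to a Java thread pool if no
            // OpenCL device is available.
            Kernel kernel = new Kernel() {
                @Override
                public void run() {
                    int i = getGlobalId();
                    sum[i] = a[i] + b[i];
                }
            };
            kernel.execute(Range.create(n));
            kernel.dispose();
        }
    }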
It's cool to know it's an area of active research. I wonder if there are also power-consumption ramifications, though. While e.g. AES-NI is unbeatable performance-wise, my novice (perhaps incorrect) understanding is that ARM beats x86 on power consumption by having a drastically simpler instruction set.
Could a simple ARM-like instruction set plus a generic "synthesize and send this loopy junk to the FPGA" facility yield power savings without a major performance impact on cloud servers? (Yeah, I know this is likely a topic for hundreds of PhD theses, but is that something being investigated too?)