The problem with this will be the overhead of transferring data to and from the FPGA, which, once accounted for, often means doing the computation on the CPU makes more sense. It's obviously not a show-stopper, since GPUs have the same problem and are still useful, but it's hard to find a workload that maps well to this solution.
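To make that concrete, here's a rough break-even sketch in C. All of the numbers (PCIe bandwidth, CPU time, FPGA speedup) are illustrative assumptions, not measurements:

    #include <stdio.h>

    /* Back-of-the-envelope: is offloading to a PCIe-attached FPGA worth it
     * once transfer time is counted? All figures are made-up assumptions. */
    int main(void) {
        double bytes        = 1e9;   /* data shipped each way: 1 GB        */
        double link_bps     = 8e9;   /* ~PCIe Gen3 x8 effective bandwidth  */
        double cpu_time_s   = 0.20;  /* time to just do it on the CPU      */
        double fpga_speedup = 10.0;  /* assumed kernel speedup on the FPGA */

        double xfer_s = 2.0 * bytes / link_bps;          /* there and back */
        double fpga_s = cpu_time_s / fpga_speedup + xfer_s;

        printf("CPU:  %.3f s\n", cpu_time_s);
        printf("FPGA: %.3f s (%.3f s of that is transfer)\n", fpga_s, xfer_s);
        printf("Offloading %s here.\n", fpga_s < cpu_time_s ? "wins" : "loses");
        return 0;
    }

With these numbers, even a 10x kernel speedup loses to the round-trip transfer time; shrink the data, or keep it resident on the FPGA across calls, and the picture changes.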
In a DAW, accelerating a heavy VST plugin might make sense. But those are often already amenable to being translated to GPGPU code.
I guess the one place where GPGPU-based solutions wouldn't work is when the code you want to accelerate necessarily acts as some kind of Turing machine (i.e., emulation of some other architecture). However, I can't think of a situation where an FPGA programmed with the netlist for arch A, running alongside a CPU running arch B, would make more sense than just getting the arch-B CPU to emulate arch A; unless, perhaps, the instructions in arch A are very, very CISC, maybe with analogue components (e.g. RF logic, like a cellular baseband modem).
This is normally handled in emulation by putting the inner parts of the testbench (the transactors) onto the FPGA as well, to minimize the amount of data that has to be transferred between the CPU and the FPGA. If the FPGA is to be used as a peripheral, a similar division of labor needs to be found, one that minimizes the data crossing the link. But if the FPGA logic is on the same chip as the CPU cores, the overhead can be greatly reduced, and we're seeing more of that now.
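To sketch the transactor idea in code: rather than streaming raw per-cycle signal values across the link, the host batches compact, high-level transaction records, and logic on the FPGA expands each record into cycle-accurate pin activity. The struct layout and the DMA call below are hypothetical stand-ins for whatever a real board's driver actually provides:

    #include <stdint.h>
    #include <stdio.h>
    #include <stddef.h>

    /* One high-level transaction: "do a bus write of `data` to `addr`".
     * The FPGA-side transactor (not shown) expands each record into the
     * many cycles of address/data/strobe signalling, so those cycles
     * never cross the CPU<->FPGA link. Names here are hypothetical. */
    typedef struct {
        uint32_t addr;
        uint32_t data;
    } bus_write_txn;

    /* Stand-in for the board driver's DMA call. */
    static int fpga_dma_write(const void *buf, size_t len) {
        (void)buf;
        printf("DMA: %zu bytes in one transfer\n", len);
        return 0;
    }

    /* Batch many transactions into one DMA transfer instead of crossing
     * the link once per signal change. */
    static int send_writes(const bus_write_txn *txns, size_t n) {
        return fpga_dma_write(txns, n * sizeof txns[0]);
    }

    int main(void) {
        bus_write_txn batch[4] = {
            {0x1000, 0xdeadbeef}, {0x1004, 0x12345678},
            {0x1008, 0x00000000}, {0x100c, 0xffffffff},
        };
        return send_writes(batch, 4);
    }

The win is purely in communication volume: one DMA of a few small records replaces thousands of per-cycle signal updates.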