Thanks! The general answer is that it depends on your model and on which FPGA platform we're talking about, but in head-to-head benchmarks you'll find results in the ballpark of 2-10x CPU performance and 0.5-2x GPU performance. As you point out, power and cost are big differentiators. The other thing to consider (as another commenter mentioned) is that inference on CPU or GPU will usually require some model quantization or compression, which can degrade model accuracy. Tensil gives you a way around that dilemma, so you can have great performance without sacrificing accuracy.
Hi, I'm curious what you mean about model quantization being necessary on CPU and GPU? It isn't necessary by default - OpenVINO, TVM, and TensorRT can run single-precision inference on most classic models quite fast. If you're reaching for very low power or ultimate perf, yeah, you can downgrade to fp16 (well... mixed precision) with NVIDIA tensor cores or AVX512-FP16, or bf16 in some Intel VNNI configs. Going to integer will give you more throughput too, but it's not necessary. Even Myriad X is supposed to handle some kind of fp16 with the SHAVE cores.
The only time I had to reach for quantized (integer) networks to do anything at all was inferencing on FPGAs. Are you targeting DSP slices by default, or implementing full IEEE 754 floating point?
Are you saying that with Tensil you can run single-precision, non-quantized models at up to 2x GPU perf?
I probably misunderstood your last sentence, sorry.
Sorry if this was unclear - in a datacenter use case you're right, but for an edge deployment you'll usually need to quantize, prune, or compress your ML model to get it running as fast as you'd like on a sufficiently small CPU/GPU. Compared with running your model unchanged on those platforms, Tensil delivers the performance ranges listed above. You can also quantize and use Tensil too!
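For anyone following along, here's a minimal sketch of why quantization costs accuracy. This is symmetric int8 post-training quantization boiled down to plain Python - real toolchains (TensorRT, OpenVINO, etc.) do this per-channel with calibration data, so the function names and the single global scale here are illustrative assumptions, not any particular library's API:

```python
def quantize(weights, num_bits=8):
    """Map float weights to signed integers using one global scale factor."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integer representation."""
    return [x * scale for x in q]

weights = [0.42, -1.30, 0.07, 0.98]
q, scale = quantize(weights)
recovered = dequantize(q, scale)

# Rounding loses up to scale/2 per weight - that per-weight error is the
# source of the accuracy degradation discussed above, and it grows as you
# drop to fewer bits.
errors = [abs(w - r) for w, r in zip(weights, recovered)]
print(max(errors))
```

The rounding error is bounded by half the scale factor, which is why 8-bit quantization is often tolerable while 4-bit or lower usually needs retraining or calibration to claw the accuracy back.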
Will do - as I mentioned in another comment, it can be a bit subtle to find an apples-to-apples comparison, but we'll soon add some cross-platform benchmarks that we think are reasonable.