It's true that inference is still very often done on CPU, or even on microcontrollers. In our view, this is in large part because many applications lack good options for inference accelerator hardware. This is what we aim to change!
It depends on the application. For some use cases, moving to a GPU makes total sense. However, if you have power, form factor, or performance constraints, or simply want to be in control of your own hardware, using an FPGA with Tensil may be a better option.