Just saw your edit re: model compression. One thing Tensil can do is help you avoid quantizing or compressing your model at all! For example, we've found that a 16-bit fixed-point numeric data type preserves almost all of the model's accuracy without sacrificing performance, thanks to the huge amount of parallelism available on an FPGA.
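To make that concrete, here's a minimal NumPy sketch (not Tensil's actual implementation) of a 16-bit fixed-point format with 8 fractional bits. The point is that the worst-case rounding error is only 1/512, which is negligible for typical trained weights:

```python
import numpy as np

FRAC_BITS = 8            # 8 integer bits + 8 fractional bits = 16 bits total
SCALE = 1 << FRAC_BITS   # 256: one fractional step is 1/256

def to_fixed(x: np.ndarray) -> np.ndarray:
    """Round floats to the nearest representable 16-bit fixed-point value."""
    return np.clip(np.round(x * SCALE), -32768, 32767).astype(np.int16)

def to_float(q: np.ndarray) -> np.ndarray:
    """Decode 16-bit fixed-point values back to floats."""
    return q.astype(np.float32) / SCALE

# Trained weights are typically small, so they sit comfortably in range
# and the round-trip error stays below 1/512 (about 0.002).
weights = np.random.randn(1000).astype(np.float32) * 0.1
roundtrip = to_float(to_fixed(weights))
print("max abs error:", np.abs(weights - roundtrip).max())
```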
The broader point is that Tensil is extremely flexible, so you can try out lots of different accelerator configurations to find the one that works best for your ML model. Think of it as optimizing the hardware first, then the software if needed.
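To give a flavor of what that hardware-first sweep might look like, here's a hypothetical sketch. The parameters and the scoring function are illustrative stand-ins, not Tensil's real API; in practice you'd compile the model for each candidate config and benchmark it on the FPGA:

```python
from itertools import product

def benchmark(array_size: int, frac_bits: int) -> float:
    # Stand-in score: in reality, compile the model for this accelerator
    # config, run it on hardware, and combine measured accuracy with
    # measured throughput.
    accuracy_proxy = 1.0 - 2.0 ** -frac_bits   # more fractional bits -> less rounding error
    throughput_proxy = array_size ** 2         # bigger systolic array -> more MACs per cycle
    return accuracy_proxy * throughput_proxy

configs = list(product([4, 8, 16],    # candidate systolic array sizes
                       [8, 10, 12]))  # candidate fixed-point fractional bits

best = max(configs, key=lambda cfg: benchmark(*cfg))
print("best config (array_size, frac_bits):", best)
```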
We're actually working on a tool to manage and automate this hardware architecture search - watch this space!