The typical perceived use case for FHE is cloud providers offering highly sensitive compute on a multi-tenant platform. In a setting like that, full hardware acceleration is key.
Not really. Even in those cases, you need to understand that anything in memory (i.e. everything past, and including, the SoC's I/O pins) would never see the data unencrypted. Any decryption is handled within working cache on the SoC, in a secure context (typically in a secure OS VM). I don't see why a bunch of compute code won't work there. In fact, you could go a step further and encrypt the compute instructions too. Then, assuming you know what you are doing and don't crash, the compute node will have no idea what ops you are actually doing on what data.
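
For a rough feel of the "compute node never sees the data" idea in pure software, here is a minimal sketch using a toy Paillier scheme. Big caveats: Paillier is only additively homomorphic, not FHE; the hardcoded primes are tiny and completely insecure; and the function names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`) are made up for illustration, not taken from any library. It is also not the on-SoC decryption model described above, just the software-only angle on the same goal.

```python
import math
import secrets

# Toy Paillier cryptosystem: additively homomorphic only, NOT FHE.
# The default primes are tiny illustrative assumptions; never use in practice.

def keygen(p=1789, q=1861):
    """Return (public n, private (lam, mu)) using generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+)
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt an integer m with 0 <= m < n under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c using the private key."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """What the untrusted node runs: adds plaintexts by multiplying ciphertexts."""
    return (c1 * c2) % (n * n)

if __name__ == "__main__":
    n, priv = keygen()
    # Client encrypts; the compute node only ever handles c1, c2 and the sum.
    c1, c2 = encrypt(n, 20), encrypt(n, 22)
    c_sum = add_encrypted(n, c1, c2)
    print(decrypt(n, priv, c_sum))
```

The node running `add_encrypted` only needs the public modulus `n`, so it never holds anything that would let it read the operands or the result. Hiding which operations are being performed, as suggested above, is a separate problem this sketch does not address.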