Maybe go all in on local, on-device, low-latency LLMs? They make their own hardware, so with enough focus and investment they could probably do it, and do it well.
8GB of RAM isn't great for this. The M chips are only known for AI because unified memory lets you get huge amounts of "VRAM", but if you're not putting absurd amounts of RAM on the phone (and they're not going to), that advantage largely goes away. It's not like the NPU is some world-beating AI accelerator.
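As a rough back-of-envelope sketch (hypothetical model sizes, weights only, ignoring KV cache, activations, and OS/app overhead), here's why 8GB is tight for on-device LLMs:

```python
# Rough back-of-envelope: memory needed just to hold quantized LLM weights.
# Numbers are illustrative assumptions, not any specific shipping model.

def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB for a model of the given size."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

for params, bits in [(3, 4), (7, 4), (7, 8), (13, 4)]:
    print(f"{params}B params @ {bits}-bit: ~{weight_memory_gb(params, bits):.1f} GB")

# On an 8 GB phone, even a 7B model at 4-bit (~3.5 GB of weights) leaves
# little headroom once the OS and foreground apps take their share, whereas
# a Mac with 64+ GB of unified memory has room to spare.
```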
That seems like the opposite of what they were suggesting? Unless by “edge compute” you meant users’ devices, but I assume you intended the usual meaning of CDN edges.