Despite sounding broadly useful, this documentation seems to go out of its way to avoid listing what hardware you can actually target with this framework, aside from mentioning some of the other middleware it can interface with. I'd much rather see documentation that prioritizes the user's needs over Intel's corporate strategy.
Considering how poorly it seems to support CUDA as a backend [0], I wouldn't hold my breath for non-Intel vendor support (AMD CPU or GPU). If you want to try anyway, the benchmark suite is here [1], though I couldn't find any sample results.
As for less common GPUs, there really is no good support in any library. If you ever want to go down a fun rabbit hole, try to use the GPU in a Raspberry Pi for something. You'll eventually find one guy who reverse-engineered the drivers to make a compiler, but that's it. That's also the story for most uncommon chips with GPUs (Amlogic, Broadcom, etc.), where the vendor gives you a driver blob for Linux and no documentation. On the other hand, Rockchip has some good things available [2].
Last month I hit a problem because I don't know enough Python to be sure when something will be broadcast and parallelized, so I had to try a few variants until one was faster. Do you have an example of an easy case where this is much faster?
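For reference, here's the kind of pattern I mean — a minimal NumPy sketch (assuming NumPy is the relevant library here, since the thread doesn't name one) of a case where a broadcast expression usually beats an explicit Python loop:

```python
import numpy as np

a = np.arange(1000, dtype=np.float64)
b = np.arange(500, dtype=np.float64)

# Loop version: one Python-level iteration per output row.
def pairwise_loop(a, b):
    out = np.empty((a.size, b.size))
    for i in range(a.size):
        out[i] = a[i] - b
    return out

# Broadcast version: a[:, None] has shape (1000, 1), b has shape (500,);
# NumPy virtually stretches both to (1000, 500) without copying, and the
# subtraction runs in compiled C code instead of the Python interpreter.
def pairwise_broadcast(a, b):
    return a[:, None] - b

# Both produce the same (1000, 500) matrix of pairwise differences.
assert np.array_equal(pairwise_loop(a, b), pairwise_broadcast(a, b))
```

The broadcast version is typically much faster at this size because the per-row Python loop overhead is gone; whether it also parallelizes across cores depends on the backend, not on the expression itself.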