My takeaway from this is that the GMKtec G2 is a real bargain for the price. Plus, now that Gemma 2 2.6B is a thing, the dolphin-phi size class becomes slightly useful beyond just an academic exercise.
The Pi 5 would likely do pretty well on that list; I think I was getting around 3 tok/s for Llama 3 8B, and likely around 5 for dolphin-phi. The main issue with all of these is of course prompt ingestion, since OpenBLAS is no cuBLAS and you sit around for 10 billion years before it starts generating anything.
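If you want to put numbers on that prompt-ingestion vs. generation split, here's a rough sketch using the llama-cpp-python bindings (the model path is just a placeholder, and the split is approximate since the second call re-ingests the prompt):

    import time
    from llama_cpp import Llama

    # placeholder path; point this at whatever GGUF you have locally
    llm = Llama(model_path="models/dolphin-phi.Q4_K_M.gguf", n_ctx=2048, verbose=False)

    prompt = "Summarize the history of the Raspberry Pi. " * 20  # longish prompt

    # time to first token ~= prompt ingestion time
    t0 = time.time()
    llm(prompt, max_tokens=1)
    t_ingest = time.time() - t0

    # no prompt cache is set, so this call re-ingests the prompt from scratch;
    # subtracting t_ingest leaves roughly the pure generation time
    t0 = time.time()
    llm(prompt, max_tokens=128)
    t_total = time.time() - t0

    print(f"prompt ingestion: ~{t_ingest:.1f}s")
    print(f"generation: ~{128 / (t_total - t_ingest):.1f} tok/s")

(llama.cpp's own llama-bench does this properly, reporting prompt processing and token generation as separate rows.)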
I'm also curious though: what's the actual physical difference between the X4 and the G2 that accounts for almost double the inference speed? They're both an N100, both use LPDDR5, both are bottlenecked by a single 32-bit memory channel, and both presumably use a single memory chip too. The G2 does come with 12 GB, unlike the X4, whose 12/16 GB versions aren't out yet; maybe those will do better somehow.
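For context, token generation on these boxes is basically memory-bandwidth bound, so the ceiling is roughly bandwidth divided by bytes touched per token. A back-of-envelope sketch, where the LPDDR5-4800 speed and quant sizes are my assumptions rather than measured numbers:

    # back-of-envelope: tok/s ceiling ~= memory bandwidth / bytes read per token
    # assumption: LPDDR5-4800 on a single 32-bit channel (4 bytes/transfer)
    bandwidth_gbs = 4800e6 * 4 / 1e9  # ~19.2 GB/s theoretical

    # assumed Q4_K_M file sizes; every weight is read once per token
    models = [("dolphin-phi Q4 (~1.6 GB)", 1.6),
              ("Llama 3 8B Q4 (~4.7 GB)", 4.7)]

    for name, gb in models:
        print(f"{name}: ceiling ~{bandwidth_gbs / gb:.1f} tok/s")

If both machines really are pinned to the same ~19 GB/s, a 2x gap shouldn't be possible from memory alone, which makes me suspect something else: actual memory clocks, power limits, or thermals.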
As for the N97, hopefully it's not affected by the current oxidation and overvoltage fiasco; the N100s are at least old enough not to be.