The soft prediction metric seems especially ridiculous to me. If I'm not mistaken, just picking at random gets better results than their ML selection at >= 5 predictions (1 - (2/3)^5 ≈ 0.868 > 0.8438).
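To sanity-check that arithmetic, here is a quick sketch of the random baseline. It assumes (my reading, not something the paper states) that each random guess is independently correct with probability 1/3 and that a "soft" hit means at least one of the k guesses is right. The function name and the `REPORTED` constant are just labels I made up for illustration; 0.8438 is the figure quoted above.

```python
from fractions import Fraction

def random_soft_hit_rate(k, p_single=Fraction(1, 3)):
    """Chance that at least one of k independent random guesses is correct.

    Assumption: every guess hits with probability p_single, independently.
    """
    return 1 - (1 - p_single) ** k

REPORTED = 0.8438  # the ML selection figure quoted above

for k in range(1, 7):
    rate = float(random_soft_hit_rate(k))
    print(k, round(rate, 4), "beats reported" if rate > REPORTED else "below reported")
```

Under those assumptions the crossover is at exactly 5 guesses: 4 guesses give about 0.80, 5 give about 0.87.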

However:

> your opponent's team is only partially known (you see their Pokemon species but not the moves, stat distributions, etc)

That's not true in the main competitive live format (e.g. NAIC 2025, which is the main case study here). These tournaments are "open team sheet", i.e. moves, abilities and held items are known (but not IVs/EVs).

I'm not sure whether this is the case on Smogon though, which means they might even be mixing two completely different datasets...