That's good to hear! Blackmar-Diemer is favoured by this ranking system because, although it's a longish line, the opponent's moves are all fairly probable, with the least probable being Qxd4 at still around 20%.
The longer lines are technically less probable, like you say - I was trying to capture the notion that a trap is more impressive if, on average, its moves are likely. How would you define a 'sub-optimal' move in your proposal? It's a good idea - would be interesting to see it in action!
I can't think of a good way to do this purely based on move stats: I think you'd want to involve the engine. There's a Stockfish package for Python you could use, and it wouldn't be expensive or difficult to query the few evaluations you'd need. I would say a suboptimal move is one after which the evaluation changes by more than X, maybe 50 or 80 centipawns.
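To make the threshold idea concrete, here is a rough sketch of the check itself. It assumes the two evaluations have already been fetched from the engine and are given in centipawns from the point of view of the player who just moved, so a "change" is counted as a drop for that player. The function name and the default of 50 are only illustrative, not anything that exists in the project.

```python
def is_suboptimal(eval_before_cp: int, eval_after_cp: int, threshold_cp: int = 50) -> bool:
    """Return True if the move lost more than threshold_cp centipawns.

    Assumption: both evaluations are from the mover's point of view,
    so a positive difference means the mover's position got worse.
    """
    drop = eval_before_cp - eval_after_cp
    return drop > threshold_cp


# Hypothetical numbers: the mover was +30 before the move and is -60 after it.
print(is_suboptimal(30, -60))                    # 90 cp drop, above the 50 cp default
print(is_suboptimal(30, -60, threshold_cp=100))  # same drop, below a stricter 100 cp cutoff
```

The engine query is deliberately left out, since it depends on which package and search depth you pick; only the comparison against X is shown.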
I tried out the project and sent you a couple of minor pull requests. I wanted to score my pet trap:
1. e4 c6 2. Nf3 d5 3. d3 dxe4 4. Ng5 exd3 5. Bxd3
after which the most popular moves Nf6 (trap score 34%) and h6 (trap score 29%) both lose immediately to Nxf7 (if Kxf7, Bg6+ and Qxd8 wins the queen). I've seen plenty of titled players fall for this. I think the correct way to score this trap is the sum of those trap scores, because the "trap" is set after Bxd3, but there are multiple ways to fall into it. That would give it by far the highest score at 63%, but maybe changing the methodology this way would also increase the score of your other traps.
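As a sketch of the summing idea: if each losing reply already has its own trap score, the combined score for the position is just their total. This assumes the per-reply scores can be added because they are different moves from the same position; the function name and the dictionary layout are hypothetical, and only the two scores quoted above are used.

```python
from typing import Mapping


def combined_trap_score(reply_scores: Mapping[str, float]) -> float:
    """Sum the trap scores of every reply that falls into the same trap."""
    return sum(reply_scores.values())


# Scores quoted above for the position after 5. Bxd3.
losing_replies = {"Nf6": 0.34, "h6": 0.29}
print(f"{combined_trap_score(losing_replies):.0%}")  # prints 63%
```

Applying the same function to the other traps would show whether any of them also have more than one losing reply and so gain from the change.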