That actually sounds realistic. A well-defined test suite is probably a good target for AI. The main issue is that a pass/fail suite gives no partial wins, which training needs, so you'd have to break all functionality down into absolutely minimal units.
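To make the "partial wins" point concrete, here is a minimal sketch, assuming each test case counts as one minimal unit and the training signal is simply the fraction of cases passed. The names `partial_score`, `buggy_abs`, and `ABS_CASES` are made up for illustration and don't come from any real framework.

```python
def partial_score(candidate, cases):
    """Fraction of test cases passed; a crash counts as a failure."""
    passed = 0
    for args, expected in cases:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass  # a candidate that raises simply earns no credit for this case
    return passed / len(cases)


def buggy_abs(x):
    # Hypothetical half-working candidate: wrong for negative inputs.
    return x


# Hypothetical cases as (args, expected) pairs, one minimal unit each.
ABS_CASES = [((3,), 3), ((0,), 0), ((-2,), 2), ((-7,), 7)]

score = partial_score(buggy_abs, ABS_CASES)
```

An all-or-nothing suite would score `buggy_abs` the same as a candidate that does nothing at all. With per-case credit, the half-right attempt scores higher than a useless one, which is the gradient training would need.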
Just rerun it with a lower target code complexity. You'll know you set it too low when it starts monkey-patching the test framework to avoid actually doing anything.
This can be avoided by optimizing for the simplest code that works. Checking many cases in multiple lines is always going to be longer and more complex than just a single operation on a single line.
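A rough sketch of "simplest code that works", assuming AST node count is an acceptable stand-in for complexity (that metric is my assumption, not something a real tool is known to use). `complexity`, `passes_all`, `simplest_passing`, and both candidates are hypothetical.

```python
import ast


def complexity(source: str) -> int:
    """Crude complexity proxy: number of AST nodes in the source."""
    return sum(1 for _ in ast.walk(ast.parse(source)))


def passes_all(source: str, cases) -> bool:
    """Define the candidate's `f` and check it against every (input, expected) pair."""
    namespace = {}
    exec(source, namespace)  # fine for a toy; never do this with untrusted code
    f = namespace["f"]
    return all(f(x) == expected for x, expected in cases)


def simplest_passing(candidates, cases):
    """Among candidates that pass every case, return the least complex one."""
    passing = [c for c in candidates if passes_all(c, cases)]
    return min(passing, key=complexity) if passing else None


CASES = [(0, 0), (1, 2), (2, 4), (5, 10)]

# Candidate that memorizes the test inputs one by one.
SPECIAL_CASED = """
def f(x):
    if x == 0:
        return 0
    if x == 1:
        return 2
    if x == 2:
        return 4
    if x == 5:
        return 10
"""

# Candidate that does the single general operation.
GENERAL = """
def f(x):
    return 2 * x
"""

best = simplest_passing([SPECIAL_CASED, GENERAL], CASES)
```

Both candidates pass every case, but the case-by-case version has several times as many nodes, so the selection lands on `GENERAL`.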
That's one of the main ways AIs avoid overfitting in general: penalizing complexity.