I would've expected an answer involving "an exhaustive suite of test cases still passed" - "it looks right" is a low bar for any complex software project these days.
It's the long, long, long tail of edge cases - not just porting them, but even identifying them to test - that slow or doom most real-world human rewrites, after all.
Because you can read the test suite to check what it's testing, then break the implementation and check that the tests fail, then break a test and check that it fails too.
You have to review the code these things write for you, just like code from any other collaborator.
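To make the "break it and check the tests fail" check concrete, here is a minimal sketch in Python. Everything in it is hypothetical and only for illustration: `clamp`, `clamp_broken`, `CASES` and `suite_passes` are made-up names, not from any real project. The idea is just that a suite you can trust passes on the real code, fails on a deliberately broken implementation, and fails when one of its own expectations is deliberately wrong.

```python
from typing import Callable

def clamp(x: int, lo: int, hi: int) -> int:
    """Hypothetical implementation under test: restrict x to [lo, hi]."""
    return max(lo, min(x, hi))

def clamp_broken(x: int, lo: int, hi: int) -> int:
    """Deliberately broken variant: forgets the lower bound."""
    return min(x, hi)

# Each case is (arguments, expected result).
CASES = [((5, 0, 10), 5), ((-3, 0, 10), 0), ((42, 0, 10), 10)]

def suite_passes(impl: Callable[[int, int, int], int], cases=CASES) -> bool:
    """Return True only if impl gives the expected result for every case."""
    return all(impl(*args) == expected for args, expected in cases)

# Same cases, but with the last expectation deliberately wrong.
broken_cases = CASES[:-1] + [((42, 0, 10), 11)]

results = {
    "real": suite_passes(clamp),                       # should pass
    "broken_impl": suite_passes(clamp_broken),         # should fail
    "broken_test": suite_passes(clamp, broken_cases),  # should fail
}
print(results)
# → {'real': True, 'broken_impl': False, 'broken_test': False}
```

If "broken_impl" came back True, the suite isn't actually exercising the lower bound, and that's the kind of gap you want to find before trusting a green run.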