The model generates a large number of candidate solutions. These are filtered down to the ones that actually compile and produce the right output when executed, then clustered to select a few (<10 solutions) for submission. The system is not allowed to submit too many attempts.
Ah, the paper describes a fixed method for the last selection step, and also uses AI-generated tests to narrow down the candidates even further before that. That's quite a bit better, even if the participation is still only simulated.
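To make the filter, cluster, select steps described above concrete, here is a rough sketch. It is not the paper's code: `select_submissions` and its parameters are hypothetical names, candidates are modelled as plain Python callables rather than compiled programs, and picking one representative from each of the largest clusters is my assumption about how the final selection could work.

```python
from collections import defaultdict


def select_submissions(candidates, example_tests, extra_inputs, limit):
    """Filter candidates on example tests, cluster by behaviour, pick a few.

    All names here are illustrative assumptions, not taken from the paper.
    """

    def safe_run(prog, x):
        # A crashing candidate yields a marker instead of raising.
        try:
            return repr(prog(x))
        except Exception:
            return "<error>"

    # Step 1: keep only candidates that reproduce every example output.
    survivors = [
        p for p in candidates
        if all(safe_run(p, x) == repr(y) for x, y in example_tests)
    ]

    # Step 2: group survivors that behave identically on the extra inputs
    # (standing in for the generated tests mentioned above).
    clusters = defaultdict(list)
    for p in survivors:
        signature = tuple(safe_run(p, x) for x in extra_inputs)
        clusters[signature].append(p)

    # Step 3 (assumption): submit one representative per cluster,
    # largest clusters first, up to the submission limit.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [group[0] for group in ranked[:limit]]
```

The point of clustering is that many surviving candidates are behaviourally the same program, so submitting one per cluster spends the limited attempts on genuinely different solutions.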
Here's a good analysis of the paper: https://www.youtube.com/watch?v=s9UAOmyah1A