
They hot-swapped all kinds of model hyperparameters, such as changing the activation function and optimizer mid-training. It doesn't look like there was a principled reason for the repeated switches. Maybe their data scientists kept finding ways to improve the model while it was training? Not sure, but it looks extremely hacky to me. Not something a team kicked off one day and forgot about until it finished training.
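
For what it's worth, the mechanics of a mid-training optimizer swap are trivial, at least in something like PyTorch. This is purely a hypothetical sketch of the pattern, not their actual training code, and it only covers the optimizer; swapping the activation function would additionally mean replacing modules inside the model:

  import torch
  import torch.nn as nn

  # Hypothetical toy model and data, just to illustrate the pattern.
  model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
  loss_fn = nn.MSELoss()
  optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

  for step in range(10_000):
      x = torch.randn(64, 16)
      y = torch.randn(64, 1)

      optimizer.zero_grad()
      loss = loss_fn(model(x), y)
      loss.backward()
      optimizer.step()

      # "Hot swap": partway through training, replace the optimizer.
      # Adam's moment estimates are simply discarded unless you carry
      # them over yourself, which is part of why this feels so hacky.
      if step == 5_000:
          optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)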

