
It's very hard to prove convergence results for global optimization algorithms, because doing so requires assumptions about the smoothness of the fitness landscape, the types of minima, and so on. In practice, this means that when someone wants to publish a new optimization algorithm, they cherry-pick a subset of problems and show that their algorithm works better on those, with a given set of hyper-parameters.

In theory, the no-free-lunch theorem means that there is no optimal algorithm for an arbitrary fitness landscape, but in practice, almost every optimization problem of actual interest is somewhat smooth.

Anyway, here is an interesting set of benchmarks by Andrea Gavana comparing different optimization algorithms: http://infinity77.net/global_optimization/ampgo.html. In my experience, as long as the fitness landscape is relatively smooth, multi-start gradient-based methods tend to perform better, assuming you can get a reasonably accurate gradient (with automatic differentiation, that's not too difficult). A rough sketch of the idea is below.
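
To make the multi-start idea concrete, here is a minimal sketch, assuming JAX for the AD gradient and SciPy's L-BFGS-B as the local optimizer. The Rastrigin test function, the number of restarts, and the bounds are arbitrary illustrative choices, not taken from the benchmarks above.

    # Multi-start local optimization with an AD gradient (sketch).
    import numpy as np
    import jax
    import jax.numpy as jnp
    from scipy.optimize import minimize

    def rastrigin(x):
        # Smooth but highly multimodal test function; global minimum at x = 0.
        return 10.0 * x.shape[0] + jnp.sum(x**2 - 10.0 * jnp.cos(2.0 * jnp.pi * x))

    # Exact gradient via automatic differentiation.
    grad_rastrigin = jax.grad(rastrigin)

    def multistart(n_starts=20, dim=5, bounds=(-5.12, 5.12), seed=0):
        rng = np.random.default_rng(seed)
        best = None
        for _ in range(n_starts):
            # Random restart inside the box, then a local gradient-based run.
            x0 = rng.uniform(*bounds, size=dim)
            res = minimize(
                lambda x: float(rastrigin(jnp.asarray(x))),
                x0,
                jac=lambda x: np.asarray(grad_rastrigin(jnp.asarray(x)), dtype=float),
                method="L-BFGS-B",
                bounds=[bounds] * dim,
            )
            if best is None or res.fun < best.fun:
                best = res
        return best

    best = multistart()
    print(best.fun, best.x)

Each restart is an independent local run, and keeping the best result is the simplest way to combine them; the exact gradient from AD is what lets the local solver converge quickly on the smooth parts of the landscape.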



