> You can get a lot of the way (just not to `exp((E_old - E_new) / Temp)`) by explaining the origin of the algorithm
The exp is from the so-called Boltzmann factor [1] which (in somewhat simplified terms) describes the likelihood that a physical system in equilibrium at temperature T will transition from a state with energy E_old to one with energy E_new. So sure, you can derive the whole algorithm from classic "annealing" and physics alone.
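Spelling out the connection to the quoted expression: the Boltzmann factor gives each state an equilibrium probability proportional to exp(-E / (k_B T)), so the acceptance ratio is just the ratio of two such factors, with k_B absorbed into `Temp`. A quick sketch:

```latex
P(E) \propto e^{-E/(k_B T)}
\quad\Rightarrow\quad
\frac{P(E_\mathrm{new})}{P(E_\mathrm{old})}
  = e^{-(E_\mathrm{new} - E_\mathrm{old})/(k_B T)}
  = e^{(E_\mathrm{old} - E_\mathrm{new})/(k_B T)}
```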
I mean, I know where it comes from; I just don't think there's a nice hand-wavy way to get people to understand what's going on there. You really need to start with how uncertainties multiply across a system, define entropy as log(W) in the NVE ensemble, discover temperature hiding there too, then switch to the NVT ensemble (i.e. connect the system to an infinite reservoir of energy at a certain temperature); only then can you get at Boltzmann factors.
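As a very compressed sketch of that chain (standard textbook statistical mechanics, nothing specific to this thread): the system of interest trades energy with a huge reservoir, its probability of sitting at energy E_s is proportional to how many microstates that leaves the reservoir, and expanding the reservoir's entropy to first order is what produces the exponential:

```latex
S = k_B \ln W, \qquad \frac{1}{T} = \frac{\partial S}{\partial E}

P(E_s) \propto W_\mathrm{res}(E_\mathrm{tot} - E_s)
       = e^{S_\mathrm{res}(E_\mathrm{tot} - E_s)/k_B}
       \approx e^{S_\mathrm{res}(E_\mathrm{tot})/k_B}\, e^{-E_s/(k_B T)}
       \propto e^{-E_s/(k_B T)}
```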
Even then, as was pointed out to me recently, it's not clear that the "always accept a lower energy, accept a higher energy with probability proportional to the Boltzmann factor" rule yields a steady-state Boltzmann distribution, especially because that would imply that the result is independent of the number of configurations with a given energy (the "density of states"), which seems surprising. If that doesn't hold, then it's either a convenient approximation or a bit of magical thinking...
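For concreteness, here's a minimal sketch of the acceptance rule being discussed, in plain Python; the `energy`, `propose`, and cooling-schedule arguments are placeholders I'm assuming for illustration, not part of any particular implementation:

```python
import math
import random

def metropolis_accept(e_old, e_new, temp):
    """Always accept a move that lowers the energy; accept an uphill move
    with probability exp((e_old - e_new) / temp), the Boltzmann factor."""
    if e_new <= e_old:
        return True
    return random.random() < math.exp((e_old - e_new) / temp)

def anneal(state, energy, propose, temps):
    """Simulated annealing sketch: apply the Metropolis rule while the
    temperature schedule `temps` decreases. `energy` and `propose` are
    caller-supplied callables assumed here for illustration."""
    e = energy(state)
    for temp in temps:
        candidate = propose(state)
        e_cand = energy(candidate)
        if metropolis_accept(e, e_cand, temp):
            state, e = candidate, e_cand
    return state, e
```

At a fixed temperature this is just the standard Metropolis step; annealing layers a decreasing temperature schedule on top of it.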
1. http://en.wikipedia.org/wiki/Boltzmann_distribution