The other important bit here is garbage collection.
Local and anonymous functions that capture lexical environments really, really work much better in languages built around GCs.
Without garbage collection, even a trivial closure (as in JavaScript or the Lisps) suddenly has to make a lot of decisions about how to reference data that may live either on the stack or on the heap.
Yes, C++ is a great example of having to make decisions that don't have good solutions without a GC or something like it. See the mentions of undefined behaviour in the relevant sections of the standard, e.g. when a lambda captures something with a limited lifetime.
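To make the C++ point concrete, here is a rough sketch of the choices you end up making by hand. The function names (`make_counter_by_value`, `make_counter_shared`) are made up for illustration, and this is only one way to slice it, not the canonical one.

```cpp
#include <functional>
#include <memory>

// Option 1: capture by value. The closure owns a private copy of `count`,
// so nothing dangles, but the state is not shared with anyone else.
std::function<int()> make_counter_by_value() {
    int count = 0;
    return [count]() mutable { return ++count; };
}

// Option 2: capture a shared_ptr. The state lives on the heap and is
// reference counted, a manual stand-in for what a GC does for you.
std::function<int()> make_counter_shared() {
    auto count = std::make_shared<int>(0);
    return [count]() { return ++*count; };
}

// Option 3 (left commented out on purpose): capture by reference.
// This compiles, but `count` is destroyed when the function returns,
// so calling the returned closure is undefined behaviour.
//
// std::function<int()> make_counter_dangling() {
//     int count = 0;
//     return [&count]() { return ++count; };
// }
```

In a GC'd language you would just write the third version and the runtime would keep `count` alive for as long as the closure can reach it. In C++ you have to pick between copying, reference counting, or promising yourself that the closure never outlives the stack frame.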
Are you saying that Haskell doesn't have lexical environments? It very much does, just as all major languages of the ML language family do.