That's the great thing about a mix of C++ and Python: you can scale up, scale down, port as you wish, program quickly or in fine detail, and so on. Such questions are not even relevant, because it's clear that the languages will not limit you.
Lua actually fits his criteria quite well, IMHO. It's fast, written in fully compliant ANSI C for portability, and explicitly designed to be embedded in C / C++. It's also been used in industry (for example at Petrobras, a big Brazilian oil company) for over a decade, as well as in numerous games (http://www.lua.org/history.html). While the language isn't hard real-time, you can control (and just pause, or completely replace) the garbage collector.
As for the OCaml example, you could either change which C type Lua uses for its primary numeric type (it uses doubles by default) or use C userdata (a garbage-collector-managed pointer to a C struct), since the collection of numbers probably has some algorithmically relevant characteristics beyond just being a huge set of numbers.
Actually, if I were concerned about scaling to a 20-core CPU[0], I probably wouldn't use anything mainstream. It depends on the exact problem, of course, but that sort of thing is generally easier in functional languages.
Many of the author's questions can be reduced to "does it have a good C FFI?", but I'm not sure what the overall point is, or if these are just generalized suggestions to think about before starting a project. I'd like to know what the author would use in this situation.
[0] AMD has announced a 12-core Opteron for next year, so this isn't far-fetched at all.
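To make the "does it have a good C FFI?" question concrete, here is a minimal sketch of calling a C function from Python through the standard `ctypes` module. It assumes a POSIX system where the C standard library can be located at run time; the wrapper name `c_strlen` is made up for illustration.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. find_library("c") returns something like
# "libc.so.6"; if it returns None, CDLL(None) falls back to the symbols
# already loaded into the current process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Hypothetical wrapper: length of a byte string as computed by C's strlen."""
    return libc.strlen(data)
```

The whole binding is three lines of declarations, which is roughly what "a good C FFI" buys you: the performance-critical part can live in C while the rest stays in the higher-level language.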
In my opinion, that would be a pretty poor decision. Most functional languages have not actually been tested on 20 CPUs. You have no idea how optimized the code will be there. You have no idea how it will scale.
You can't have strange hardware and then decide to go with experimental software on top of it. Stateless programming can be done in any imperative language, and there are plenty of very mainstream languages that let you scale to 20 CPUs.
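As a rough illustration of stateless programming in a mainstream imperative language, here is a sketch in Python. The function names, the chunking scheme, and the default of 20 workers are assumptions made for the example. It uses threads to stay simple; on CPython you would likely need to pass `ProcessPoolExecutor` as `executor_cls` (and run it from an importable module) to actually occupy multiple cores.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    # Pure function: reads only its argument and touches no shared state,
    # so any number of workers can run it at once without locks.
    return sum(x * x for x in chunk)


def split(data, n_chunks):
    # Cut data into at most n_chunks slices of roughly equal size.
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum_of_squares(data, workers=20, executor_cls=ThreadPoolExecutor):
    # Each worker gets its own chunk; results are combined only at the end.
    chunks = split(data, workers)
    with executor_cls(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))
```

Because the workers share nothing, changing the number of workers or the executor type does not change the result, only how the work is distributed.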
Parallel processing is not much easier in functional languages. But optimizing to place specific tasks on specific processors is much easier in low-level languages. I'd say your design decision is wrong.
Not entirely. While I haven't done much with the JVM-based languages, I've spent a lot of time dabbling in obscure programming languages (it's a fun and often very educational way to procrastinate), and many language problems are fundamentally social: spotty-to-nonexistent documentation, tutorials or references that are three versions out of date, important libraries on websites that look like ghost towns ("last updated jan 17, 2004"), etc. can kill a language, or at the very least encourage the idea that it's abandoned. A language without a community of people to answer questions, etc. is going to be very hard to use.