Python, at least, uses actual OS-level threads. However, it also uses a global interpreter lock (GIL), so only one thread can execute Pyton code at a time.
But when writing Python modules in C, you have control over acquiring and releasing of the GIL, so before starting some long running operation, you give up the lock.
Node, AFAIK, uses several OS-level threads under the hood for disk I/O. And with PHP, a web server probably will run multiple threads for handling requests concurrently.
So the impact might not be as big as for performance-oriented code in C/C++, but it is not necessarily nil, either.
With Python, numpy for instance will use multiple threads under the hood, even though the calling Python code might be single-threaded, and numpy's execution is completely unaffected by the GIL. Incidentally, to get Python code to run really fast, you'll have to offload most of the heavy lifting into libraries in any case. But it's still a Python program - in that most of the source lines, especially most of the source lines unique to the program as opposed to library code, will be still in Python, and in that in some cases you couldn't have written the program in a "more performant" language getting either the performance or the flexibility (Go for instance doesn't have operator overloading nor can it be used to implement linear algebra as efficiently as C/assembly; so a pure Go program doing linear algebra will be both slower and uglier than a Python program using numpy. A Go program using a C/assembly library will be very marginally faster than said numpy program, and just as ugly as the pure Go program.)
Also, in my understanding TFA applies to multiple processes just as much as multiple threads.
A recent alternative is to write the entire program in Julia. It is a lot less ugly than Numpy, and so performant that most code (other than steadfast libraries such as BLAS and LAPACK) are written in Julia itself.
http://julialang.org/
But when writing Python modules in C, you have control over acquiring and releasing of the GIL, so before starting some long running operation, you give up the lock.
Node, AFAIK, uses several OS-level threads under the hood for disk I/O. And with PHP, a web server probably will run multiple threads for handling requests concurrently.
So the impact might not be as big as for performance-oriented code in C/C++, but it is not necessarily nil, either.