Because garbage collection, and tracing garbage collection in particular, adds significant overhead in both CPU cycles and memory. This overhead is also unpredictable and depends heavily on allocation and object-lifetime patterns. A simple stop-the-world GC can pause the program for a time proportional to the size of the used heap - tens of seconds for large heaps, which is quite unacceptable. Incremental and concurrent GCs mitigate these long pauses, but they increase the complexity of the runtime system and carry even more average overhead; although they may perform acceptably in the average case, they tend to have very complex failure modes. In addition, a tracing GC typically needs extra memory "headroom" to operate, so programs using GC tend to use much more memory than they really need.
There is also a common misconception that a compacting GC makes heap allocation faster than malloc. The allocation itself is indeed simple and fast - just a pointer bump - but a problem occurs immediately afterwards: the newly allocated memory hasn't been touched since the last GC and is very likely not in cache, so the first access incurs a cache miss (and managed runtimes do access it immediately, because they zero-initialize memory on allocation for safety). Because of that, even allocating plenty of short-lived objects, which is the best case for GC, is not actually faster than a malloc+free pair.
There are also other overheads:
* Managed runtimes typically use the heap for most allocations and make stack allocation harder or impossible in some cases - e.g. it is much harder to write Java code with no heap allocations than the equivalent C.
* To facilitate GC, each object needs an additional word or two of memory - e.g. for mark flags or reference counts. This worsens cache locality and increases memory consumption.
* Heap scanning consumes a lot of memory bandwidth. Even if the GC scans concurrently and doesn't pause the app, this has a significant impact on performance.
* A tracing GC prevents rarely used parts of the heap from being swapped out, because scanning periodically touches them.