That's pretty close to how the first NVIDIA GPGPU (the G80) worked. There was no cache, only different kinds of memory, one of which (shared memory) was as fast as registers. Optimal access to device RAM was only possible with a limited set of access patterns, so "caching" had to be managed carefully by hand.
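
Roughly what that manual "caching" looked like in CUDA (untested sketch from memory; the kernel name, tile size and the 1D stencil are just for illustration): each block stages a tile of device memory into __shared__ storage with coalesced loads, and all reuse then hits the fast on-chip memory.

    #define TILE   256   // threads per block == tile width (assumption)
    #define RADIUS 3

    // Assumes n is a multiple of TILE so every thread has valid work.
    __global__ void stencil_1d(const float* in, float* out, int n)
    {
        // Shared memory plays the role of the software-managed "cache".
        __shared__ float tile[TILE + 2 * RADIUS];

        int gidx = blockIdx.x * blockDim.x + threadIdx.x;
        int lidx = threadIdx.x + RADIUS;

        // Coalesced load of the block's tile into shared memory.
        tile[lidx] = in[gidx];

        // A few threads also fetch the halo on either side of the tile.
        if (threadIdx.x < RADIUS) {
            tile[lidx - RADIUS] = (gidx >= RADIUS)  ? in[gidx - RADIUS] : 0.0f;
            tile[lidx + TILE]   = (gidx + TILE < n) ? in[gidx + TILE]   : 0.0f;
        }
        __syncthreads();   // make the staged tile visible to the whole block

        // All reuse now comes from on-chip memory instead of device RAM.
        float acc = 0.0f;
        for (int off = -RADIUS; off <= RADIUS; ++off)
            acc += tile[lidx + off];
        out[gidx] = acc;
    }

    // launched as e.g.:  stencil_1d<<<n / TILE, TILE>>>(d_in, d_out, n);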

In later GPUs they switched (mainly due to programmer laziness :-) ) to a more CPU-like, automatically managed general-purpose cache. The reason was that in most cases, being able to quickly write code that is reasonably fast matters more to developers than being able to squeeze out every cycle (though I believe it's still possible to bypass the cache and manage memory manually...)
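
For example, something like this (untested; __ldcg() is the per-load cache hint in newer CUDA toolkits that keeps data in L2 only, and I believe compiling with -Xptxas -dlcm=cg applies the same policy to all global loads). The kernel and the streaming-copy use case are just my own illustration:

    __global__ void streaming_copy(const float* __restrict__ in, float* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            // Data that is read once and never reused: keep it out of L1 so it
            // doesn't evict values the rest of the kernel actually wants cached.
            out[i] = __ldcg(in + i);
        }
    }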

The problem with an automatic cache is the same as with all "intelligent" CPU features: the CPU tries to predict what the program is doing, so the programmer has to predict what the CPU will predict in order to optimize for that CPU. In the next generation, the CPU vendor will in turn optimize for the programs currently around. In the end, optimizing becomes much more complex.

There's certainly an advantage to keeping the logic "dumb" and simply having the software manage everything.



