The problem with the read-everything-into-memory approach is that this is not how the JVM ecosystem works. Things written for the JVM use the classpath primarily, and things in memory secondarily if at all.
We can't control the fact that the CLJS compiler, for example, looks for source files on the classpath instead of in some FileSet object proxy. If we admit the use of tools written by the Java community at large, we suffer by adding another leaky half-abstraction to the mix.
We actually did some experiments with FUSE filesystems, but the performance is just not there yet. When FUSE performance becomes comparable to Java NIO it may become a viable option, and would solve all of these problems. You could then have a "membrane" approach, where the JVM is only manipulating a filesystem proxy, and you have complete control over when and how to reify that and write to the filesystem.
Reading everything into memory wasn't supposed to be a complete solution. Not everything can be that simple. However, it seems to me to be better to start from a simple base and add complexity as necessary, than to start from a complex base and try to achieve simplicity.
But let's run with the idea of loading everything into some immutable in-memory data structure, just to see where it goes. So long as we write everything in Clojure we're fine, but the moment we start hitting things adapted for the JVM, such as the CLJS compiler, we run into problems as you point out.
However, it's not too hard to conceive of possible solutions. Let's start with a simple but naive way around it. We take the files in memory, write them to a temporary directory, and then run the CLJS compiler with a classpath pointing to that directory. When the compiler is done, we take the result and load it back into memory.
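The round trip above can be sketched in a few lines of Clojure. This is only an illustration under an assumed shape: the in-memory "fileset" here is just a hypothetical map of relative path to content string, and `reify-fileset!`/`slurp-dir` are made-up names, not anyone's real API.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical in-memory fileset: relative path -> content string.
(def fileset {"src/app/core.cljs" "(ns app.core)"})

;; Reify the fileset into a temp directory so classpath-based tools
;; (like the CLJS compiler) can see ordinary files.
(defn reify-fileset! [fileset]
  (let [tmp (.toFile (java.nio.file.Files/createTempDirectory
                      "fileset"
                      (make-array java.nio.file.attribute.FileAttribute 0)))]
    (doseq [[path content] fileset]
      (let [f (io/file tmp path)]
        (io/make-parents f)
        (spit f content)))
    tmp))

;; After the tool has run against that directory, load the results
;; back into an immutable map.
(defn slurp-dir [dir]
  (into {}
        (for [f (file-seq dir) :when (.isFile f)]
          [(str (.relativize (.toPath dir) (.toPath f))) (slurp f)])))
```

The compiler itself would simply be handed the temp directory's path on its classpath; it never needs to know the files started life in memory.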
Again, this is a solution that aims for simplicity rather than performance, but optimisations immediately suggest themselves. If the files exist on disk, we symlink them or point the classpath directly at them. If we don't need the CLJS output file's content, we can defer loading it into memory.
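The deferred-loading optimisation falls out naturally from Clojure's `delay`: wrap each file's content in a delay, and an output file's bytes only enter memory if something actually dereferences them. A sketch (the `deferred-fileset` name is invented for illustration):

```clojure
(require '[clojure.java.io :as io])

;; Build a map of relative path -> delayed content; nothing is read
;; from disk until a consumer derefs an entry.
(defn deferred-fileset [dir]
  (let [root (io/file dir)]
    (into {}
          (for [f (file-seq root) :when (.isFile f)]
            [(str (.relativize (.toPath root) (.toPath f)))
             (delay (slurp f))]))))
```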
Haha, yes! Now we're cookin'! The "simple, but naive way" you describe above is pretty much the way boot does things. I'd say you could look at the boot cljs task to see this, but setting up the environment for the CLJS compiler is pretty tricky, so the code there isn't as clear and elegant as I'd like.
In boot, tasks don't fish around in the filesystem to find things, neither for input nor for output. Tasks obtain the list of files they can access via functions: boot.core/src-files, boot.core/tgt-files, et al. These functions return immutable sets of java.io.File objects. However, these Files are usually temp files managed by boot.
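The contract described here can be simulated with plain Clojure: a task is handed an immutable set of `java.io.File` values (as `boot.core/src-files` would return) and filters what it was given rather than scanning directories itself. The set contents below are made up for illustration:

```clojure
;; What a function like boot.core/src-files hands to a task: an
;; immutable set of java.io.File objects (example paths, not real).
(def src-files #{(java.io.File. "src/app/core.clj")
                 (java.io.File. "src/app/util.cljs")})

;; A consumer narrows the set it was given; it never globs the disk.
(defn clj-sources [files]
  (set (filter #(.endsWith (.getName %) ".clj") files)))
```

Because the set is immutable, tasks can map and filter over it freely without coordinating access to shared mutable state.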
Boot does things like symlinking (actually we use hard links to get structural sharing, and since the files are owned by boot there are no cross-filesystem issues to worry about), and we shuffle classpath directories and pods around.
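The hard-link trick is just Java NIO's `Files/createLink`: the staged path and the managed temp file end up pointing at the same inode, so no bytes are copied and the two views can't drift apart. A minimal sketch, not boot's actual link bookkeeping:

```clojure
(require '[clojure.java.io :as io])
(import '[java.nio.file Files])

;; Hard-link a boot-managed temp file into a staging location.
;; Both paths must be on the same filesystem, which holds when boot
;; owns all the files involved.
(defn hard-link! [src dest]
  (io/make-parents dest)
  (Files/createLink (.toPath (io/file dest)) (.toPath (io/file src))))
```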
So stay tuned for the write-up of the filesystem stuff, I think it might be right up your alley!
It sounds like there's a lot in Boot I'd like, particularly in how it deals with the filesystem. I'm still not convinced about the design, but it's clear I don't know enough about it to make a decision on it.
If nothing else, I'm sure there will be parts in it I'll want to steal ;)