Green threads by definition cannot use the OS stack and must allocate their stack memory on the heap. This memory can be reused, but experience with Go shows that, at least for Go code, it is better for performance to allocate the stack as a single contiguous block and copy it to a bigger block when the thread outgrows its current stack. But then the whole stack space is pinned to the thread and cannot be reused.
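To make the copy-on-grow idea concrete, here is a toy model in Java. It is only a sketch of the technique described above, not how the Go runtime or any JVM is actually written: the class `GrowableStack`, its fields, and the doubling policy are all made up for illustration, and a real runtime may also shrink stacks, which this toy never does.

```java
import java.util.Arrays;

/** Toy model of a contiguous, copy-on-grow stack (hypothetical, for illustration only). */
public class GrowableStack {
    private long[] slots;   // the single contiguous block backing the stack
    private int top = 0;    // number of slots currently in use
    private int copies = 0; // how many times the block was reallocated and copied

    public GrowableStack(int initialCapacity) {
        // Guard against a zero-sized block, which doubling could never grow.
        slots = new long[Math.max(1, initialCapacity)];
    }

    public void push(long value) {
        if (top == slots.length) {
            // Out of room: allocate a block twice the size and copy everything over.
            slots = Arrays.copyOf(slots, slots.length * 2);
            copies++;
        }
        slots[top++] = value;
    }

    public long pop() {
        // Popping never shrinks the block in this toy, so the spare capacity
        // stays attached to this stack and nothing else can use it.
        return slots[--top];
    }

    public int capacity() { return slots.length; }
    public int used() { return top; }
    public int copies() { return copies; }

    public static void main(String[] args) {
        GrowableStack stack = new GrowableStack(4);
        for (int i = 0; i < 100; i++) stack.push(i);
        for (int i = 0; i < 98; i++) stack.pop();
        // The stack is nearly empty again, but the block it grew into is still held.
        System.out.println("used=" + stack.used()
                + " capacity=" + stack.capacity()
                + " copies=" + stack.copies());
    }
}
```

After the pops, only two slots are in use while the block still has 128, which is the pinned, unreusable space the comment is pointing at.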
For Java it may still be possible not to allocate the whole stack as a single chunk and instead use smaller chunks, say one per few frames. But I really doubt that this can reduce memory pressure compared with CSP in real applications, especially given how good GC has become in Java.
So in Java we know a few things about the stack that are not true for other languages. We know nothing on the stack is a pointer into a Java stack frame, and nothing on the heap points into a Java stack frame. These facts allow us to mount virtual threads onto carrier threads by copying portions of the stack to and from the heap. This is normally less memory than you’d expect because, although you might have a pretty deep stack, most of the work will happen in just a few stack frames, so the rest can be left on the heap until the stack unwinds to that point.
The big advantage of this over CSP is that you can take existing blocking code, run it on a virtual thread, and get all the advantages. There is no function colouring limiting what you can call (give or take a couple of restrictions related to calling native code).
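A minimal sketch of what that looks like, assuming JDK 21 or later, where virtual threads are a final feature. `slowLookup` and `sumOfLookups` are hypothetical names, and `Thread.sleep` merely stands in for a blocking I/O call.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BlockingOnVirtualThreads {
    // Ordinary blocking code: no async keyword and no special return type.
    static int slowLookup(int id) throws InterruptedException {
        Thread.sleep(Duration.ofMillis(50)); // stand-in for a blocking I/O call
        return id * 2;
    }

    static int sumOfLookups(int tasks) throws Exception {
        // One virtual thread per submitted task; close() waits for them to finish.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final int id = i;
                results.add(executor.submit(() -> slowLookup(id)));
            }
            int sum = 0;
            for (Future<Integer> result : results) {
                sum += result.get(); // blocks the caller until that task is done
            }
            return sum;
        }
    }

    public static void main(String[] args) throws Exception {
        // 10,000 concurrent blocking calls, each on its own virtual thread.
        System.out.println(sumOfLookups(10_000));
    }
}
```

Note that `slowLookup` is the same method you would call from a platform thread; nothing in its signature marks it as something that may block.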
I like CSP precisely because it requires you to color-annotate the code, so it is known what can and what cannot do IO! Sure, it decreases flexibility, but it makes reasoning about and maintaining the code easier.
Thread stacks are not OS-level objects. At least on Linux, you just malloc or anon-mmap some memory and pass that to clone() or your own green thread implementation.
The question is whether the unused portion of the stack can be used for anything else. With native threads the answer is no, and the same goes for Go's green threads. Time will tell if Java can pull off the trick of sharing unused stack space, but I am sceptical.
With POSIX threads the stack size defaults to something like 1MB or 2MB depending on the platform, but it's not allocated up front -- the stack grows as needed up to that maximum.
The main difference then between allocating stack chunks on the heap as needed and stacks grown by the virtual memory subsystem comes down to virtual memory management. If you can use huge pages for your heap, then allocating stack chunks on the heap will be cheaper than traditional stacks.