If you use Zig directly, it would have to work much like my C implementation: you have to compute the roots by hand.
Zig has extensive metaprogramming capabilities, so it should be possible to automate quite a bit of it. In particular, generating the "tracing/moving" function with comptime is trivial in Zig, whereas in C you have to write it out by hand.
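As a rough idea of what "writing it out by hand" could look like in C, here is a minimal sketch. Every name in it (GcRef, VisitFn, Pair, pair_trace, Forward, forward_field) is made up for illustration and is not taken from the actual implementation. The point is only that each type needs a function that hands every GC-pointer field to a visitor, and that the visitor may rewrite the field if the object moved.

```c
#include <stddef.h>

/* Hypothetical alias for "pointer to a GC-managed object". */
typedef void *GcRef;

/* A visitor receives the address of a field so it can update the field
   in place, which is what a moving collector needs. */
typedef void (*VisitFn)(GcRef *field, void *ctx);

typedef struct Pair {
    GcRef left;
    GcRef right;
    int   value;   /* not a GC pointer, so the tracer skips it */
} Pair;

/* Hand-written tracer: one visit call per GC-pointer field.
   In C this has to be kept in sync with the struct manually. */
void pair_trace(Pair *p, VisitFn visit, void *ctx) {
    visit(&p->left, ctx);
    visit(&p->right, ctx);
}

/* Example visitor that simulates a move: any field still pointing at
   the old address is redirected to the new one. */
typedef struct Forward {
    void *from;
    void *to;
} Forward;

void forward_field(GcRef *field, void *ctx) {
    Forward *fw = ctx;
    if (*field == fw->from) {
        *field = fw->to;
    }
}
```

With comptime, the body of something like pair_trace could presumably be generated by iterating over the struct's fields instead of being typed out per type.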
For a language like Zig (whose compiler you control), it's pretty easy. During typechecking, you check whether each local variable in a function is a GC pointer. During codegen, you store those variables in a function-local struct instead of as plain locals, and the function pushes that struct onto a shadow stack as the first thing it does when called.
The shadow stack is just a thread-local linked list of these per-call structs.
When the function ends, it pops its struct from the shadow stack (no allocations needed, since everything is on the stack).
When a function calls collect, the collector traverses the shadow stack and uses it as the set of roots. The approach is explained in one of the papers I link to in the article, "Accurate Garbage Collection in an Uncooperative Environment".
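The push/pop/traverse scheme described above can be sketched in C roughly as follows. This is an illustration under assumptions, not the code from the article or the paper: the names (ShadowFrame, shadow_push, shadow_pop, shadow_walk, count_live_roots, caller, callee) are hypothetical, the frames are written by hand where a compiler would emit them, and counting non-null roots stands in for a real collect.

```c
#include <assert.h>
#include <stddef.h>

/* One frame per active call. It lives in the callee's own stack frame,
   so pushing and popping never allocates. */
typedef struct ShadowFrame {
    struct ShadowFrame *prev;    /* caller's frame, or NULL at the bottom */
    size_t              nroots;  /* number of GC-pointer locals */
    void              **roots;   /* the function-local slots themselves */
} ShadowFrame;

/* Thread-local head of the linked list (C11 _Thread_local). */
static _Thread_local ShadowFrame *shadow_top = NULL;

void shadow_push(ShadowFrame *f) {
    f->prev = shadow_top;
    shadow_top = f;
}

void shadow_pop(ShadowFrame *f) {
    assert(shadow_top == f);     /* frames must be popped in LIFO order */
    shadow_top = f->prev;
}

typedef void (*RootVisitor)(void **slot, void *ctx);

/* What a collector would do first: visit every root slot on the stack. */
void shadow_walk(RootVisitor visit, void *ctx) {
    for (ShadowFrame *f = shadow_top; f != NULL; f = f->prev) {
        for (size_t i = 0; i < f->nroots; i++) {
            visit(&f->roots[i], ctx);
        }
    }
}

static void count_root(void **slot, void *ctx) {
    if (*slot != NULL) {
        ++*(size_t *)ctx;
    }
}

/* Stand-in for collect: just counts the non-null roots it can see. */
size_t count_live_roots(void) {
    size_t n = 0;
    shadow_walk(count_root, &n);
    return n;
}

/* What generated code for two nested functions might look like. */
size_t callee(void *obj) {
    void *slots[1] = { obj };
    ShadowFrame frame = { NULL, 1, slots };
    shadow_push(&frame);                 /* first thing the function does */
    size_t seen = count_live_roots();    /* sees callee and caller roots */
    shadow_pop(&frame);                  /* last thing before returning */
    return seen;
}

size_t caller(void *a, void *b) {
    void *slots[2] = { a, b };
    ShadowFrame frame = { NULL, 2, slots };
    shadow_push(&frame);
    size_t seen = callee(slots[0]);
    shadow_pop(&frame);
    return seen;
}
```

Because the visitor gets the address of each slot rather than its value, a moving collector could update the locals in place through the same walk.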
If you need to share data across threads, you'd need a separate shared heap for it, which would be a lot more complicated, or you could just use some other memory management solution for that data.
A very important point is to not allow libraries to call collect directly on the global heap: that would defeat the point of leaving it up to the application if and when the GC runs.