MoE LLMs replace each feed-forward block with several "expert" fully connected layers, and a learned router decides which experts process each token during the forward pass; the experts and the router are trained end-to-end. The same idea can also be applied with black-box LLMs like Claude Opus, GPT-4, etc.: it's a similar concept, but operating at a higher level of abstraction — instead of routing tokens to expert layers inside one model, you route whole queries to different models.
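A minimal sketch of that higher-level routing, assuming a set of specialist models behind a single dispatch function. The expert names and the `call_llm` stub are hypothetical placeholders standing in for real vendor API calls, and the keyword router is deliberately crude; in practice the router could itself be a small classifier or another LLM.

```python
from typing import Callable, Dict


def call_llm(model_name: str, prompt: str) -> str:
    """Placeholder for a real API call (e.g. via a vendor SDK)."""
    return f"[{model_name}] response to: {prompt!r}"


# Map each "expert" label to a callable that queries one black-box model.
EXPERTS: Dict[str, Callable[[str], str]] = {
    "code":    lambda p: call_llm("code-specialist-model", p),
    "math":    lambda p: call_llm("math-specialist-model", p),
    "general": lambda p: call_llm("general-purpose-model", p),
}


def route(prompt: str) -> str:
    """Crude keyword router: pick which expert model should handle the query."""
    lowered = prompt.lower()
    if any(k in lowered for k in ("def ", "function", "compile", "traceback")):
        return "code"
    if any(k in lowered for k in ("integral", "prove", "equation")):
        return "math"
    return "general"


def answer(prompt: str) -> str:
    """Dispatch the whole query to the selected 'expert' model."""
    return EXPERTS[route(prompt)](prompt)


if __name__ == "__main__":
    print(answer("Why does this function raise a TypeError?"))
    print(answer("Evaluate the integral of x^2 from 0 to 1."))
```

The key difference from true MoE is that nothing here is trained end-to-end: the routing decision and the expert models are separate, fixed components rather than parameters optimized jointly with the experts.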