MoE LLMs replace each feed-forward block with several "expert" fully connected layers, and a learned router decides which experts process each token during the forward pass; the experts and the router are trained end-to-end. The same idea can also be applied with black-box LLMs like Claude Opus, GPT-4, etc.: it's a similar concept, but operating at a higher level of abstraction — instead of routing tokens to expert layers inside one model, you route whole queries to different models.
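A minimal sketch of that higher-level routing, assuming a set of specialist models behind a single dispatch function. The expert names and the `call_llm` stub are hypothetical placeholders standing in for real vendor API calls, and the keyword router is deliberately crude; in practice the router could itself be a small classifier or another LLM.

```python
from typing import Callable, Dict


def call_llm(model_name: str, prompt: str) -> str:
    """Placeholder for a real API call (e.g. via a vendor SDK)."""
    return f"[{model_name}] response to: {prompt!r}"


# Map each "expert" label to a callable that queries one black-box model.
EXPERTS: Dict[str, Callable[[str], str]] = {
    "code":    lambda p: call_llm("code-specialist-model", p),
    "math":    lambda p: call_llm("math-specialist-model", p),
    "general": lambda p: call_llm("general-purpose-model", p),
}


def route(prompt: str) -> str:
    """Crude keyword router: pick which expert model should handle the query."""
    lowered = prompt.lower()
    if any(k in lowered for k in ("def ", "function", "compile", "traceback")):
        return "code"
    if any(k in lowered for k in ("integral", "prove", "equation")):
        return "math"
    return "general"


def answer(prompt: str) -> str:
    """Dispatch the whole query to the selected 'expert' model."""
    return EXPERTS[route(prompt)](prompt)


if __name__ == "__main__":
    print(answer("Why does this function raise a TypeError?"))
    print(answer("Evaluate the integral of x^2 from 0 to 1."))
```

The key difference from true MoE is that nothing here is trained end-to-end: the routing decision and the expert models are separate, fixed components rather than parameters optimized jointly with the experts.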