How Upcycling MoEs Beat Dense LLMs

Mathias
Nov 19, 2024

In this Arxiv Dive, NVIDIA researcher Ethan He presents his co-authored work, Upcycling Large Language Models into Mixture of Experts (MoE). He covers what an MoE is, the challenges of upcycling or scaling LLMs with an MoE architecture, and how upcycled MoEs outperform dense LLMs. Below are the recording of his presentation and his slides.
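To give a flavor of the upcycling idea before you watch: every expert in the MoE layer is initialized from the trained dense model's FFN weights, and a freshly initialized router learns to distribute tokens among them. The sketch below is a simplified, hedged illustration in PyTorch; the class names, sizes, and top-k routing details are assumptions for the example, not the exact Megatron-LM implementation.

```python
# Minimal sketch of upcycling a dense FFN into an MoE layer.
# Names, sizes, and the routing scheme are illustrative assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseFFN(nn.Module):
    def __init__(self, hidden: int, ffn: int):
        super().__init__()
        self.up = nn.Linear(hidden, ffn)
        self.down = nn.Linear(ffn, hidden)

    def forward(self, x):
        return self.down(F.gelu(self.up(x)))

class UpcycledMoE(nn.Module):
    """MoE layer whose experts all start as copies of a trained dense FFN."""
    def __init__(self, dense_ffn: DenseFFN, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        hidden = dense_ffn.up.in_features
        # Each expert is initialized from the dense FFN's weights (the "upcycling" step).
        self.experts = nn.ModuleList(
            [copy.deepcopy(dense_ffn) for _ in range(num_experts)]
        )
        # The router starts from scratch and learns to route tokens to experts.
        self.router = nn.Linear(hidden, num_experts)
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, hidden)
        scores = F.softmax(self.router(x), dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        # Each token is processed by its top-k experts, weighted by router scores.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

# Example: upcycle a (pretend) trained dense FFN into an 8-expert MoE.
dense = DenseFFN(hidden=512, ffn=2048)
moe = UpcycledMoE(dense, num_experts=8, top_k=2)
y = moe(torch.randn(4, 512))
```

Because all experts start identical to the dense FFN, the upcycled model begins at roughly the dense model's quality and the router plus continued training lets the experts specialize from there.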

If you'd like to play around with MoE models, check out our Models Page.

Ethan is in our Oxen Discord! To ask him any questions directly, join the community. :)

Slides: MoE Upcycling, Ethan He. Megatron-LM MoE implementation: https://github.com/NVIDIA/Megatron-LM/tree/main/megatron/core/transformer/moe