
Practical ML Dives

Every Wednesday we host a "Practical ML Dive," where we implement the models we read about in the papers. This is a great way to cement the concepts in our brains and spark ideas about what we can practically build with the technology as it stands today.

Practical ML Dives is meant to be a complement to the paper deep dives: we take the models we read about on Fridays and implement them in live, running code on Wednesdays.

How to Train Diffusion for Text from Scratch

This is part two of a series on Diffusion for Text with Score Entropy Discrete Diffusion (SEDD) models. Today we will be diving into the code for diffusion models for text, and see...

Greg Schoeninger
Apr 30, 2024
16 min read
ArXiv Dives: Text Diffusion with SEDD

Diffusion models have been popular for computer vision tasks. Recently, models such as Sora show how you can apply Diffusion + Transformers to generate state-of-the-art videos with ...

Greg Schoeninger
Apr 16, 2024
11 min read
ArXiv Dives: The Era of 1-bit LLMs, All Large Language Models are in 1.58 Bits

This paper presents BitNet b1.58, where every weight in a Transformer can be represented as {-1, 0, 1} instead of a floating point number. The model matches full precision transfo...

Greg Schoeninger
Apr 8, 2024
9 min read
How to train Mistral 7B as a "Self-Rewarding Language Model"

About a month ago we went over the "Self-Rewarding Language Models" paper by the team at Meta AI with the Oxen.ai Community. The paper felt very approachable and reproducible, so w...

Greg Schoeninger
Mar 20, 2024
17 min read
Practical ML Dive - Building RAG from Open Source Pt 1

RAG was introduced by the Facebook AI Research (FAIR) team in May of 2020 as an end-to-end way to include document search into a sequence-to-sequence neural network architecture. ...

Greg Schoeninger
Jan 6, 2024
14 min read
Practical ML Dive - How to train Mamba for Question Answering

What is Mamba 🐍? There is a lot of hype about Mamba being a fast alternative to the Transformer architecture. The paper, released in December of 2023, claims 5x faster throughput w...

Greg Schoeninger
Dec 21, 2023
22 min read