
Practical ML Dives

Every Wednesday we host a "Practical ML Dive," where we implement the models we read about in the papers. This is a great way to cement the concepts in our brains and spark ideas about what we can practically build with the technology as it stands today.

Practical ML Dives is meant to be a complement to the paper deep dives: we take the models we read about on Fridays and implement them in live, running code on Wednesdays.

How to Train Diffusion for Text from Scratch

This is part two of a series on Diffusion for Text with Score Entropy Discrete Diffusion (SEDD) models. Today we will be diving into the code for diffusion models for text, and see...

Greg Schoeninger
Apr 30, 2024
16 min read
ArXiv Dives: Text Diffusion with SEDD

Diffusion models have been popular for computer vision tasks. Recently, models such as Sora show how you can apply Diffusion + Transformers to generate state-of-the-art videos with ...

Greg Schoeninger
Apr 16, 2024
11 min read
ArXiv Dives: The Era of 1-bit LLMs, All Large Language Models are in 1.58 Bits

This paper presents BitNet b1.58, where every weight in a Transformer can be represented as {-1, 0, 1} instead of a floating point number. The model matches full precision transfo...

Greg Schoeninger
Apr 8, 2024
9 min read
How to train Mistral 7B as a "Self-Rewarding Language Model"

About a month ago we went over the "Self-Rewarding Language Models" paper by the team at Meta AI with the Oxen.ai Community. The paper felt very approachable and reproducible, so w...

Greg Schoeninger
Mar 20, 2024
17 min read
Practical ML Dive - Building RAG from Open Source Pt 1

RAG was introduced by the Facebook AI Research (FAIR) team in May of 2020 as an end-to-end way to include document search into a sequence-to-sequence neural network architecture. ...

Greg Schoeninger
Jan 6, 2024
14 min read
Practical ML Dive - How to train Mamba for Question Answering

What is Mamba 🐍? There is a lot of hype about Mamba being a fast alternative to the Transformer architecture. The paper, released in December of 2023, claims 5x faster throughput w...

Greg Schoeninger
Dec 21, 2023
22 min read