I Read the DeepSeek Docs So You Don’t Have To

DeepSeek is turning heads in the AI world with two major innovations that flip the usual script for building AI models. Here’s the gist:

Skipping the Study Phase (Supervised Fine-Tuning)

When you train an AI model, the usual first step is something called Supervised Fine-Tuning (SFT). Think of it like studying for a test: the model reviews labeled or annotated data (basically, answers with explanations) to learn the material. After that, it takes a “quiz” using Reinforcement Learning (RL) to see how much it has learned.

DeepSeek figured out they could skip most of the study phase. Instead of feeding the model labeled data, they jumped straight to quizzing it over and over with RL. Surprisingly, the model didn’t just keep up—it got better. Without being spoon-fed, it had to “think harder” and reason through questions using the information it already knew.

The “think harder” part is key. Instead of relying on labeled data to tell it what the answers should be, DeepSeek designed a model that had to reason its way to the answers, making it much better at thinking.

This approach relied on a smaller initial dataset for fine-tuning, using only a minimal amount of labeled data to “cold start” the process. As the model answered quizzes during RL, it also generated a “chain of thought,” or reasoning steps, to explain how it arrived at its answers. With continuous cycles of reinforcement learning, the model became smarter and more accurate—faster than with traditional approaches.

By minimizing reliance on SFT, DeepSeek drastically reduced training time and costs while achieving better results.
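
If you want to picture how that “quiz” loop works, here is a minimal sketch in Python. It is not DeepSeek’s code; it only illustrates the idea of rule-based rewards plus group-relative scoring (the GRPO method DeepSeek describes in its R1 paper). The <think>/<answer> tags, the reward weights, and the hard-coded sample completions are placeholders for illustration.

```python
import re
import statistics

# Rule-based rewards in the spirit of DeepSeek-R1's RL setup (illustrative
# tag names and weights, not DeepSeek's exact spec):
#   1) accuracy reward: the final answer matches a known ground truth
#   2) format reward: the reasoning is wrapped in <think>...</think> tags
def reward(completion: str, ground_truth: str) -> float:
    has_reasoning = bool(re.search(r"<think>.*?</think>", completion, re.DOTALL))
    answer = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    correct = answer is not None and answer.group(1).strip() == ground_truth
    return (1.0 if correct else 0.0) + (0.2 if has_reasoning else 0.0)

# Group-relative scoring: each sampled completion is judged against the
# average of its own group, so no separate "value" model is needed.
def group_relative_advantages(rewards: list[float]) -> list[float]:
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid dividing by zero
    return [(r - mean) / std for r in rewards]

if __name__ == "__main__":
    # Pretend the model sampled four answers to the prompt "What is 7 * 6?"
    samples = [
        "<think>7 * 6 = 42</think><answer>42</answer>",
        "<think>7 + 6 = 13</think><answer>13</answer>",
        "<answer>42</answer>",  # right answer, but no reasoning shown
        "<think>Six sevens make 42.</think><answer>42</answer>",
    ]
    rewards = [reward(s, "42") for s in samples]
    for s, adv in zip(samples, group_relative_advantages(rewards)):
        print(f"advantage {adv:+.2f}  <-  {s}")
    # A real training step would nudge the model toward completions with
    # positive advantages and away from those with negative ones.
```

The important part is that nothing here needs labeled explanations, only an answer that can be checked automatically; the chain of thought is something the model produces on its own to earn the reward.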

Mixture of Experts (MoE)

Instead of relying on one big AI brain to do everything, DeepSeek created “experts” for different topics. Think of it like having a math professor for equations, a historian for ancient facts, and a scientist for climate data.

When DeepSeek trains or answers a question, it only activates the “experts” relevant to the task. This saves a ton of computing power because it’s not using all the brainpower all the time—just what’s needed.

This idea, called Mixture of Experts (MoE), makes DeepSeek much more efficient while keeping its responses just as smart.
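
Here’s a bare-bones sketch of that routing in PyTorch. It is illustrative only: the expert sizes, the eight experts, and the top-2 routing are placeholder choices, and real MoE layers (DeepSeek’s included) add refinements such as shared experts and load balancing that are left out here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoELayer(nn.Module):
    """A toy top-k Mixture of Experts layer: a router scores every expert
    for each token, only the top-k experts run, and their outputs are
    blended using the router's weights."""

    def __init__(self, d_model: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model)
        scores = self.router(x)                         # score every expert per token
        weights, chosen = scores.topk(self.k, dim=-1)   # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)

        out = torch.zeros_like(x)
        for slot in range(self.k):
            for idx, expert in enumerate(self.experts):
                routed = chosen[:, slot] == idx         # tokens sent to this expert
                if routed.any():
                    out[routed] += weights[routed, slot].unsqueeze(-1) * expert(x[routed])
        return out

if __name__ == "__main__":
    layer = SimpleMoELayer(d_model=64)
    tokens = torch.randn(10, 64)      # a batch of 10 token embeddings
    print(layer(tokens).shape)        # torch.Size([10, 64])
```

The efficiency win shows up in the forward pass: each token only runs through two of the eight expert networks, even though all eight exist in the model.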

What Does It Mean?

Using these methods, DeepSeek built an open-source AI model that reasons just as well as the OpenAI models behind its $200-a-month Pro subscription, at a fraction of the cost.

Now, “fraction of the cost” still means millions of dollars and some heavy compute resources, but this is a big deal. DeepSeek has even shared much of their methodology and their models on Hugging Face, making it accessible for others to explore and build upon.

I’m still digging into what makes DeepSeek tick and experimenting with what it can do. As I learn more, I’ll share updates—so be sure to subscribe to the newsletter to stay in the loop!

