IBL News | New York
Researchers at Stanford and the University of Washington said in a paper released this month that they trained an AI reasoning model called s1 that performs similarly to OpenAI’s o1 and DeepSeek’s R1 on math and coding benchmarks.
The s1 model, along with its data and code, is available on GitHub. According to the researchers, training it cost less than $50 in cloud computing credits.
The team started with an off-the-shelf base model and then fine-tuned it through distillation, a process for extracting the “reasoning” capabilities of another AI model by training on its answers.
The model was distilled from Gemini 2.0 Flash Thinking Experimental, offered for free via the Google AI Studio platform.
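As an illustration of that harvesting step, here is a minimal sketch of how reasoning traces can be collected from a teacher model through Google’s google-generativeai Python client. The model id, example question, and file name are assumptions for the sketch, not the paper’s exact pipeline:

```python
# Sketch: building a distillation dataset from a teacher model's answers.
# The model id, questions, and output path are illustrative assumptions.
import json
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # free tier via Google AI Studio
teacher = genai.GenerativeModel("gemini-2.0-flash-thinking-exp")

questions = [
    "If 3x + 7 = 22, what is x?",
    # ...in the s1 setup, roughly 1,000 carefully curated questions
]

with open("distill_data.jsonl", "w") as f:
    for q in questions:
        resp = teacher.generate_content(q)
        # Keep the full response, which includes the step-by-step
        # "thinking" that precedes the final answer.
        f.write(json.dumps({"question": q, "teacher_output": resp.text}) + "\n")
```

The value of the teacher’s output here is not just the final answer but the intermediate reasoning, which is what the student model is later trained to imitate.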
Distillation is the same approach Berkeley researchers used last month to create an AI reasoning model for around $450.
OpenAI has accused DeepSeek of improperly harvesting data from its API for model distillation.
Distillation is a suitable method for cheaply re-creating an existing AI model’s capabilities, but it doesn’t produce new models that advance beyond the originals.
The s1 paper suggested that reasoning models can be distilled with a relatively small dataset using supervised fine-tuning (SFT), in which an AI model is explicitly trained to mimic certain behaviors demonstrated in a dataset.
More specifically, s1 was based on a small, free AI model from Qwen, the Chinese AI lab owned by Alibaba. To train s1, the researchers created a dataset of just 1,000 carefully curated questions, paired with answers and the “thinking” trace behind each answer, both generated by Google’s Gemini 2.0 Flash Thinking Experimental.
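A minimal sketch of what this SFT step can look like, using the Hugging Face trl library. The base-model name matches the Qwen checkpoint the paper reports starting from, while the file path and hyperparameters here are assumptions:

```python
# Sketch: supervised fine-tuning (SFT) of an open base model on the
# distilled question/reasoning/answer data. Paths and hyperparameters
# are illustrative, not the paper's exact recipe.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("json", data_files="distill_data.jsonl", split="train")

def to_text(example):
    # Concatenate the question with the teacher's full reasoning trace
    # so the student imitates the "thinking" as well as the answer.
    return {"text": example["question"] + "\n" + example["teacher_output"]}

dataset = dataset.map(to_text)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-32B-Instruct",  # the off-the-shelf base s1 started from
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="s1-sft",
        dataset_text_field="text",
        num_train_epochs=5,
        per_device_train_batch_size=1,
    ),
)
trainer.train()
```

In practice a 32-billion-parameter model needs multi-GPU hardware for fine-tuning, which is why the reported run used 16 H100s rather than a single card.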
Training s1 took less than 30 minutes on 16 Nvidia H100 GPUs, after which the model achieved strong performance on certain AI benchmarks.
Per the paper, the researchers used a nifty trick to get s1 to double-check its work and extend its “thinking” time: they told it to wait. Appending the word “wait” during s1’s reasoning helped the model arrive at slightly more accurate answers.
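A simplified sketch of the idea, which the paper calls budget forcing: when generation halts, append “Wait” and let the model continue. The checkpoint name assumes the released s1 weights on Hugging Face; the prompt and token budgets are illustrative:

```python
# Sketch: "budget forcing" at inference time. When the model stops,
# append "Wait" and resume generation so it re-checks its reasoning.
# Checkpoint name assumes the released s1 weights on Hugging Face.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "simplescaling/s1-32B"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto")

prompt = "How many prime numbers are there below 30?"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)

# First pass halted; append "Wait" to nudge the model into extending
# its reasoning before committing to a final answer.
resumed = tok(tok.decode(out[0], skip_special_tokens=True) + "\nWait",
              return_tensors="pt").to(model.device)
out = model.generate(**resumed, max_new_tokens=512)
print(tok.decode(out[0], skip_special_tokens=True))
```

The paper’s actual implementation works at the token level, suppressing the end-of-thinking delimiter rather than re-tokenizing decoded text as this sketch does, but the effect is the same: the model spends more test-time tokens on reasoning.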
Experts said that s1 raises fundamental questions about the commoditization of AI models.