Cost Under 150 Yuan! Train a Reasoning Model in 26 Minutes

For under 150 yuan, could one train a reasoning model that rivals the likes of DeepSeek-R1 and OpenAI’s o1? This is not satire from The Onion, but a genuine result from AI godmother Fei-Fei Li and her team: the s1 model.

Cost and Training Time

According to the research, the training run consumed less than 50 dollars’ worth of cloud compute, and the team notes that the resources needed to train s1 can now be rented for only about 20 dollars.

The Secret Sauce: Distillation

The s1 team attributes their success to one key technique: distillation. Starting from an existing base model, they distilled the reasoning behavior of a stronger model into it to obtain s1, along the lines of the sketch below.
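
As a rough illustration of that pipeline, the sketch below shows the data-collection half of distillation: a stronger “teacher” reasoning model is asked for a worked-out reasoning trace and a final answer on each question, and the results are saved as distillation examples. The `query_teacher` helper is a hypothetical placeholder for whatever API serves the teacher model, not the team’s actual tooling.

```python
# Sketch of the distillation data-collection step: each question is sent to a
# stronger "teacher" reasoning model, and its reasoning trace plus final answer
# become one training example for the smaller student model.
import json


def query_teacher(question: str) -> tuple[str, str]:
    """Hypothetical helper: return (reasoning_trace, final_answer) from the teacher model."""
    raise NotImplementedError("wire this up to your teacher model's API")


def build_distillation_examples(questions: list[str], out_path: str) -> None:
    """Write one JSON record per question: {question, thinking, answer}."""
    with open(out_path, "w", encoding="utf-8") as f:
        for q in questions:
            trace, answer = query_teacher(q)
            record = {"question": q, "thinking": trace, "answer": answer}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```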

Data Set Creation

To train s1, the research team curated a dataset of 1,000 questions, each paired with its answer and a detailed reasoning trace.
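
The exact data format is not spelled out above, so the following is only a guess at how each record could be rendered into a single supervised training string. The `<think>...</think>` delimiters are an illustrative assumption; the point is simply that the model is trained to emit its reasoning trace before the final answer.

```python
# Turn each (question, thinking, answer) record into one training string in which
# the reasoning trace precedes the final answer. Delimiters here are illustrative.
import json


def format_example(record: dict) -> str:
    return (
        f"Question: {record['question']}\n"
        f"<think>\n{record['thinking']}\n</think>\n"
        f"Answer: {record['answer']}"
    )


def load_training_texts(path: str) -> list[str]:
    """Read the JSONL file produced in the previous sketch and format every record."""
    with open(path, encoding="utf-8") as f:
        return [format_example(json.loads(line)) for line in f]
```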

Model Training

Fine-tuning on this 1,000-example dataset took a mere 26 minutes on 16 NVIDIA H100 GPUs.
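
For orientation, here is a heavily simplified, single-process sketch of that supervised fine-tuning step using Hugging Face transformers. The checkpoint name and hyperparameters are illustrative assumptions (a 32B model really needs multi-GPU sharding, as in the team’s 16×H100 run), and `load_training_texts` is the helper from the formatting sketch above.

```python
# Simplified supervised fine-tuning sketch: fine-tune a base causal LM on the
# 1,000 formatted reasoning examples. Checkpoint and hyperparameters are assumptions.
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model = "Qwen/Qwen2.5-32B-Instruct"  # assumed base checkpoint from the Tongyi (Qwen) family

tokenizer = AutoTokenizer.from_pretrained(base_model)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# NOTE: a 32B model will not fit on one device in practice; this line is illustrative only.
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)

texts = load_training_texts("s1k_traces.jsonl")  # helper from the formatting sketch above
train_dataset = [tokenizer(t, truncation=True, max_length=4096) for t in texts]

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="s1-sft",
        per_device_train_batch_size=1,
        num_train_epochs=5,        # a few passes over only 1,000 examples keeps the run short
        learning_rate=1e-5,
        bf16=True,
    ),
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The low cost follows directly from this setup: with only 1,000 short training examples and a handful of epochs, even a 32B model needs just minutes of GPU time.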

Research Discoveries

The team also discovered that suppressing the end-of-thinking signal too frequently, in order to force longer reasoning, can trap the model in repetitive loops.

Model Performance

Here are some of the standout performances of the s1 model:

  • s1-32B achieved a score of 93.0 on MATH500, surpassing o1-mini and landing in the same range as o1 and DeepSeek-R1.
  • On AIME24, s1-32B’s accuracy rose steadily as budget forcing allowed it more thinking tokens at test time, before eventually leveling off.

Distillation Technique

The team started from a base model released by Alibaba’s Tongyi (Qwen) team and, through distillation, arrived at s1. Along the way they introduced a new sequential test-time scaling method with a corresponding benchmark, plus a simple decoding-time intervention called budget forcing.

Budget Forcing Method

Budget forcing intervenes in the model’s reasoning at decode time by enforcing upper and lower limits on the number of thinking tokens: to cap thinking, an end-of-thinking delimiter is appended so the model moves on to its answer; to extend thinking, that delimiter is suppressed and a word such as “Wait” is appended to nudge the model into further reasoning.
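
A minimal sketch of how such an intervention could be wired up at decode time follows. The `</think>` delimiter, the “Wait” continuation, the prompt format, and the `generate_text` wrapper are all illustrative assumptions rather than the team’s exact implementation, and a word count stands in for a proper token count to keep the sketch short.

```python
# Sketch of budget forcing at decode time. Two levers:
#   1) Extend thinking: if the model stops reasoning before the minimum budget,
#      suppress the end-of-thinking delimiter and append "Wait" so it continues
#      (doing this too often is what can trap the model in repetitive loops).
#   2) Cap thinking: once the budget is reached, append the delimiter so the
#      model has to move on to its final answer.

END_OF_THINKING = "</think>"  # assumed marker that closes the reasoning trace


def generate_text(prompt: str, max_new_tokens: int, stop: str | None) -> str:
    """Hypothetical helper: continue `prompt`, halting at `stop` or at the token cap."""
    raise NotImplementedError


def budget_forced_answer(question: str, min_thinking: int, max_thinking: int) -> str:
    prompt = f"Question: {question}\n<think>\n"
    thinking = generate_text(prompt, max_new_tokens=max_thinking, stop=END_OF_THINKING)

    # Lever 1: keep the model thinking until the minimum budget is met
    # (word count is only a rough proxy for the thinking-token count).
    while len(thinking.split()) < min_thinking:
        thinking += "\nWait,"
        thinking += generate_text(prompt + thinking,
                                  max_new_tokens=max_thinking,
                                  stop=END_OF_THINKING)

    # Lever 2: force the end-of-thinking delimiter and request the final answer.
    final_prompt = prompt + thinking + f"\n{END_OF_THINKING}\nAnswer:"
    return generate_text(final_prompt, max_new_tokens=512, stop=None)
```

Sweeping the thinking budget upward with a scheme like this is what yields the sequential test-time scaling behavior described above, while the repetitive-loop failure mode appears when the “extend thinking” lever is pulled too often.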

Model Comparison

s1-32B was compared with other models on three reasoning benchmarks, where it exhibited a distinct sequential test-time scaling pattern and strong performance relative to the amount of training data used.

Conclusion

The s1 model is a testament to the power of distillation in enhancing the capabilities of smaller models on mathematical evaluation sets. The conclusions are as follows:

  • Budget forcing excels in control, scaling, and performance metrics.
  • s1-32B is the most sample-efficient open-source reasoning model to date.

With the magic of model distillation, the s1 team has once again, at an astonishingly low training cost, crafted a 32B reasoning model that stands alongside the top-tier reasoning models. Those are the key insights behind this remarkable achievement.
