The homegrown open-source model DeepSeek-V3 has been officially released! 🎉
DeepSeek has just released the first version of its V3 model as open source, sparking considerable discussion. In particular, its coding ability is reported to be on par with Claude 3.5 Sonnet, widely regarded as the best in the industry.
**I. Performance Highlights**
1. On multiple benchmarks, it surpasses other open-source models and is on par with the world’s top closed-source models.
– DeepSeek-V3 is a self-developed MoE model with 671B total parameters, of which 37B are activated per token. It was pre-trained on 14.8T tokens.
– Encyclopedic knowledge: Performance on knowledge-based tasks has improved significantly over the previous generation, DeepSeek-V2.5, and approaches the current best-performing model, Claude-3.5-Sonnet-1022.
– Long text: In long text evaluations, the average performance surpasses other models.
– Code: In algorithmic coding scenarios, it is far ahead of all existing non-o1 models on the market; in engineering coding scenarios, it approaches Claude-3.5-Sonnet-1022.
– Mathematics: Significantly surpasses all open source and closed source models.
– Chinese language ability: Comparable to Qwen2.5-72B on educational evaluations and the like, and ahead on factual knowledge.
2. Generation speed tripled.
– Through algorithmic and engineering innovations, generation speed has risen from 20 TPS to 60 TPS, giving users a noticeably faster and smoother experience.
**II. API Service Price Adjustment**
With the release of DeepSeek-V3, API pricing has been adjusted to 0.5 yuan per million input tokens on a cache hit, 2 yuan per million input tokens on a cache miss, and 8 yuan per million output tokens. A 45-day promotional period is also in effect: from now until February 8, 2025, the price is 0.1 yuan per million input tokens (cache hit), 1 yuan per million input tokens (cache miss), and 2 yuan per million output tokens. Both new and existing users can enjoy the promotional prices.
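To make the pricing concrete, here is a minimal sketch of a cost estimator based on the per-million-token rates announced above. The function name and interface are hypothetical, for illustration only; they are not part of any DeepSeek SDK.

```python
# Hypothetical helper (not a DeepSeek API): estimate request cost in yuan
# from the announced per-million-token prices.
def api_cost_yuan(input_tokens: int, output_tokens: int,
                  cache_hit: bool = False, promo: bool = False) -> float:
    """Cost of one request. Standard: 0.5/2 yuan per M input (hit/miss),
    8 yuan per M output. Promo (until 2025-02-08): 0.1/1 and 2 yuan."""
    if promo:
        input_rate = 0.1 if cache_hit else 1.0   # yuan per million input tokens
        output_rate = 2.0                        # yuan per million output tokens
    else:
        input_rate = 0.5 if cache_hit else 2.0
        output_rate = 8.0
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# e.g. 1M cache-miss input tokens plus 1M output tokens at standard pricing:
print(api_cost_yuan(1_000_000, 1_000_000))  # → 10.0
```

During the promotional period the same request would cost 3 yuan (1 + 2), illustrating the roughly 3–4× discount.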
**III. Open Source Weights and Local Deployment**
DeepSeek-V3 was trained in FP8, and the native FP8 weights are open-sourced. SGLang and LMDeploy added day-one support for native FP8 inference on the V3 model, while TensorRT-LLM and MindIE support BF16 inference; a script for converting the FP8 weights to BF16 is also provided. For model weight downloads and more on local deployment, see: https://huggingface.co/deepseek-ai/DeepSeek-V3-Base.
“Pursuing inclusive AGI with an open-source spirit and long-termism” is DeepSeek’s firm belief. Going forward, richer capabilities such as deep thinking and multimodality will continue to be built on the DeepSeek-V3 base model, and exploration results will continue to be shared with the community. 😊