DeepSeek, founded by Wenfeng Liang, has made a splash with the launch of its R1 reasoning model, which rivals the performance of OpenAI’s o1 model. Here’s a response to the thoughts shared in “A Few Thoughts on Deepseek”:
DeepSeek’s Response
Regarding the facts:
– DeepSeek is not a repackaged or distilled version of U.S. models but is built upon innovative advancements in the Transformer architecture.
– The techniques mentioned, such as MoE (Mixture of Experts), MLA (Multi-head Latent Attention), and MTP (Multi-Token Prediction), are at the cutting edge of AI research.
– The widely reported training cost of roughly $5.5 million covers only the final training run, not the full investment in research, prior experiments, and infrastructure, so headline comparisons built on that figure significantly understate the actual spend.
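Of the techniques listed above, MoE is the easiest to illustrate: a learned router scores each token against a pool of expert networks and activates only the top-k of them, so a model with a very large total parameter count pays only a fraction of that cost per token. The following is a minimal sketch in NumPy; the dimensions, expert count, and function names are illustrative and not DeepSeek’s actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_forward(token, experts, router_w, k=2):
    """Route one token vector through the top-k of n experts."""
    scores = softmax(router_w @ token)           # router score per expert
    top_k = np.argsort(scores)[-k:]              # indices of the k best experts
    gate = scores[top_k] / scores[top_k].sum()   # renormalized gate weights
    # Only the selected experts run; the rest stay idle for this token.
    return sum(g * (experts[i] @ token) for g, i in zip(gate, top_k))

d, n_experts = 8, 4
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router_w = rng.standard_normal((n_experts, d))
out = moe_forward(rng.standard_normal(d), experts, router_w)
print(out.shape)  # (8,)
```

The point of the sketch is the sparsity: per token, compute scales with k experts rather than all n, which is how MoE models decouple capacity from per-token cost.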
Regarding the perspectives:
– DeepSeek symbolizes a victory for open-source over closed-source, yet the competition between the two is a marathon, not a sprint.
– OpenAI’s approach of aggressive, brute-force scaling may look blunt, but it could widen the gap again in the future.
– While DeepSeek has brought open-source models level with their closed-source equivalents, it is too early to declare the race settled.
– The commercialization of foundational models is inevitable.
– The demand for computing power and data will not wane.
OpenAI’s Perspective
On the technical front:
– DeepSeek’s integration and optimization across multiple levels represent a practical innovation strategy.
– Comparisons of training costs should be interpreted with care.
– Training methods that eschew supervised fine-tuning in favor of pure reinforcement learning are valuable as research, but they may have practical limitations, such as less readable outputs.
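The no-SFT recipe referenced above (DeepSeek-R1-Zero is trained with reinforcement learning on rule-based rewards rather than supervised examples) is easiest to see at the reward level. The sketch below is an illustrative simplification, not DeepSeek’s actual implementation: the reward combines an exact-match accuracy check on a verifiable answer with a format check for reasoning tags, and the 0.5/1.0 weights are invented for the example.

```python
import re

def r1_style_reward(completion: str, reference_answer: str) -> float:
    """Toy rule-based reward: format bonus plus accuracy bonus.
    Weights and tag names are illustrative assumptions."""
    # Format reward: the model should wrap its reasoning and final answer.
    has_format = bool(
        re.search(r"<think>.*?</think>\s*<answer>.*?</answer>",
                  completion, re.DOTALL)
    )
    format_reward = 0.5 if has_format else 0.0

    # Accuracy reward: exact match against a verifiable reference answer.
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    answer = m.group(1).strip() if m else ""
    accuracy_reward = 1.0 if answer == reference_answer.strip() else 0.0

    return format_reward + accuracy_reward

good = "<think>2 + 2 is 4</think><answer>4</answer>"
print(r1_style_reward(good, "4"))  # 1.5
print(r1_style_reward("4", "4"))   # 0.0  (correct answer, but no tags)
```

Because the reward is computed by rules rather than a learned reward model, it is cheap and hard to game, but it only works for tasks with checkable answers, which is one source of the limitations noted above.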
On the market front:
– The collaborative nature of the open-source community is indeed a powerful force for rapid model improvement.
– The assertions about the enduring demand for computing power and data align with actual trends.
– The commercialization of models resonates with industry trends.
Summary
DeepSeek has achieved remarkable progress in the realm of open-source large models through innovative technology and engineering optimizations. However, the battle between open-source and closed-source is far from over, with the outcome hinging on technological advancements, commercial viability, and the speed of industry application adoption. Looking ahead to 2025, the commercialization of basic large models will accelerate, shifting the focus of competition from the models themselves to the deep integration of application scenarios and innovation in service models. The need for computing power and data remains unwavering.