2024: The Year of Large Language Models – We’re Becoming Desensitized to AI’s Growth
The sheer pace of AI has dulled our sensitivity to technological progress. In our imagination, progress of this kind should quietly reshape how we live. In practice, the fervor around artificial intelligence stays largely confined to platforms like Weibo and Zhihu, and amid this commotion, which feels oddly detached from daily life, the general public is gradually becoming desensitized.
This is especially visible in which AI topics actually go viral. Looking back at the end of the year, only two stories truly dominated discussion: a ByteDance intern’s attack on large language model training, and the capital battle between Dark Side of the Moon (Moonshot AI) and investor Zhu Xiaohu. But that is hardly the truest picture of China’s AI field. We can casually dismiss an AI feature as “nothing remarkable” or a technical breakthrough as “just so-so”, yet taken as a whole, 2024 was still an unadulterated technological whirlwind.
#01 Large Language Models Are More Practical but No Longer Astounding
At the beginning of 2024, the domestic large language model field was crowded with contenders: by April 2024, some 305 large language models had been released, and the label “war of a hundred models” still fit. However, the outbreak of price wars and the demands of application developers have effectively weeded out the vast majority of models that never needed to exist in the first place.
The first trend is small-parameter edge-side models. Large-parameter models are powerful, but their training and invocation costs are high, making them hard to popularize where hardware is limited. Edge-side models bring simple AI applications within reach of daily life. The most typical cases are on-device models for phones and PCs, such as Xiaomi’s MiLM and vivo’s BlueLM. They retain key capabilities on the device while cutting resource consumption, and to a large extent their deployment has become a crucial step in AI’s penetration into daily life.
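Why parameter count (and numeric precision) decides whether a model can run on a phone comes down to simple arithmetic. The sketch below is illustrative back-of-the-envelope math, not any vendor’s published specification; the parameter counts and byte widths are generic assumptions.

```python
def model_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed just to hold a model's weights."""
    return n_params * bytes_per_param / 1024**3

# A hypothetical 70B-parameter model at fp16 (2 bytes/param)
# vs. a hypothetical 3B edge model quantized to int4 (0.5 bytes/param).
cloud = model_memory_gb(70e9, 2)    # ~130 GB: data-center territory
edge = model_memory_gb(3e9, 0.5)    # ~1.4 GB: feasible on a flagship phone

print(f"70B fp16: {cloud:.1f} GB, 3B int4: {edge:.1f} GB")
```

The two orders of magnitude between those numbers are the gap that edge-side models are designed to close.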
On this basis, another major trend is the application of Mixture of Experts (MoE) technology, a design that lowers invocation costs while remaining effective. An ordinary large language model is like a single omniscient expert who knows everything but is expensive to consult (high computing power requirements). An MoE model is more like a team of specialists, each proficient in a different field, with only the relevant ones called on for any given request. Through this mechanism, the model’s computing requirements and costs drop sharply. Take Mixtral 8x7B as an example: only two of its eight experts run for each token, yet its performance is comparable to GPT-3.5 at a far lower compute cost.
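The “team of specialists” mechanism can be made concrete with a toy sketch of top-k gating, the routing scheme MoE layers use (Mixtral, for instance, routes each token to 2 of 8 experts). This is a minimal illustration, not any production implementation; the expert functions and router weights are invented for the demo.

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token, experts, router_weights, top_k=2):
    """Route one token vector to its top-k experts and blend their outputs.

    Only top_k experts actually run, so compute cost scales with top_k,
    not with the total number of experts."""
    # Router: one linear score per expert, turned into a probability.
    scores = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    probs = softmax(scores)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)   # renormalize over the chosen experts
    out = [0.0] * len(token)
    for i in chosen:
        gate = probs[i] / norm
        y = experts[i](token)              # only now does this expert compute
        out = [o + gate * v for o, v in zip(out, y)]
    return out, chosen

# Toy demo: 8 "experts" that each just scale the input differently.
experts = [lambda t, k=k: [k * v for v in t] for k in range(1, 9)]
router_weights = [[0.1 * k, -0.05 * k] for k in range(1, 9)]  # arbitrary router
output, chosen = moe_forward([1.0, 0.5], experts, router_weights, top_k=2)
print(chosen)  # only 2 of the 8 experts were consulted
```

In a real MoE transformer the experts are feed-forward sublayers and the router is trained jointly with them, but the cost argument is the same: the per-token compute is that of two small experts, not eight.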
In addition, multimodal research became an important direction in large language model development in 2024. Humans understand the world through multiple modalities: vision, sound, touch. If large language models are to possess real intelligence and practical value, text-only input and output is clearly not enough. Take generating an accompanying picture as an example: the AI must not only understand the text but also grasp its visual context. With Google’s release of the natively multimodal Gemini, multimodal capability has become a research focus for every major AI company.
For ordinary users, there is no precise yardstick for judging the quality of a large language model’s answers, but how much a model can read at once is a capability anyone can feel directly. In March this year, Kimi, from Dark Side of the Moon (Moonshot AI), chose the path of “ultra-long texts.” Previously, getting a large language model to read a whole book or long article meant splitting it up and juggling prompts. Kimi instead pushed the model’s context window to 2 million Chinese characters, equivalent to three copies of “Dream of the Red Chamber.” Kimi’s influence in China subsequently soared.
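What “splitting it up” used to mean can be sketched in a few lines. The figure of roughly 650,000 characters per copy below is back-solved from the article’s own “three copies in 2 million characters” comparison, and the 8,000-character window is an assumed typical pre-2024 budget, not a specific product’s limit.

```python
def chunk_text(text: str, window: int, overlap: int = 0):
    """Split text into pieces that each fit a model's context window."""
    step = window - overlap
    return [text[i:i + window] for i in range(0, len(text), step)]

book = "x" * 650_000        # one "copy" per the article's own comparison
small_window = 8_000        # assumed pre-long-context budget (illustrative)
pieces = chunk_text(book, small_window)
print(len(pieces))          # dozens of separate calls, each blind to the others

# With a 2,000,000-character window, even three copies fit in a single call:
assert 3 * len(book) <= 2_000_000
```

The practical difference is not just fewer API calls: each chunk in the old workflow loses all cross-chunk context, which is exactly what a single long-context pass avoids.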
The real “breakthrough point” for the industry came in May 2024, when DeepSeek started a price war; major companies such as ByteDance and Alibaba quickly followed with cuts of their own, and Baidu and iFLYTEK even made some models free. At the technical level, model compression and mixed-precision training have lowered training and invocation costs, leaving room for price adjustments. At the market level, the price war plainly imitates the internet-era playbook: slash prices to grab market share quickly, while the extra user data gathered along the way feeds back into model training.
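One form of the model compression mentioned above is weight quantization. The sketch below shows the core idea of symmetric int8 quantization, storing each weight in one byte instead of four, in a deliberately minimal form; real systems quantize per-channel tensors, not flat lists, and the weight values here are made up.

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats onto integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.03, 0.99]            # toy weights
q, scale = quantize_int8(w)
restored = dequantize(q, scale)
error = max(abs(a - b) for a, b in zip(w, restored))
print(q, f"max error {error:.4f}")      # 4x smaller storage, small rounding error
```

The trade is explicit: invocation cost drops (a quarter of the memory and bandwidth for weights) in exchange for a bounded rounding error of at most half a quantization step, which is what lets providers cut prices without retraining from scratch.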