In 2024, the field of Large Language Models (LLMs) advanced remarkably, ushering in a new era of AI innovation. The landscape was reshaped by several key breakthroughs and trends that pushed the boundaries of what had previously been thought possible.
**Surpassing GPT-4 Level Models**
By the end of 2024, a staggering 70 models from 18 institutions had outperformed the original GPT-4 on the Chatbot Arena leaderboard. The year also saw a significant leap in context length, with many models now able to process inputs exceeding 100,000 tokens.
**Drastic Reduction in Training Costs**
DeepSeek v3 achieved performance comparable to models like Claude 3.5 Sonnet at a training cost of roughly $5.57 million. Inference prices fell just as sharply: Google’s Gemini 1.5 Flash 8B was about 27 times cheaper than GPT-3.5 Turbo.
**The Rise of Multimodal LLMs**
Almost all major model providers introduced multimodal models capable of handling image, audio, and video inputs, greatly expanding the application scope of LLMs.
**Key Highlights of the Year**
**Breakthroughs in GPT-4 Level Models**
Google’s Gemini 1.5 Pro led the charge, becoming the first model to surpass GPT-4 with its video input processing capabilities. Anthropic’s Claude 3 series also pushed the envelope in terms of performance. Most mainstream providers now support the processing of over 100,000 tokens.
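The 100,000-token figure is easier to reason about with a rough rule of thumb: for English text, one token is on the order of four characters. A minimal sketch of checking whether a document fits a context window under that assumption (the 4-characters-per-token ratio is an approximation, not an exact tokenizer; real tokenizers such as tiktoken give exact counts):

```python
# Rough heuristic: estimate whether a document fits a model's context
# window, assuming ~4 characters per token for English text.
# This is an illustrative sketch, not a substitute for a real tokenizer.

def estimated_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Approximate token count from character length."""
    return int(len(text) / chars_per_token)

def fits_context(text: str, context_window: int = 100_000) -> bool:
    """True if the estimated token count fits in the given context window."""
    return estimated_tokens(text) <= context_window

doc = "word " * 50_000          # ~250,000 characters of sample text
print(estimated_tokens(doc))    # 62500
print(fits_context(doc))        # True
```

By this estimate, a 100,000-token window corresponds to roughly 400,000 characters of English prose, which is why entire books now fit in a single prompt.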
**Plummeting Model Prices**
OpenAI’s GPT-4o was available at $2.50 per million input tokens (mTok), while Google’s Gemini 1.5 Flash 8B was priced at just $0.075 per mTok, about 27 times cheaper than GPT-3.5 Turbo.
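To make these prices concrete, here is a minimal cost sketch using only the per-mTok figures quoted above (prices change frequently, so treat them as illustrative constants rather than current rates):

```python
# Hedged sketch: compare per-request input costs at the prices quoted
# in the text ($2.50/mTok for GPT-4o, $0.075/mTok for Gemini 1.5 Flash 8B).
# These are illustrative 2024 figures, not current list prices.

def input_cost_usd(tokens: int, price_per_mtok: float) -> float:
    """Cost of `tokens` input tokens at `price_per_mtok` dollars per million tokens."""
    return tokens / 1_000_000 * price_per_mtok

GPT_4O = 2.50           # $/mTok, as quoted above
GEMINI_FLASH_8B = 0.075 # $/mTok, as quoted above

# A 100,000-token prompt (the long-context size discussed earlier):
print(input_cost_usd(100_000, GPT_4O))           # ~$0.25
print(input_cost_usd(100_000, GEMINI_FLASH_8B))  # ~$0.0075
```

At these rates, processing a full 100,000-token document costs a quarter of a dollar on the frontier model and well under a cent on the budget tier, which is the practical meaning of the price collapse described above.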
**The Emergence of Multimodal LLMs**
The year saw virtually all prominent model suppliers launching multimodal models. Companies like OpenAI, Google, and Amazon introduced models that could handle images, audio, and video.
**Voice and Real-Time Video Interaction**
Innovations like voice and real-time camera modes in ChatGPT and Google Gemini enabled users to interact with models using voice and video.
**Application Generation Becomes the Norm**
LLMs could now generate complete interactive applications from prompts. Tools like Anthropic’s Claude Artifacts and GitHub Spark made this a reality.
**Additional Key Points**
- The era of free access to the best models was short-lived; OpenAI introduced a paid subscription tier, ChatGPT Pro.
- The concept of an “Agent” remained nebulous, and its practical value was repeatedly called into question.
- Evaluation grew in importance: robust automated evaluation is key to building useful applications.
- Reasoning models, such as OpenAI’s o1 series, opened new avenues for enhancing performance through extended reasoning at inference time.
As we reflect on the year 2024, it’s clear that LLMs have not only advanced in terms of capabilities but have also become more accessible. While the era of free access to top models has ended, and some concepts remain unclear, the focus on evaluation and the emergence of reasoning models signal a promising direction for the field of AI. It’s a time of both scientific rigor and humanistic endeavor, as we continue to unlock the potential of AI for the benefit of all.