Peeling Back the 12-Day Haze: Unveiling OpenAI’s True Progress

A few months ago, OpenAI ran a streak of releases on 12 consecutive days. Among them, the Day 9 release was the most significant.

What Day 9 shipped:
– o1 in the API as an official release (previously only the preview version was available).
– The Realtime API now ships with an SDK, supports structured output, and can reach latencies under 300 milliseconds. Costs dropped sharply as well: from roughly $50 per hour for live audio streaming to about $5 per hour with the mini model.
– A new fine-tuning method, "preference fine-tuning", was added.

Why structured output matters:
– Suppose you want to dim the lights at home: the light can only accept structured instructions, such as JSON. The AI acts as a translator, turning "light, dim a bit" into a structured instruction and relaying it to the light. This is the foundation of all agents.
– The 2023 version had no official standard for structured output, and the Function Calling feature was unstable, with a low success rate. By April 2024 the success rate had risen to 75.3%, and by May to 86.4%. The new version in August 2024 added a standard interface for structured output; in strict mode it achieves 100% schema-compliant output. A wave of agent tools has emerged since.
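To make the light example concrete, here is a sketch of what a strict-mode tool definition might look like, and how the model's tool-call arguments become the instruction the light receives. The tool name `set_light` and its fields are hypothetical; `strict: true` with `additionalProperties: false` and a full `required` list is what strict mode expects.

```python
import json

# Hypothetical strict-mode Function Calling tool for the "dim the lights"
# example. In strict mode the model's output is guaranteed to match this
# JSON Schema exactly.
DIM_LIGHT_TOOL = {
    "type": "function",
    "function": {
        "name": "set_light",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {
                "device_id": {"type": "string"},
                "brightness": {"type": "integer", "description": "0-100"},
            },
            "required": ["device_id", "brightness"],
            "additionalProperties": False,
        },
    },
}

def parse_tool_call(arguments_json: str) -> dict:
    """Turn the model's tool-call arguments (a JSON string) into the
    structured instruction the light actually receives."""
    return json.loads(arguments_json)

# Under strict mode the arguments always match the schema, so this parse
# cannot fail on malformed or extra fields:
instruction = parse_tool_call('{"device_id": "living_room", "brightness": 30}')
```

This guarantee is precisely what turned Function Calling from a best-effort feature into something agents can rely on.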

Structured output with o1:
– o1 is a powerful reasoning model. Practical scenarios such as mechanical control and IoT control require structured output. After this release, o1's reasoning results can be reliably converted into control instructions.
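For mechanical or IoT control, a structured instruction is usually validated once more before it reaches physical hardware, even when the JSON is schema-valid. A minimal sketch of such a guard; the `action` and `brightness` fields are hypothetical:

```python
def validate_instruction(instr: dict) -> bool:
    """Reject anything a physical device should never execute, even if the
    JSON itself is well-formed. Field names here are hypothetical."""
    if instr.get("action") not in {"set_brightness", "power_off"}:
        return False  # unknown action: never forward to hardware
    if instr["action"] == "set_brightness":
        b = instr.get("brightness")
        # enforce the device's physical range, not just the JSON type
        return isinstance(b, int) and 0 <= b <= 100
    return True
```

Schema compliance guarantees the shape of the instruction; a guard like this enforces the device's physical limits.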

Realtime API:
– Supports structured output at low latency and can emit structured instructions directly. The lower cost makes commercial use viable for far more teams.
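A Realtime session gains structured output by registering tools when the session is configured. The sketch below builds a `session.update` event payload; the event shape follows the Realtime API format as it stood in late 2024 (the tool name is hypothetical), so check current documentation before relying on it. The payload would be sent as JSON over the session's WebSocket connection.

```python
import json

# Sketch of a Realtime API "session.update" event that registers a tool,
# so the voice session can emit structured instructions instead of only
# free-form speech. Note that Realtime tools are flat objects (no nested
# "function" key, unlike Chat Completions).
session_update = {
    "type": "session.update",
    "session": {
        "modalities": ["text", "audio"],
        "instructions": "You control the smart lights. Use the tool.",
        "tools": [
            {
                "type": "function",
                "name": "set_light",  # hypothetical tool name
                "parameters": {
                    "type": "object",
                    "properties": {"brightness": {"type": "integer"}},
                    "required": ["brightness"],
                },
            }
        ],
        "tool_choice": "auto",
    },
}

# Serialized form, as it would travel over the WebSocket:
payload = json.dumps(session_update)
```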

From end-to-end to many-modalities-to-many-modalities:
– Previously this was an "end-to-end" model; now it is effectively "many modalities in, many modalities out". The input can mix files, text, voice, and video at the same time, and the output can be text, voice, or even carry Function Calling instructions. This is especially valuable in teaching scenarios.
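On the input side, "many modalities in" means one request can carry several content parts at once. A hedged sketch of a helper that builds a single Chat Completions message mixing text, an image, and audio; the part types (`text`, `image_url`, `input_audio`) follow the Chat Completions content-part format, but the helper itself is hypothetical:

```python
import base64

def build_multimodal_message(text: str, image_url: str, wav_bytes: bytes) -> dict:
    """Build one user message carrying text, an image, and audio together.
    Audio is sent base64-encoded with its format declared."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
            {
                "type": "input_audio",
                "input_audio": {
                    "data": base64.b64encode(wav_bytes).decode("ascii"),
                    "format": "wav",
                },
            },
        ],
    }

# e.g. a teaching scenario: a slide plus the student's spoken answer
msg = build_multimodal_message(
    "Grade this spoken answer against the slide.",
    "https://example.com/slide.png",  # placeholder URL
    b"\x00\x01",  # stand-in for real WAV bytes
)
```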

Preference fine-tuning:
– OpenAI described two kinds of fine-tuning. Regular fine-tuning tells the AI "what I like", while preference fine-tuning tells it "what I like and what I don't like". The model can then actively steer away from unwanted content during generation, which is more stable.
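The "what I like and what I don't like" framing maps directly onto the training data: each example pairs a prompt with a preferred and a non-preferred answer. A sketch of one record in the preference fine-tuning JSONL format; the field names follow OpenAI's preference fine-tuning documentation, but verify against the current docs before building a dataset.

```python
import json

def dpo_record(prompt: str, preferred: str, rejected: str) -> str:
    """One JSONL line for preference fine-tuning: the prompt, the answer
    we want, and the answer we want the model to avoid."""
    record = {
        "input": {"messages": [{"role": "user", "content": prompt}]},
        "preferred_output": [{"role": "assistant", "content": preferred}],
        "non_preferred_output": [{"role": "assistant", "content": rejected}],
    }
    return json.dumps(record, ensure_ascii=False)

line = dpo_record(
    "Summarize this contract.",
    "Here is a neutral three-point summary of the key clauses.",
    "Wow!!! This contract is AMAZING, you should sign immediately!",
)
```

A file of such lines is then uploaded as the training set; regular fine-tuning data, by contrast, contains only the preferred answers.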

These updates lay the technical foundation for deploying agents across industries. Agents are no longer just "chatbots with prompts": they can integrate with IoT and offline business systems and connect with the real world.

Behind the progress of agents is the rising success rate of Function Calling. Many impressive AI applications decompose into a combination of a few OpenAI APIs, and most of this year's API updates and derivative applications from OpenAI revolve around structured output. The first version of the structured-output solution appeared in early 2023; in June of that year developers were invited to iterate on it, and in November "JSON mode" shipped. Since then the APIs have been upgraded continuously around structured output, transforming it from an unstable toy into a core lever on the real world and the developer ecosystem.