Tsinghua University and Mianbi Propose Innovative AI Agent Interaction: Capable of Active Thinking and Predicting Needs.

Even the most advanced AI agents, ChatGPT included, are still passive: they act only when users give explicit instructions. Recently, a joint team from Tsinghua University and Mianbi Intelligence proposed a new interaction paradigm built around proactive agents. Under this paradigm, an agent is no longer a mere instruction executor but an intelligent assistant with “sensitivity” to its surroundings. It actively observes the environment and predicts what the user needs, making the leap from “being commanded” to “being able to think”.

The proactive agent paradigm has broad potential in daily life: drawing on users’ habits and preferences, it can help with travel arrangements, work assistance, household management, and health management. For example, in a chat between a couple, a boy invites a girl to Universal Studios on Saturday and says he will pick her up at 8 am. Once authorized, the agent stays in a “standby state”, infers the girl’s need from the chat context, and proactively sets an alarm for 7 am on Saturday for her. Likewise, when a user receives an important file, the agent proactively saves it locally and renames it after the title on the first page of the PDF.

In addition to proposing proactive agents, the research builds an environment simulator and a data set called ProactiveBench. The team trains a reward model whose judgments closely match those of human annotators and uses it to compare how different models perform on the data set.

The pipeline has three components, which together simulate environmental information, user behavior, and feedback on the tasks the agent proposes in different scenarios. The environment simulator simulates a specific environment and provides sandbox conditions for the agent to interact in. It uses real human data to improve the quality of the generated events, and its main functions are event generation and state maintenance. The proactive agent uses the information from the environment simulator to predict the user’s intentions and generate predicted tasks, and it proposes possible tasks in light of user feedback and the interaction history. The user agent simulates user behavior and gives feedback on the tasks the proactive agent proposes. It is implemented as a prompted GPT-4o. The research collects judgments from human annotators and trains a reward model to simulate this process.
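The interaction among the three components can be pictured with a small toy loop. The following is only a minimal sketch, not the authors’ implementation: every class and method name here (EnvironmentSimulator, ProactiveAgent, UserAgent, run_episode) is hypothetical, the events are scripted by hand, and simple keyword rules stand in for the language models the research actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EnvironmentSimulator:
    """Hypothetical stand-in: emits scripted events and keeps a state log."""
    scripted_events: list[str]
    state: list[str] = field(default_factory=list)

    def next_event(self) -> Optional[str]:
        if not self.scripted_events:
            return None
        event = self.scripted_events.pop(0)  # event generation
        self.state.append(event)             # state maintenance
        return event


class ProactiveAgent:
    """Hypothetical stand-in: keyword rules replace the real model."""

    def __init__(self) -> None:
        self.rejected: set[str] = set()  # tasks the user turned down earlier

    def propose(self, event: str) -> Optional[str]:
        if "pick you up at 8" in event:
            task = "set alarm for 7:00 on Saturday"
        elif "received file" in event:
            task = "save and rename the file"
        else:
            return None  # nothing to help with: stay silent
        # Use past feedback: do not repeat a task the user rejected.
        return None if task in self.rejected else task

    def learn(self, task: str, accepted: bool) -> None:
        if not accepted:
            self.rejected.add(task)


class UserAgent:
    """Hypothetical stand-in: accepts only tasks from a fixed wish list."""

    def __init__(self, wanted: set[str]) -> None:
        self.wanted = wanted

    def judge(self, task: str) -> bool:
        return task in self.wanted


def run_episode(env: EnvironmentSimulator, agent: ProactiveAgent, user: UserAgent):
    """Return a log of (event, proposed task or None, accepted or None)."""
    log = []
    while (event := env.next_event()) is not None:
        task = agent.propose(event)
        if task is None:
            log.append((event, None, None))
            continue
        accepted = user.judge(task)
        agent.learn(task, accepted)
        log.append((event, task, accepted))
    return log
```

In this toy version, a rejected task is simply never proposed again. That is a deliberately crude way to show feedback and interaction history flowing back into the agent’s next proposal.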

The research proposes a set of metrics for measuring how closely the reward model agrees with human annotators. The metrics cover four cases (missed needs, silent responses, correct detection, and wrong detection), from which recall, precision, accuracy, and F1 score are calculated. The results show that existing models do well on correct detection but poorly on the other indicators, and that they tend to accept the agent’s proposed tasks. The model trained in this research performs best and is selected as the reward model for ProactiveBench.

With the reward model in place, the performance of proactive agents can be measured. Closed-source models tend to propose tasks proactively but fail to remain silent when users do not need help. Open-source models improve after being trained on the data set, which confirms the effectiveness of the data synthesis pipeline. The research also conducts an ablation study in which the model proposes multiple candidate tasks and receives feedback from the reward model. Every model improves markedly overall: the false alarm rate falls and accuracy rises, although recall drops. Combined with the reward model, proactive agents can better detect users’ needs while raising fewer false alarms.
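To make the relationship between the four cases and the four scores concrete, here is a minimal sketch. It assumes the conventional confusion-matrix reading, which the article does not spell out: correct detection as true positive, wrong detection as false positive, missed need as false negative, and silent response as true negative. The Outcomes class and score function are hypothetical names, not part of ProactiveBench.

```python
from dataclasses import dataclass


@dataclass
class Outcomes:
    """Counts of the four cases (mapping to TP/FP/FN/TN is an assumption)."""
    correct_detection: int  # help was needed and a fitting task was proposed (TP)
    wrong_detection: int    # a task was proposed but not wanted (FP)
    missed_need: int        # help was needed but nothing was proposed (FN)
    silent_response: int    # no help needed and nothing proposed (TN)


def _ratio(numerator: float, denominator: float) -> float:
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def score(o: Outcomes) -> dict:
    tp, fp = o.correct_detection, o.wrong_detection
    fn, tn = o.missed_need, o.silent_response
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    accuracy = _ratio(tp + tn, tp + fp + fn + tn)
    f1 = _ratio(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1": f1,
    }
```

Under this reading, a model that proposes a task on nearly every turn will rarely miss a need, so its recall is high, but it pays in wrong detections and therefore in precision. This matches the trade-off described above, where fewer false alarms come with lower recall.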

The proactive agent paradigm proposed in this research is expected to turn AI from a passive tool into an insightful collaborator that offers help on its own initiative, opening a new mode of human-computer interaction and creating a more inclusive and convenient intelligent living environment for the public. Future work can be expected to bring more natural human-computer collaboration, smarter adaptation to different scenarios, and deeper personalized services.