
OpenAI Launches Operator, the First L3-Level Agent: Opening a New Era of Human-Computer Interaction

The Computer Use track is already crowded with startups, dark horses, and tech giants, and now OpenAI has charged in as well.

You can also browse our Desktop Automation Agents collection, where we have gathered dozens of related products.


 

Competition in artificial intelligence is growing fiercer: new startups keep emerging, technology giants have joined the game, and now OpenAI has made a major entrance of its own. OpenAI recently released an agent system called Operator, its first AI system that can operate a computer autonomously much as a human would, which is widely seen as a key step toward artificial general intelligence (AGI). As OpenAI president Greg Brockman predicted:

"2025 will be the year of agents. We may be witnessing the birth of a 'hybrid Internet' in which agents are deeply involved."


 

Operator: a computer-using agent based on the CUA model

Operator is a research preview product released by OpenAI. Its core technology is the Computer-Using Agent (CUA) model, which combines GPT-4o's vision capabilities with reinforcement learning so that it can analyze screenshots, interact with graphical user interfaces (GUIs), and simulate input devices such as a keyboard and mouse to operate the computer and carry out a variety of complex tasks.

Unlike traditional AI systems that rely on pre-built APIs, Operator interacts directly with the GUI, so no application-specific or website-specific API has to be developed. Operator can therefore work with virtually any computer application or web page just as a human user would, by performing basic actions such as clicking, typing, and scrolling, which greatly expands the range of AI applications.
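
To make the idea of a small, generic action vocabulary concrete, here is a minimal sketch in Python. The class names (Click, TypeText, Scroll, FakeScreen) are hypothetical and chosen only for illustration; they are not OpenAI's API, and the "screen" simply records what was done to it.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Click:
    x: int
    y: int


@dataclass(frozen=True)
class TypeText:
    text: str


@dataclass(frozen=True)
class Scroll:
    dy: int  # positive values scroll down, in pixels


# The whole action space is just these three primitives.
Action = Union[Click, TypeText, Scroll]


class FakeScreen:
    """Stand-in for a real GUI: it only records the actions applied to it."""

    def __init__(self):
        self.log = []

    def apply(self, action):
        if isinstance(action, Click):
            self.log.append(f"click at ({action.x}, {action.y})")
        elif isinstance(action, TypeText):
            self.log.append(f"type {action.text!r}")
        elif isinstance(action, Scroll):
            self.log.append(f"scroll by {action.dy}")
        else:
            raise TypeError(f"unsupported action: {action!r}")


screen = FakeScreen()
for step in [Click(120, 48), TypeText("pasta"), Scroll(300)]:
    screen.apply(step)
print(screen.log)
```

Because every application is driven through the same three primitives, nothing in this sketch is specific to any one website or program, which is the point the paragraph above makes about avoiding per-application APIs.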


 

Operator's functional highlights and application potential

In the demo, Operator demonstrated impressive autonomous capabilities, understanding user commands and accomplishing a variety of everyday and professional tasks, for example:

  • Everyday service bookings: Operator can automatically complete restaurant reservations, online shopping, flight bookings, event ticket purchases, housekeeping appointments, takeout orders, and more. For example, a user can simply upload a photo of a handwritten shopping list, and Operator recognizes the contents and completes the purchase on a platform such as Instacart.
  • Information processing and automation: Operator quickly completes repetitive operations such as batch downloading files, batch editing documents, and filling out online forms.


Specifically, Operator's feature highlights include:

  • Visual perception: The CUA model processes pixel data from the screen, understands the current visual state of the screen, and recognizes interface elements (e.g., buttons and text boxes).
  • Reasoning and planning: With the help of chain-of-thought (CoT) reasoning, CUA can reason about the steps of a task, plan a sequence of operations, dynamically adjust its plan as the environment changes, and even correct itself and change strategy when it runs into problems.
  • Action execution: CUA uses a virtual mouse and keyboard to click, scroll, type, and so on until the target task is completed. Users can even have Operator make restaurant reservations through a specific service such as OpenTable, or upload a shopping list as an attachment and have it place the order on Instacart.
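
The three capabilities above form a loop: look at the screen, decide on the next action, carry it out, and repeat. The following is a minimal sketch of that loop under stated assumptions: the function names and the toy three-page "website" are hypothetical, a screenshot is represented by a plain string, and the lookup table stands in for the model's reasoning. It does not reflect how CUA is actually implemented.

```python
from typing import Callable, Optional

# Hypothetical stand-ins: a "screenshot" is a string, an action is a label.
Screenshot = str
Action = str


def run_agent(
    take_screenshot: Callable[[], Screenshot],
    choose_action: Callable[[str, Screenshot], Optional[Action]],
    execute: Callable[[Action], None],
    task: str,
    max_steps: int = 10,
) -> list:
    """Perceive the screen, pick the next action, act, and repeat."""
    history = []
    for _ in range(max_steps):
        screen = take_screenshot()            # visual perception
        action = choose_action(task, screen)  # reasoning and planning
        if action is None:                    # the policy signals completion
            break
        execute(action)                       # action execution
        history.append((screen, action))
    return history


# Toy environment: a three-page "website" driven by the actions below.
state = {"page": "home"}
transitions = {
    ("home", "click search box"): "search",
    ("search", "type 'table for two'"): "results",
}


def take_screenshot():
    return state["page"]


def choose_action(task, screen):
    # A real model would reason over pixels; this lookup table is a placeholder.
    policy = {"home": "click search box", "search": "type 'table for two'"}
    return policy.get(screen)


def execute(action):
    state["page"] = transitions[(state["page"], action)]


history = run_agent(take_screenshot, choose_action, execute, "book a table")
print(history)
```

The `max_steps` cap is a design choice of this sketch: it keeps a confused policy from looping forever, which matters when the agent may need several attempts on an unfamiliar interface.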


 

CUA Technology Core: Visual Perception, Reasoning and Planning, and a Universal Interface

Operator is driven by the CUA model, whose core technical components fall into three areas:

(1) Visual perception and reasoning: CUA analyzes the interface by processing screenshots to understand the elements and information on the screen. Combined with chain-of-thought reasoning, CUA infers the next step and produces screenshots and action logs that can be used to track and adjust the task flow.

(2) Multi-step task planning: CUA can break a complex task down into a sequence of operations, such as searching for a product on a web page, selecting specifications, and confirming the order. More importantly, CUA can adapt to change and correct itself: when the content of a site is not what it expected, it tries to find an alternative.

(3) A universal interface that requires no specific APIs: CUA removes the traditional dependence of AI on APIs and interacts directly with the user interface. This makes it adaptable to almost any web page or software environment and brings it closer to a true "universal interface for the digital world" that lets AI interact with all the software tools humans use.
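
The multi-step planning and self-correction described in point (2) can be illustrated with a simple fallback pattern: each step of the plan has an ordered list of strategies, and when one fails the next is tried. This is only a sketch under assumptions; the step names, the StepFailed exception, and the strategies are all hypothetical, and the article does not say how CUA chooses its alternatives.

```python
from typing import Callable, List, Sequence


class StepFailed(Exception):
    """Raised when a strategy cannot complete its step."""


def run_step(name: str, strategies: Sequence[Callable[[], str]], log: List[str]) -> str:
    """Try each strategy in order, falling back to the next one on failure."""
    for attempt, strategy in enumerate(strategies, start=1):
        try:
            result = strategy()
            log.append(f"{name}: attempt {attempt} ok")
            return result
        except StepFailed as err:
            log.append(f"{name}: attempt {attempt} failed ({err})")
    raise StepFailed(f"no strategy worked for step {name!r}")


def search_via_search_box() -> str:
    # Simulates a page whose layout is not what the agent expected.
    raise StepFailed("search box not found")


def search_via_category_menu() -> str:
    return "product page"


log: List[str] = []
plan = [
    ("find product", [search_via_search_box, search_via_category_menu]),
    ("select size", [lambda: "size M selected"]),
    ("confirm order", [lambda: "order confirmed"]),
]

results = []
for name, strategies in plan:
    results.append(run_step(name, strategies, log))

print(results)
print(log)
```

Here the first strategy for "find product" fails, the second succeeds, and the remaining steps proceed normally; the log keeps a record of both the failure and the recovery.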

 

CUA Performance: Benchmarks and Real-World Applications

CUA has made breakthroughs in a number of benchmark tests, far exceeding the previous state of the art:

  • OSWorld (operating system tasks): CUA's completion rate is 38.1%, significantly higher than the previous best of 22.0%.
  • WebArena (browser tasks): CUA's success rate is 58.1%, well above the previous best of 36.2%.
  • WebVoyager (simple web tasks): CUA reaches an 87% success rate, which is close to human level.
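
To put the OSWorld and WebArena figures quoted above in perspective, the short calculation below derives the absolute gain (in percentage points) and the relative gain over the previous best. It uses only the numbers from the list; the helper name `improvement` is ours.

```python
# (CUA score, previous best) in percent, as quoted in the list above.
scores = {
    "OSWorld": (38.1, 22.0),
    "WebArena": (58.1, 36.2),
}


def improvement(new: float, old: float) -> tuple:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    return round(new - old, 1), round((new - old) / old * 100, 1)


gains = {}
for name, (new, old) in scores.items():
    gains[name] = improvement(new, old)
    points, relative = gains[name]
    print(f"{name}: +{points} points, {relative}% relative improvement")
```

This gives a gain of 16.1 points (about 73% relative) on OSWorld and 21.9 points (about 60% relative) on WebArena.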


Nonetheless, CUA still falls short of human performance (for example, the human completion rate on OSWorld is 72.4%). In practice, CUA also has some limitations:

  • Inaccurate text editing: CUA is error-prone in complex text editing tasks.
  • Interaction limitations: When faced with an unfamiliar and complex user interface, it may need several rounds of trial and error.
  • Dependence on detailed instructions: The user has to give very specific instructions to get the best results.

 

Safety and security: multiple mechanisms to protect user privacy and security

Considering that Operator may handle sensitive operations such as payments and logins, OpenAI has incorporated multiple layers of security in its design to ensure user privacy and operational security:

  • Action confirmation: The system proactively asks the user for confirmation before performing critical operations such as reservations and payments. For example, when the agent drafts an email to reset a password or is about to delete an email, the user is asked whether to proceed.
  • Content filtering: The system automatically recognizes and blocks potentially harmful requests (e.g., weapons purchases).
  • Behavioral monitoring: The system has a built-in monitoring function that detects abnormal operations and suspends tasks.
  • User takeover at any time: The user can take over the task at any point during operation, and Operator does not have access to what the user does during the takeover period, which protects the user's privacy.
  • Human oversight mechanisms: For sensitive tasks (e.g., entering a password), the CUA requests confirmation from the user to prevent misuse.
  • Anti-fraud measures: CUA is able to recognize potentially fraudulent websites and suspend operations.
  • Behavioral transparency: CUA generates screenshots at every step of the operation to ensure that all actions are traceable.
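
Several of these mechanisms (blocking harmful requests, asking for confirmation before sensitive actions, and keeping a traceable record) can be combined in a single gate placed in front of action execution. The sketch below is an illustration under assumptions: the action names, the two category sets, and the GuardedExecutor class are hypothetical, and OpenAI's actual policies and implementation are not described in this article.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Illustrative categories only; the real policies are not public in this article.
BLOCKED = {"buy_weapon"}
NEEDS_CONFIRMATION = {"pay", "send_email", "delete_email", "enter_password"}


@dataclass
class GuardedExecutor:
    confirm: Callable[[str], bool]  # asks the user; True means "go ahead"
    audit_log: List[str] = field(default_factory=list)

    def run(self, action: str) -> str:
        if action in BLOCKED:
            outcome = "blocked"
        elif action in NEEDS_CONFIRMATION and not self.confirm(action):
            outcome = "declined by user"
        else:
            outcome = "executed"
        # Record every decision so that all actions remain traceable.
        self.audit_log.append(f"{action}: {outcome}")
        return outcome


# Simulated user who approves payments but refuses to let the agent delete email.
executor = GuardedExecutor(confirm=lambda action: action == "pay")

outcomes = []
for action in ["scroll", "pay", "delete_email", "buy_weapon"]:
    outcomes.append(executor.run(action))

print(outcomes)
print(executor.audit_log)
```

Checking the blocklist before asking for confirmation means a harmful request is refused outright instead of being put to the user as a question.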


 

Future Outlook: Agent Adoption and AGI Development

Currently, Operator is open for testing only to Pro users in the U.S. OpenAI says it will expand access to more user groups in the future and plans to expose CUA capabilities through an API that will allow developers to build their own computer-using agents.

The launch of Operator is considered an important step in the evolution of AGI. Going forward, Operator and CUA technology will continue to evolve in a number of ways:

  • Expanding agent capabilities: CUA's action space will be extended to more task scenarios, and OpenAI plans to provide an open API so that developers can build custom agents and push the boundaries of what they can do.
  • Global rollout: In the future, Operator is expected to open access to Plus users and to more regions, benefiting users around the world.
  • Advancing AGI: The arrival of Operator signals that the age of agents is approaching faster, with more agents of this kind expected to appear in the coming years and AI taking over a wider range of digital interaction tasks from humans. 2025 may truly become the "Year of Agents".

 

Conclusions and reflections

The release of Operator and CUA marks a revolutionary shift in the interaction mode of AI. The interaction between AI and computers is shifting from a data-interface-based mode to a human-computer interface-based universal operation mode, which lays a solid foundation for the realization of general artificial intelligence (AGI).

Questions for further thought:

  • Will CUA technology gradually replace existing API-based AI operations? What are the actual deployment costs and benefits in the industrial sector?
  • As CUA capabilities continue to increase, how will the role of the human user in digital tasks shift? Do we need to prepare for an "agent takeover"?
  • In the face of increasingly complex network environments and potential risks of misuse, how can CUA continue to keep users safe? What new dimensions should future security design take into account?
Source: Chief AI Sharing Circle. May not be reproduced without permission.

