
AutoGLM-Web Plugin: More Than Computer Use, AI's 'Phone Use' Moment Has Arrived!

Compared with computers, the phones we "can't put down" stay with us for more of the day and are more closely woven into our lives.

If "Computer Use" opened up a new paradigm of human-computer interaction, "Phone Use" takes it a step further, unlocking more application scenarios and bringing the benefits of AI to everyone.

Today, building on the GLM technical team's research in language modeling, multimodal modeling, and tool use, we are launching AutoGLM, GLM's first productized agent. Given only a simple text or voice command, AutoGLM operates a phone the way a person would and can help you:

Like and comment on your boss's Moments posts on WeChat...
Re-purchase an item from your order history on Taobao...
Book a hotel on Ctrip...
Buy train tickets on 12306...
Order takeout on Meituan...

In principle, given a deep understanding of GUIs, AutoGLM can do anything a human can do on a device with a visual interface (computer, phone, tablet, and so on).

AI's "Phone Use" moment takes us another small step along the road to artificial general intelligence (AGI).

AutoGLM is not limited to simple task scenarios or API calls, and users do not need to build complex, tedious workflows by hand. Its operating logic resembles a human's, so it can genuinely assist people in daily life and work.
Project address: https://xiao9905.github.io/AutoGLM

This time we are not announcing vaporware; you can try it now:
On desktop, install the Zhipu Qingyan plug-in in Chrome or Edge to use AutoGLM-Web, a browser assistant that simulates a user visiting and clicking through web pages. Driven by a large model, it automates advanced search, summarization, and content generation on websites according to user commands.
On phones, access is open first to a batch of Zhipu Qingyan users (Android only for now), and you are welcome to apply for the beta. Notably, we are also collaborating closely with Honor and other phone manufacturers on AutoGLM.

AutoGLM Technology

AutoGLM is built on two techniques developed in-house by Zhipu: a "decoupled intermediate interface for foundation agents" and a "self-evolving online curriculum reinforcement learning framework". Together they tackle the main obstacles to researching and deploying large-model agents for task planning and action execution: conflicting capability requirements, scarce training tasks and data, sparse feedback signals, and policy distribution drift. Combined with an adaptive learning strategy, AutoGLM keeps improving its performance steadily over successive iterations, much as a person keeps acquiring new skills while growing up.

AutoGLM addresses two key challenges in using large models as agents:

Challenge 1: Insufficiently precise "action execution"

A major challenge in training large-model agents is teaching the model to manipulate on-screen elements precisely. Training "action execution" and "task planning" jointly, end to end, is constrained by the high cost of collecting trajectory data and its overall scarcity, so the action-execution capability, which demands high precision, ends up undertrained.
To solve this, AutoGLM introduces the "decoupled intermediate interface for foundation agents" design, which separates the "task planning" and "action execution" phases through an intermediate interface expressed in natural language, substantially improving the agent's capabilities. Take tapping the "Submit order" button when ordering takeout on a phone: in the traditional scheme, one model must output the button's screen coordinates directly, whereas with the intermediate interface the planner only describes the target in natural language (for example, "tap the Submit order button") and the execution stage locates it on screen.
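The planning/execution split can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not AutoGLM's implementation: `UIElement`, `plan_step`, and `ground` are hypothetical names, the planner is a hard-coded stub standing in for the large model, and the grounder matches quoted text against a hypothetical list of on-screen elements instead of running a trained grounding model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class UIElement:
    """Hypothetical on-screen element, e.g. parsed from a screenshot or UI tree."""
    text: str
    bounds: tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def plan_step(task: str, screen: list[UIElement]) -> str:
    """Hypothetical planner: decides WHAT to do next and says it in natural language.
    In AutoGLM a large model plays this role (and would read `screen`);
    this stub is hard-coded."""
    if "takeout" in task.lower():
        return "Tap the 'Submit order' button"
    return "Stop"


def ground(instruction: str, screen: list[UIElement]) -> tuple[int, int] | None:
    """Hypothetical grounder: decides WHERE to act by mapping the instruction to
    tap coordinates. A real grounder would be a trained model; this stub matches
    the quoted element text and returns the center of that element's bounds."""
    parts = instruction.split("'")
    if len(parts) < 3:
        return None  # no quoted target, so nothing to tap
    target = parts[1]
    for el in screen:
        if el.text == target:
            left, top, right, bottom = el.bounds
            return ((left + right) // 2, (top + bottom) // 2)
    return None


screen = [
    UIElement("Back", (0, 0, 100, 80)),
    UIElement("Submit order", (700, 2200, 1000, 2300)),
]
instruction = plan_step("Order takeout on Meituan", screen)
print(instruction)                  # Tap the 'Submit order' button
print(ground(instruction, screen))  # (850, 2250)
```

In this sketch the planner never handles pixels: it only has to produce a clear description, and the precision-critical step of turning that description into coordinates sits in a separate component that can be trained and improved on its own.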

Challenge 2: Inflexible "task planning"

Another major challenge is that training trajectories for GUI agents are extremely scarce and costly. Agents also need the flexibility to plan and correct course on the fly when facing complex tasks in real-world environments, which traditional large-model training methods such as imitation learning and supervised fine-tuning (SFT) do not readily provide. We therefore developed a "self-evolving online curriculum reinforcement learning framework" that trains large-model agents from scratch in real online environments; it targets both Web and Phone, with web browsers used as the experimental environment. Through a self-evolving learning strategy, the model continually examines and improves itself. Through curriculum reinforcement learning, the framework dynamically adjusts task difficulty to the agent's ability level at each iteration, making full use of the model's potential. KL-divergence-controlled policy updates and confidence-based experience replay mitigate the model's tendency to forget previously learned tasks during iterative training. Trained with this method, the open-source GLM-4-9B reaches an overall task success rate of 43% on the WebArena-Lite benchmark, a relative improvement of more than 160% over GPT-4o.
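The three mechanisms named above (curriculum task selection, confidence-filtered experience replay, and a KL constraint on policy updates) can be illustrated with a simplified Python sketch. Every name, threshold, and the difficulty rule are assumptions for illustration only. The article does not specify the framework's actual KL-controlled update, so the KL term here is swapped in: it is the generic per-sample reward penalty common in RL fine-tuning, not the authors' formula.

```python
import math
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    est_success: float  # hypothetical: current policy's estimated success rate


@dataclass
class Experience:
    task: str
    reward: float      # 1.0 for a successful trajectory, 0.0 otherwise
    confidence: float  # hypothetical: probability the current policy assigns to it


def select_curriculum(tasks, low=0.2, high=0.8):
    """Curriculum step: keep tasks that are neither mastered nor hopeless
    for the current policy. The [low, high] band is an illustrative assumption."""
    return [t for t in tasks if low <= t.est_success <= high]


def filter_replay(buffer, min_conf=0.5, max_conf=0.95):
    """Replay step: reuse past successes the policy still finds plausible but
    has not fully memorized. Thresholds are illustrative assumptions."""
    return [e for e in buffer
            if e.reward > 0 and min_conf <= e.confidence <= max_conf]


def kl_penalized_reward(reward, logp_new, logp_ref, beta=0.1):
    """Generic KL penalty: subtract beta * (log pi_new - log pi_ref), a per-sample
    estimate of KL(pi_new || pi_ref), so updates stay close to the reference policy."""
    return reward - beta * (logp_new - logp_ref)


tasks = [Task("search", 0.95), Task("book hotel", 0.5), Task("checkout", 0.05)]
print([t.name for t in select_curriculum(tasks)])  # ['book hotel']

buffer = [Experience("a", 1.0, 0.7), Experience("b", 0.0, 0.7),
          Experience("c", 1.0, 0.99)]
print([e.task for e in filter_replay(buffer)])  # ['a']

print(round(kl_penalized_reward(1.0, math.log(0.6), math.log(0.5)), 4))  # 0.9818
```

The design intent mirrors the paragraph above: the curriculum filter keeps training focused on tasks at the edge of the agent's ability, while the replay filter and the KL penalty both push back against forgetting by anchoring the new policy to what it already did well.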
By combining Zhipu's "decoupled intermediate interface for foundation agents" with its "self-evolving online curriculum reinforcement learning framework", AutoGLM achieves significant performance gains in both Phone Use and Web Browser Use. For example, AutoGLM significantly outperforms GPT-4o and Claude-3.5-Sonnet on the AndroidLab benchmark, and on WebArena-Lite it improves on GPT-4o by about 200% in relative terms, substantially narrowing the gap in GUI-task success rates between humans and large-model agents.
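Relative-improvement figures are easy to misread, so the arithmetic behind them is worth spelling out. A relative improvement r means new = base × (1 + r): "more than 160%" means more than 2.6 times the baseline, and "about 200%" means roughly three times. Applying this to the article's own figures (43% success, more than 160% above GPT-4o) bounds GPT-4o's implied WebArena-Lite success rate below about 16.5%. That bound is derived from the numbers quoted above, not a separately reported score; the helper names are illustrative.

```python
def relative_improvement(new: float, base: float) -> float:
    """Relative improvement of `new` over `base` as a fraction (1.6 means 160%)."""
    return (new - base) / base


def implied_baseline(new: float, rel: float) -> float:
    """Invert new = base * (1 + rel) to recover the baseline score."""
    return new / (1 + rel)


# Figures from the article: 43% success rate, "more than 160%" above GPT-4o.
bound = implied_baseline(0.43, 1.60)
print(f"GPT-4o implied below {bound:.1%}")                # GPT-4o implied below 16.5%
print(f"check: {relative_improvement(0.43, bound):.0%}")  # check: 160%
```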
Through an Android app, AutoGLM now automates tasks across multiple applications on a real Android phone, and it performs satisfactorily in human evaluation on simple tasks.

May not be reproduced without permission: Chief AI Sharing Circle » AutoGLM-Web Plugin: More Than Computer Use, AI's 'Phone Use' Moment Has Arrived!
