AutoGLM-Web Plugin: More Than Computer Use, AI's 'Phone Use' Moment Has Arrived!

Compared with computers, the mobile phones we "can't put down" are with us for more of the day and are closer to our daily lives.

If "Computer Use" opens up a new paradigm of human-computer interaction, then "Phone Use" takes it a step further, unlocking more application possibilities and allowing AI to truly benefit everyone.

Today, building on the GLM technical team's work and research results in language modeling, multimodal modeling, and tool use, we are launching GLM's first productized agent: AutoGLM. Given only a simple text or voice command, it can simulate a human operating a phone and help you:

Like and comment on your boss's WeChat Moments posts
Reorder a product from your Taobao order history
Book hotels on Ctrip
Buy train tickets on 12306
Order a takeaway on Meituan

Theoretically, with a deep understanding of GUIs, AutoGLM can do anything a human can do on a visual electronic device (computer, phone, tablet, and so on).

AI's "Phone Use" moment takes us another small step forward on the road to artificial general intelligence (AGI).

It is not limited to simple task scenarios or API calls, and it does not require users to manually build complex, tedious workflows. Its operating logic resembles a human's, so it can genuinely assist people in daily life and work.
Project address: https://xiao9905.github.io/AutoGLM

As before, this is not a promise for the future. You can try it now:
On Chrome or Edge, install the "Zhipu Qingyan" plug-in to try AutoGLM-Web, a browser assistant that simulates a user visiting and clicking through web pages. Driven by a large model, it automates advanced search, summarization, and content generation on websites according to the user's commands.
On mobile, the first batch of access is open to some Zhipu Qingyan users (Android only for now), and you are welcome to apply for the internal beta. It is also worth mentioning that we are cooperating closely with Honor and other phone manufacturers on the basis of AutoGLM.

AutoGLM Technology

AutoGLM is built on Zhipu's self-developed "foundation agent decoupled intermediate interface" and "self-evolving online curriculum reinforcement learning framework". Together they address research and application challenges that large-model agents face in task planning and action execution: conflict between capabilities, scarcity of training tasks and data, scarcity of feedback signals, and policy distribution drift. Coupled with an adaptive learning strategy, the agent can improve its performance continuously and stably over iterations, much as a person keeps acquiring new skills while growing up.

AutoGLM addresses two key challenges that arise when large models are used as agents:

Challenge 1: Insufficiently precise "action execution"

A major challenge in training large-model agents is getting the model to manipulate the elements displayed on the screen precisely. End-to-end training, which jointly trains the "action execution" and "task planning" capabilities, is constrained by the high cost of collecting trajectory data and a severe shortage of data overall. As a result, action execution, which demands high precision, ends up undertrained.
To solve this problem, AutoGLM introduces the "foundation agent decoupled intermediate interface" design, which decouples the "task planning" and "action execution" phases through a natural-language intermediate interface and greatly improves the agent's capability. Take tapping the "submit order" button when ordering a takeaway on a phone: in the traditional scheme a single model has to decide on the action and locate the button on screen at the same time, whereas with the intermediate interface the planner only describes the target in natural language and action execution is handled as a separate step.
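The decoupling described above can be illustrated with a minimal Python sketch. It is not AutoGLM's implementation: `PlannedStep`, `UIElement`, `plan_next_step`, and `ground` are hypothetical names, the planner is a hard-coded stand-in for a large model, and the grounder uses simple word overlap where a real system would presumably use a trained grounding model. The only point is that the planner emits a natural-language description and a separate component turns it into screen coordinates.

```python
from dataclasses import dataclass


@dataclass
class UIElement:
    """A visible element on the screen (hypothetical structure)."""
    label: str
    bounds: tuple[int, int, int, int]  # left, top, right, bottom in pixels


@dataclass
class PlannedStep:
    """The natural-language intermediate interface between planner and executor."""
    action: str
    target_description: str


def plan_next_step(task: str) -> PlannedStep:
    # Stand-in for the planning model: it decides WHAT to do, in words,
    # and never outputs coordinates. The return value is hard-coded here.
    return PlannedStep(action="tap", target_description="submit order")


def ground(step: PlannedStep, screen: list[UIElement]) -> tuple[int, int]:
    # Stand-in for the execution model: it decides WHERE to act by matching
    # the description to an on-screen element (word overlap, as an assumption).
    wanted = set(step.target_description.lower().split())

    def overlap(element: UIElement) -> int:
        return len(wanted & set(element.label.lower().split()))

    best = max(screen, key=overlap)
    if overlap(best) == 0:
        raise LookupError(f"no element matches {step.target_description!r}")
    left, top, right, bottom = best.bounds
    return ((left + right) // 2, (top + bottom) // 2)  # center of the element


screen = [
    UIElement("Add note", (40, 1500, 500, 1580)),
    UIElement("Submit order", (600, 2200, 1040, 2320)),
]
step = plan_next_step("Order a takeaway")
point = ground(step, screen)
print(step.action, point)
```

Because the two halves only share a text description, each can be trained or replaced independently, which is the benefit the decoupled design is aiming for.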

Challenge 2: Lack of flexibility in "task planning"

Another major challenge is that training trajectory data for GUI agents is extremely limited and costly to collect. Moreover, agents need the flexibility to plan and to correct course on the fly when they face complex tasks and real-world environments. This is hard to obtain with traditional large-model training methods such as imitation learning and supervised fine-tuning (SFT). To this end, we developed a "self-evolving online curriculum reinforcement learning framework" that learns and improves the capabilities of large-model agents from scratch in real online environments, for both Web and Phone, with Web browsers serving as the experimental environment. By introducing a self-evolving learning strategy, the model continuously examines, pushes, and improves itself. Through curriculum reinforcement learning, the framework dynamically adjusts the difficulty of the learning tasks according to the agent's ability level in the current iteration, so as to make the most of the model's potential. And through policy updates constrained by KL divergence, together with agent-confidence-based experience replay, we mitigate the problem of the model forgetting previously learned tasks during iterative training. The open-source GLM-4-9B trained with this method improves by more than 160% relative to GPT-4o on the WebArena-Lite benchmark, reaching an overall task success rate of 43%.
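Two of the ideas in this paragraph, curriculum task selection and a KL-constrained policy update, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the framework's actual code: the function names, the 0.2 to 0.8 success-rate band, and the penalty weight `beta` are all invented for the example, and policies are represented as plain probability lists rather than model outputs.

```python
import math


def pick_curriculum_tasks(success_rates: dict[str, float],
                          low: float = 0.2, high: float = 0.8) -> list[str]:
    """Keep tasks the current agent solves sometimes, but not always.

    Tasks that are too easy or too hard carry little learning signal, so the
    curriculum focuses on the middle band. The thresholds are assumptions.
    """
    return sorted(task for task, rate in success_rates.items()
                  if low <= rate <= high)


def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def penalized_objective(expected_reward: float, new_policy: list[float],
                        old_policy: list[float], beta: float = 0.1) -> float:
    """Reward minus a KL penalty that discourages drifting from the old policy."""
    return expected_reward - beta * kl_divergence(new_policy, old_policy)


# Hypothetical success rates measured for the agent in the current iteration.
rates = {"search a product": 0.95, "fill a form": 0.5, "multi-site booking": 0.05}
next_tasks = pick_curriculum_tasks(rates)

unchanged = penalized_objective(1.0, [0.5, 0.5], [0.5, 0.5])
drifted = penalized_objective(1.0, [0.9, 0.1], [0.5, 0.5])
print(next_tasks, unchanged, drifted)
```

With the same expected reward, the update that moves further from the previous policy scores lower, which is how a KL term limits forgetting of previously learned behavior.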
AutoGLM achieves significant performance improvements in both Phone Use and Web Browser Use by combining Zhipu's own "foundation agent decoupled intermediate interface" and "self-evolving online curriculum reinforcement learning framework". For example, AutoGLM significantly outperforms GPT-4o and Claude-3.5-Sonnet on the AndroidLab benchmark. On the WebArena-Lite benchmark, AutoGLM achieves a performance improvement of about 200% over GPT-4o, which greatly narrows the gap between the success rates of humans and large-model agents in GUI manipulation.
AutoGLM now supports automated task execution across multiple apps on a real Android phone, delivered as an Android application. In manual evaluation on simple tasks, AutoGLM performs satisfactorily.

Copyright of this article belongs to AI Sharing Circle. Please do not reproduce it without permission.
