Zhipu releases AutoGLM, an autonomous task-execution Agent: this time, the Agent actively operates various types of devices to perform tasks

AI News · 1 year ago · Published by AI Sharing Circle

A paradigm shift is under way in how humans interact with machines, driven by the evolution of the conversational chatbot into an autonomous Agent with hands, a brain, and eyes.

As one of the first large-model companies to explore Agents, Zhipu announced several new developments today:

AutoGLM can autonomously carry out long operations of more than 50 steps, and can also perform tasks across apps.
AutoGLM opens a new "fully automated" web experience, supporting hands-free operation of dozens of websites.
GLM-PC, which operates a computer the way a human does, opens for internal testing and explores techniques for building general-purpose agents on visual multimodal models.

At Agent OpenDay, AutoGLM sent "a WeChat red envelope from AI" to hundreds of guests, and remotely commanded computers to send files automatically.

All Zhipu CEO Zhang Peng had to do was give a simple voice command on the spot. These operations used to be very complex for a machine; today they are completed entirely by Zhipu's productized Agent.

AutoGLM's New Upgrade: The Challenge Gets More Complex

The newly upgraded AutoGLM can take on more complex tasks:
Longer: It understands extra-long instructions and performs extra-long tasks. In the example of buying hot pot ingredients, AutoGLM autonomously performed 54 steps without interruption. AutoGLM also outperforms manual human operation on this kind of long, multi-step, repetitive task.
Cross-app: AutoGLM supports executing tasks across apps. Users will get used to AI handling tasks automatically instead of switching back and forth between multiple apps themselves. Because AutoGLM currently acts more like a scheduling layer between users and apps, cross-app capability is a critical step.
Short phrases: AutoGLM supports custom short phrases for long tasks. Instead of giving AutoGLM a long command like "Buy me a coffee, raw coconut latte, Wudaokou store, large, hot, mildly sweetened", you can now just say "Order Coffee".
Casual Mode: We are all afraid of making choices, and AutoGLM can now actively help you decide. In Casual Mode, the AI decides every step and delivers the result as a surprise, like opening a blind box. Would you like to try the coffee flavor the AI orders for you?
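
The short-phrase feature described above can be pictured as a lookup from a user-defined phrase to a stored long instruction. The sketch below is only an illustration under that assumption: `SHORT_PHRASES` and `expand_phrase` are hypothetical names, and nothing here reflects how AutoGLM actually stores or resolves phrases.

```python
# Hypothetical illustration only: these names are assumptions, not AutoGLM's API.
SHORT_PHRASES = {
    "order coffee": (
        "Buy me a coffee, raw coconut latte, Wudaokou store, "
        "large, hot, mildly sweetened"
    ),
}


def expand_phrase(command, phrases=SHORT_PHRASES):
    """Return the stored long instruction for a known short phrase.

    Unknown commands are returned unchanged so they can be handled
    as ordinary instructions.
    """
    key = command.strip().lower()  # match phrases case-insensitively
    return phrases.get(key, command)
```

Calling `expand_phrase("Order Coffee")` returns the full coffee order from the example, while a command with no stored phrase passes through untouched.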

At the same time, AutoGLM has started large-scale internal testing and will launch as a product for consumer users as soon as possible. AutoGLM also announced the "1 Billion APPs Free Auto Upgrade" program, inviting app partners to jointly explore new Auto scenarios of their own.

Sample AutoGLM APIs supporting core scenarios and core applications will be available on the Zhipu MaaS open platform (bigmodel.cn) for trial use within two weeks.

On the web, AutoGLM opens a new "fully automatic" browsing experience: Zhipu's AutoGLM plug-in is now online and supports hands-free operation of dozens of websites, including Baidu Search, Weibo, Zhihu, and GitHub. In the on-site demo, the AutoGLM plug-in automatically completed the process of "searching for Mango TV on Baidu, opening Little Alley House, playing the latest episode, and posting a bullet comment to check in at the end", all without human intervention.

GLM-PC Invitation to Test: A Technology Exploration for "Driverless" Computers

Beyond phones and browsers, Zhipu today also introduced a PC-based autonomous Agent. GLM-PC is the GLM team's technical exploration of the "driverless" PC, based on Zhipu's multimodal model CogAgent. The first phase of internal testing is now open and covers the following scenarios:

Meeting stand-in: Books and attends meetings on the user's behalf and sends meeting summaries.
Document processing: Downloads and sends documents, and understands and summarizes them.
Web search and summarization: Searches for specified keywords on specified platforms (e.g. WeChat, Zhihu, Xiaohongshu) and then reads and summarizes the results.
Remote and timed operation: When a command is sent remotely from a phone, GLM-PC completes the computer operation autonomously; tasks can also be scheduled for a future time and run as long as the computer is powered on.
Invisible screen: While the user is working, GLM-PC can complete its own work autonomously on an invisible screen, leaving the visible screen free.

GLM-PC uses a computer in almost exactly the same way a human does: it looks at graphics and text with its "eyes", plans with its "brain", and then uses its "hands" to perform operations such as clicking, double-clicking, and typing. Because of this, GLM-PC is theoretically capable of operating any application designed for humans once it has learned it. This is a system-level, cross-platform capability that does not depend on HTML or APIs, and it therefore has a higher capability ceiling.
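
The look, plan, act cycle described above can be sketched as a simple loop. This is a generic illustration of that pattern, not GLM-PC's implementation: the `Action` type, the `run_agent` function, and the three callbacks (`observe`, `plan`, `act`) are all hypothetical, and a real system would put a screenshot capture, a multimodal model call, and an input controller behind them.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """One UI operation (hypothetical structure for illustration)."""
    kind: str        # e.g. "click", "double_click", "type"
    target: str      # description of the on-screen element
    text: str = ""   # text to enter when kind == "type"


def run_agent(
    observe: Callable[[], str],
    plan: Callable[[str, List[Action]], Optional[Action]],
    act: Callable[[Action], None],
    max_steps: int = 50,
) -> List[Action]:
    """Repeat observe -> plan -> act until the planner returns None.

    observe: the "eyes", returns the current screen state
    plan:    the "brain", picks the next action or None when done
    act:     the "hands", performs the chosen action
    """
    history: List[Action] = []
    for _ in range(max_steps):
        screen = observe()
        action = plan(screen, history)
        if action is None:  # planner considers the task finished
            break
        act(action)
        history.append(action)
    return history
```

Keeping the three roles behind plain callbacks makes the loop independent of HTML or application APIs: only `observe` and `act` touch the screen and the input devices, which is the property the paragraph above attributes to a vision-based agent. The `max_steps` cap is an added safety assumption so a confused planner cannot loop forever.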

However, because the PC is complex and almost everything people do on it is a complex task, the capabilities of today's large models are, frankly, still some way from truly taking over everyone's office work. In its current version, GLM-PC still requires the user to enter very precise commands.

The GLM-PC "Invitation to Experience" is now open. We will keep working to improve the product and make it available to all users as soon as possible, and we also hope to explore it together with more manufacturers.

AutoGLM and GLM-PC are our important attempts to move toward an AI operating system. They grew out of Zhipu's accumulated technology in large language models, multimodal models, logical reasoning, and tool use. From AgentBench in April 2023 to the CogAgent model in August, Zhipu's research and development on AutoGLM and on CogAgent, the model behind GLM-PC, has been under way for a year and a half.

Unlike OpenAI, Zhipu defines five stages in the development of large models: L1 language ability, L2 logical ability (multimodal ability), L3 ability to use tools, L4 self-learning ability, and L5 exploration of scientific laws.

Development to date has given large models some of the abilities humans use to interact with the real physical world. "Agent will greatly enhance L3's ability to use tools, while opening up the exploration of L4's ability to self-learn," Zhang Peng said.

Zhang Peng said that the GLM team will continue to accelerate the development of agent models and products, with the aim of making it possible to operate computers and phones with a single sentence as soon as possible.

Big Models from Chat to Act

Today, big model technology is changing the way machines and people interact. Based on understanding needs, planning and decision-making, executing actions, and self-reflection, Agent will lead to intuitive human-machine interactions - from people adapting to machines, to making machines adapt to people.

Companies such as Apple (Apple Intelligence), Anthropic (Computer Use), Google (Jarvis), and OpenAI (Operator) have also identified agentic AI as a major focus for 2025. It is widely believed that 2025 will be the year of the agent explosion. Gartner recently listed agentic AI as one of the top 10 technology trends for 2025 and predicted that by 2028 at least 15% of day-to-day work decisions will be made autonomously by agentic AI, up from zero in 2024.

Unlike GenAI, Agents are goal-driven, capable of fully executing workflows, adapting, learning, iterating, collaborating with other systems and humans, and accomplishing tasks end-to-end. In Zhang Peng's view, the Agent can be seen as the prototype of LLM-OS, a general-purpose operating system built on large models.

"At this stage, AutoGLM is equivalent to adding an execution scheduling layer between humans and applications, which largely changes the form of human-machine interaction. More importantly, we see the possibility of LLM-OS: based on large-model intelligence (from L1 to L4 and beyond), it has the opportunity to enable native human-computer interaction in the future and take the HCI paradigm to the next level."

A New Paradigm for Smart Devices in the Age of AI

As large-model capabilities continue to evolve, we are gradually seeing AI grow its own brain, eyes, and hands. Its intelligence keeps growing, its perceptual capabilities and interaction bandwidth are being enriched and expanded, and Agents are now accelerating its ability to execute.

Zhang Fan, COO of Zhipu, said that large models will bring smart devices new opportunities and new vitality. Phone + AI will become a personal intelligent assistant, PC + AI will become a new productivity tool, and car + AI will turn the car into an intelligent third living space. Of course, large models will benefit all kinds of smart devices, not only phones, PCs, and cars. The continuous evolution of large models has laid a strong foundation for Agents to transform the human-vehicle interaction experience.

As on-device performance and computing power keep improving, as models are adapted for AI-native devices, and as a collaborative device-cloud architecture built on the same underlying model emerges, Agents are not only transforming the user experience in operating systems and applications but also extending to all kinds of smart devices. From phones and computers to cars, glasses, home appliances, and edge devices, AI-native devices of every kind are racing to market.

As customers and partners of Zhipu, Wang Zuo-jian, Director of AI Technology at Honor; Zhong Huai-sheng, Head of Intelligent Ecology for ASUS AIPC; Lian Lei, Head of Cockpit Intelligent Voice and Intelligent Business at XPeng Motors; Wan Weixing, Head of AI Product Technology at Qualcomm China; and Gao Yu, General Manager of the Technology Department at Intel China, each shared their practice and outlook on smart terminals from the perspective of different scenarios.

The development of large models and Agents not only brings users a new paradigm of smart devices in the AI era, but also gives large-model technology much broader room for real-world deployment. From smart devices to smart networks, we will soon see the interconnectivity and open-ended possibilities of AI-native devices. Along the way, Zhipu will provide a series of products and capabilities to help smart devices embrace large models and accelerate toward a new era of AI-native devices.