
Analyzing the Product Forms of AI That Performs Desktop Operation Tasks, Using AutoGLM as a Starting Point

Today Zhipu released "AutoGLM Contemplation", which many people say is just Manus. I think the comparison to Manus is fair, and it can also be categorized as a Deep Research product. But such a simple categorization creates a lot of misunderstanding for both developers and users. I think many people have this problem; at least I do.

Considering the desktop automation applications Zhipu has released so far (such as the AutoGLM-Web plug-in), with "AutoGLM Contemplation" you can now see a nearly complete lineup of Zhipu's products in this category.


So today's discussion centers on "AutoGLM Contemplation" and uses it to break down the different capabilities of AI products that perform desktop operation tasks.


 

Zhipu's official description is pragmatic:

AutoGLM Contemplation is an autonomous agent (AI Agent) that can explore open-ended questions and act on the results. It can simulate the human thought process, covering everything from data retrieval and analysis to report generation.

 

For users, what "AutoGLM Contemplation" really is comes down to what the developer says. The developer can help users focus on a feature and guide them through it, but ultimately cannot define the product on the users' behalf.

For developers, describing "AutoGLM Contemplation" as Manus, Deep Research, Zhipu Niuniu, AI search, or Browser-Use is not accurate in any case. You have to break down its functions and discuss the boundaries of its capabilities before the discussion is meaningful. Simply summarizing AutoGLM Contemplation as Manus has obvious flaws: for example, Manus can perform computational tasks and "AutoGLM Contemplation" cannot.

 

Start by understanding the basic features of AutoGLM Contemplation.

Those of you who have used the Qingyan browser plug-in will find the two very similar; they are now unified under the "AutoGLM" product line. It is recommended that you start with the plug-in before using the "AutoGLM Contemplation" client. The two do not have feature parity, and the plug-in is (currently) more powerful than the client.

However, the client can currently access sites outside the whitelist, whereas the plug-in currently limits the scope of the information it can reach.

Therefore, using the client is the better way to understand the potential of AutoGLM Contemplation.

 

1. Download the client; you must also install the plug-in

Download: https://autoglm-research.zhipuai.cn/#get_started


 

2. Initiate the first task (operate together and observe the process)

Find all free "AI Translator" tools from https://www.aisharenet.com/, and only collect AI Translator tools with clients.

Tip: This is not a good task description, because the website provides neither an in-site search function nor a clear entry point to AI translation tools. A better task description would be: start from https://www.aisharenet.com/tag/aifanyi/, page through the list, and find all free AI translation tools that have a client.

3. Observe the process of task execution (the steps below correspond to screenshots of some of the pages visited automatically during execution)

  • The tool starts with its reflection (thinking) step.
  • First, it finds the search box, types in "AI Translation", and runs the search.
  • It lands on the Bing search interface (the site's search box redirects to Bing) and starts visiting the links...
  • When visiting the second link, it finds a categorized directory of AI translation tools.
  • It browses the categorized list of AI translation tools link by link, turning pages automatically.
  • It visits the second page and starts the summarization task.
  • It outputs the full research report.

4. An important step not covered above is "login". If you are interested, launch a task of your own that triggers the login interaction and observe the process. (Log out of Xiaohongshu first.)

Collect knowledge on Xiaohongshu about generating videos with DeepSeek

 

Positioning

A deep research tool for knowledge gathering. Working backward from the results, the tool's prompts appear to be designed around writing a research paper and are not suited to other types of tasks.

 

Core capabilities

  • Generate a plan of tasks to be performed
  • Launch the browser
  • Operate inside the browser: viewing (text only), clicking, typing
  • Task judgment nodes (partial): web browsing completed, observe the page and determine the next task, determine whether login is required, end of information acquisition
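
The capabilities listed above can be pictured as a small loop: follow a plan, open a page, pass through a judgment node, then act. The following is only a minimal sketch under stated assumptions, not how AutoGLM is implemented. `FakeBrowser`, `Page`, `Decision`, `judge`, and `run_task` are all hypothetical names, and the "browser" just serves canned pages so the control flow can be run without any real automation library.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Decision(Enum):
    NEED_LOGIN = auto()   # the page cannot be read until the user logs in
    CONTINUE = auto()     # more planned pages remain
    DONE = auto()         # information acquisition is finished


@dataclass
class Page:
    url: str
    text: str
    requires_login: bool = False


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a real browser; it only serves canned pages."""
    pages: dict
    logged_in: bool = False
    history: list = field(default_factory=list)

    def open(self, url: str) -> Page:
        self.history.append(url)
        return self.pages[url]


def judge(page: Page, browser: FakeBrowser, remaining: list) -> Decision:
    """Judgment node: observe the page and decide the next step."""
    if page.requires_login and not browser.logged_in:
        return Decision.NEED_LOGIN
    if remaining:
        return Decision.CONTINUE
    return Decision.DONE


def run_task(browser: FakeBrowser, plan: list) -> list:
    """Visit each planned URL in order and collect the page text."""
    notes = []
    queue = list(plan)
    while queue:
        url = queue.pop(0)
        page = browser.open(url)
        decision = judge(page, browser, queue)
        if decision is Decision.NEED_LOGIN:
            # Assumption: a real product would hand control to the user here.
            browser.logged_in = True
            queue.insert(0, url)  # retry the same page after login
            continue
        notes.append(page.text)
        if decision is Decision.DONE:
            break
    return notes
```

In this sketch the login check is just one more branch of the judgment node, which is why a task that touches a logged-out site pauses and then retries the same page.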

It automates visual interaction with the browser, but only to collect information and write research reports. It does not appear to be releasing all of its capabilities yet; especially now that a client has been added, it should be able to integrate more capabilities in the future.

 

One-sentence summary: AutoGLM Contemplation vs. Zhipu Niuniu

The former operates the browser visually to automate information gathering; the only "input" it generates is for searching and visiting pages.

The latter operates the desktop visually and is not limited to automating information gathering; it is free to manipulate the desktop to accomplish tasks.

 

One-sentence summary: AutoGLM Contemplation vs. the Qingyan browser plug-in

The former operates the browser visually and, as a PC client, can later interact with more interfaces.

The latter has the same visual browser-operation capability and, as a browser plug-in, can natively interact with the information on the visited page.

 

Back to AI performing desktop manipulation tasks

Let's start with a question:

Browser-Use already has the core capabilities of AutoGLM Contemplation, and STORM is more powerful at writing in-depth research reports, so why use AutoGLM Contemplation?

The answer is summarized below:

AutoGLM Contemplation is a consumer-facing, productized tool designed around a complete process of gathering information and writing research reports.

There is no need to configure a complex local environment; it uses cloud computing power working together with local interaction.

STORM collects information from fixed sources and cannot access non-open information, whereas AutoGLM Contemplation uses browser automation to collect non-open information.

 

By now you can probably sense some differences between these tools. In fact, the question is simple; let's sort it out by first summarizing desktop task automation tools.

 

Two types of solutions for desktop task automation, plus a hybrid

1. Traditional: set fixed anchor points and execute a predefined process. Examples: Microsoft Power Automate, Yingdao RPA.

2. Purely visual interaction: use Browser-Use to assist the large model in judging the page and generating interactions. Example: AutoGLM Contemplation.

3. Hybrid: Yingdao RPA can also run a fixed workflow in which some nodes (especially content extraction steps) use purely visual interaction. A more typical case is Microsoft's automated customer service orchestration tool: after AI was introduced, customer service became more humanized while still following a fixed SOP.
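
To make the hybrid idea concrete, here is a minimal sketch of a fixed workflow in which each step is either anchored (a selector-style fixed anchor) or visual (a natural-language goal handed to a model). Everything here is hypothetical: `Step`, `run_workflow`, and the two callbacks are illustrative names and are not the API of any RPA product or of Browser-Use.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    name: str
    selector: Optional[str] = None      # fixed anchor, e.g. a CSS selector
    visual_goal: Optional[str] = None   # goal handed to a vision-capable model


def run_workflow(
    steps: List[Step],
    act_on_anchor: Callable[[str], str],
    act_visually: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Run steps in their fixed order, choosing the executor per step."""
    log = []
    for step in steps:
        if step.selector is not None:
            # Traditional node: deterministic action on a known anchor.
            log.append((step.name, "fixed", act_on_anchor(step.selector)))
        elif step.visual_goal is not None:
            # Visual node: the model decides how to reach the goal.
            log.append((step.name, "visual", act_visually(step.visual_goal)))
        else:
            raise ValueError(f"step {step.name!r} has neither anchor nor goal")
    return log
```

The order of steps stays fixed, as in an SOP; only the way an individual node is carried out varies.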

 

Next, focusing on purely visual interaction solutions, let's give them a name: desktop task automation agents.

 

What can desktop task automation agents do?

General capabilities:

Desktop visual recognition, desktop functional operation

 

Scalability:

Single-agent or multi-agent task execution. Multiple agents are generally used for task planning, branch tasks, task coordination, and information summarization, respectively.

Executing desktop operations by calling a fixed "tool" or a fixed "workflow" for a specific task, for example calculation, programming, or searching high-quality information sources. Manus is so powerful because it integrates programming tools to accomplish some of its branch tasks.

Extending (connecting to) local and remote data sources.
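
The division of labor described above (planning, branch tasks, coordination, summarization, plus fixed tools for specific tasks) can be sketched roughly as follows. This is an illustration under assumptions, not how Manus or AutoGLM is built: the planner is passed in as a plain function, the tools are ordinary callables, and any branch without a matching tool falls back to a hypothetical "desktop" worker.

```python
from typing import Callable, Dict, List, Tuple

Branch = Tuple[str, str]  # (tool name, input for that tool)


def coordinate(branches: List[Branch], tools: Dict[str, Callable[[str], str]]) -> List[str]:
    """Coordinator: send each branch task to its tool, else to the desktop worker."""
    results = []
    for tool_name, payload in branches:
        tool = tools.get(tool_name, tools["desktop"])
        results.append(tool(payload))
    return results


def summarize(task: str, results: List[str]) -> str:
    """Summarizer: merge the branch results into one report."""
    lines = [f"Task: {task}"] + [f"- {r}" for r in results]
    return "\n".join(lines)


def run(task: str, planner: Callable[[str], List[Branch]],
        tools: Dict[str, Callable[[str], str]]) -> str:
    """Planner, coordinator, and summarizer chained together."""
    branches = planner(task)
    return summarize(task, coordinate(branches, tools))


# Hypothetical tools: a tiny adder stands in for a "calculation" tool.
example_tools = {
    "calculator": lambda expr: str(sum(int(x) for x in expr.split("+"))),
    "search": lambda query: f"results for {query}",
    "desktop": lambda goal: f"operated desktop for {goal}",
}
```

For example, a planner that returns a calculation branch, a search branch, and an unrecognized branch would send the first two to fixed tools and only the last one to desktop operation.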

 

Limitations:

Desktop task automation agents do not necessarily need to operate the desktop purely visually. If a branch task includes searching Zhihu, it may be better to connect directly to Zhihu's search results; operating the desktop would be less efficient. Reasonable extension capabilities therefore help realize the value of desktop agents.

 

What desktop task automation agents are good for

AutoGLM Contemplation is limited to searching for non-open knowledge, which is great for knowledge search scenarios. But the greater value lies in automating repetitive operations on interfaces that contain dynamic information. Convergence does this well: the AI automates the task execution, and the execution process is then saved so it can be run in a loop later.
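
The "save the execution process, then loop it" idea can be sketched as recording each action the agent takes and replaying the saved trace later. This is a minimal illustration only; `Action`, `Recorder`, and `replay` are hypothetical names, and nothing here reflects how Convergence or AutoGLM actually stores tasks.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass
class Action:
    kind: str        # e.g. "open", "click", "type"
    target: str      # URL or element the action applies to
    value: str = ""  # text to type, if any


class Recorder:
    """Collects the actions performed during one AI-driven run."""

    def __init__(self) -> None:
        self.actions = []

    def record(self, action: Action) -> None:
        self.actions.append(action)

    def dumps(self) -> str:
        # Serialize the trace so it can be stored and reused later.
        return json.dumps([asdict(a) for a in self.actions])


def replay(serialized: str, execute: Callable[[Action], None], times: int = 1) -> int:
    """Re-run a saved trace `times` times; returns the number of actions executed."""
    actions = [Action(**item) for item in json.loads(serialized)]
    for _ in range(times):
        for action in actions:
            execute(action)
    return len(actions) * times
```

A pure replay like this only works while the interface stays stable; where the page content is dynamic, a replayed step would likely need to fall back to the visual agent.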

In summary: look up information, perform repetitive work.

 

Desktop Operating Tasks Product Capability Portfolio

The teardown above provides enough information to summarize the current forms of similar products.

In the end, they are nothing more than combinations of the following capabilities: running locally or in the cloud, defining which parts of task execution follow a fixed process and which do not, and ultimately presenting the user with the types of tasks that can be executed.

Every similar tool I can think of fits into the following chart.

[Chart: capability combinations of products that perform desktop operation tasks]

May not be reproduced without permission: Chief AI Sharing Circle