Agent technology has swept through the tech world at an unprecedented rate this week, and behind the boom is a leap forward in the capabilities of reasoning models.
On the evening of March 5th, Manus made a stunning debut with a powerful demo that instantly set the internet ablaze. Only two days later, two domestic teams, DeepWisdom's MetaGPT and CAMEL AI, each launched open source projects, OpenManus and OWL, that quickly replicated Manus's core features, once again igniting the web and GitHub communities and sparking wide, deep discussion.
Especially noteworthy is that the OpenManus team, drawing on MetaGPT's long-accumulated technical foundation, completed the core system in just one hour and put the project online in only three. This astonishing speed not only earned OpenManus over 10,000 stars on GitHub, but also made it the center of attention both inside and outside the industry.
On the morning of March 8, JQF invited three core members of the OpenManus team to give an in-depth sharing session, aiming to analyze the principles behind OpenManus's technical implementation and to discuss the future development of Agent technology.
The three guests are all senior experts in the Agent field: Sirui Hong, first author of the MetaGPT paper (ICLR 2024 Oral) and the Data Interpreter paper and a co-author of the AFlow paper (ICLR 2025 Oral), whose research has been published in top international conferences and journals such as TPAMI and ICLR; Xinbing Liang, core developer of OpenManus; and Jinyu Xiang, co-author of OpenManus and first author of AFlow and SPO.
During the session, the three guests offered the following forward-looking thoughts on the future direction of Agent technology and the challenges facing the industry:
- As the capabilities of Large Language Models (LLMs) continue to grow, the success rate of Agent applications will rise significantly in many domains, especially on relatively standardized tasks such as QA, the HumanEval code benchmark, and the MBPP Python programming benchmark, where a single model already demonstrates strong solving capability.
- However, the real world contains a large number of complex, long-tail problems, such as intricate machine learning tasks, code bug fixes, and search problems that require integrating multiple pieces of information before an effective answer can be given. These still demand significant technological innovation to improve Agent performance, especially in addressing the model "hallucination" problem.
- Agents' progress in task planning depends both on improvements in the model's own capability and on the assistance of external structure; more sophisticated architectural design can help an Agent better understand and decompose complex tasks.
- As the variety of tools available to Agents grows, a new technical challenge emerges: when facing the same task, the Agent must decide accurately among many tools with similar functions, choose the most appropriate one, and avoid wrong choices.
- The core problem of Agent memory management is finding a balance between cost and efficiency. Current models can handle complete, uncompressed memory directly, but doing so leads to significantly longer processing times and higher costs; the issue is not performance degradation but a seriously degraded user experience.
- One effective current approach to memory management is a multi-agent architecture or a tool-assisted strategy. For example, frameworks such as OpenManus typically use a planning tool to pre-generate a task plan, decompose a complex task into multiple subtasks with only partial memory sharing between them, and summarize or compress the process after each task executes, thereby reducing computational cost.
- Although benchmark tests can clearly determine whether an Agent completed a task correctly, quantitatively evaluating the accuracy or quality of task completion in real application scenarios remains a difficult problem.
- The key to commercializing Agents is to solve real-world tasks and user needs as well as possible, including offering highly personalized functionality; only then will users keep coming back.
- A large number of application developers are actively exploring Token-consumption optimizations, such as engineering-level caching mechanisms or memory compression techniques, to minimize the context length passed on each API call and reduce cost.
- In the future, integrating the capabilities of multiple small models may match or even surpass large models while delivering significant advantages in inference speed, Token consumption, and cost.
Below is a detailed account of the session.
01 An Overnight GitHub Hit: OpenManus's Technical Fast Lane
Liang Xinbing: "After the group meeting on March 6, just after 5:00 p.m., Xiang Jingyu suggested that with a few key steps, we might be able to replicate the effect of Manus."
Recalling the opportunity to start the OpenManus project, Liang Xinbing said, "When he first saw the demo video of Manus, he was impressed by the smooth interaction experience. When he first saw the demo video of Manus, he was impressed by the smooth interaction experience in the video, and intuitively judged that Manus should be a single-intelligence system. "How can a single intelligent body achieve such excellent results, and how does it do task planning and realization? This is very shocking to me."
In the ensuing conversation, the team began to explore the technical solution for Manus, a general-purpose AI smart body product with an impressive user experience. However, from a technical point of view, Manus is in fact a clever integration of many core fundamental technologies that have been agreed upon by the industry. Ultimately, the team hypothesized that Manus employs an external planning mechanism to coordinate the work of multiple intelligences.
After dinner, the development of OpenManus was officially launched and the whole process took about three hours. "At that time, we didn't anticipate that OpenManus would become so popular so quickly." Liang Xinbing admits.
Manus's Multi-Agent Architecture Explained: The Delicate Synergy of Planning and Execution
The core of Manus is its multi-agent system architecture. It first uses the PlanningTool to decompose user requirements, generating a detailed plan of multiple linear subtasks. The system then executes each subtask in sequence, dynamically assigning it to the most appropriate Agent, which completes the subtask through a ReAct (Reason and Act) loop that repeatedly invokes tools.
Planning capability and tool-use capability are the two pillars of Manus, and Manus's innovation of bringing the PlanningTool into a multi-agent framework was critical. As the Claude-3.7 model's breakthrough on the SWE-Bench code capability benchmark shows, performance gains come partly from advances in the model itself and partly from more effective task planning; the MetaGPT team's earlier research on the Data Interpreter project likewise showed that planning is critical and effective for solving complex real-world problems. As a result, integrating planning capabilities into multi-agent and even single-agent frameworks has become an important direction in the development of Agent technology.
The team hypothesized that Manus may use Claude models combined with its own post-trained models, heavily optimized at the engineering level, which would significantly improve its ability to use tools across different scenarios.
OpenManus Design Philosophy: Minimalism, Pluggability and Powerful Planning Capabilities
The design concept of OpenManus can be summarized in two keywords: "minimalist" and "pluggable". According to Xinbing Liang, the initial idea was to build an extremely simple Agent framework that realizes the Agent's various functions through flexible combinations of pluggable Tools and Prompts. Based on this idea, the team quickly developed a complete Agent mini-framework.
Prompt guidance and the use of Tools are the key factors determining the effectiveness of a ReAct Agent. In OpenManus, the Prompt controls the Agent's overall behavioral logic, while the Tools define its action space; together they completely define a ReAct Agent. On top of the ReAct Agent, the OpenManus team implemented a lightweight ToolCall Agent based on Function Call technology, which lets tools be selected and executed in a more structured way. OpenManus is built on the ToolCall Agent.
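A minimal sketch of this layered abstraction in Python might look like the following. The class shapes and the `llm.chat` interface are illustrative assumptions, not the actual OpenManus API:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Tool:
    """A pluggable tool: a name, a description shown to the LLM, and a callable."""
    name: str
    description: str
    run: Callable[..., str]

@dataclass
class ToolCallAgent:
    """Hypothetical sketch: the Prompt controls behavior, the Tools define
    the action space, and Function Call output selects tools in a structured way."""
    system_prompt: str
    tools: dict = field(default_factory=dict)

    def step(self, llm, history: list) -> str:
        # Reason: ask the LLM to think and (optionally) emit a structured tool call.
        decision = llm.chat(
            system=self.system_prompt,
            messages=history,
            functions=[{"name": t.name, "description": t.description}
                       for t in self.tools.values()],
        )
        if decision.get("function"):            # Act: execute the chosen tool
            tool = self.tools[decision["function"]]
            observation = tool.run(**decision.get("arguments", {}))
            history.append({"role": "tool", "content": observation})
            return observation
        return decision["content"]              # no tool needed: final answer
```

Swapping in a different Tool set or Prompt yields a different Agent without touching the loop itself.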
The "pluggable" design brings great flexibility and extensibility, allowing developers to combine Tools from different scenarios to quickly create new Agents. Developers can freely combine Tools from different scenarios to quickly create new Agents, and the definition of Tools is very easy, no need to write complex internal logic, just simply modify the Agent's action space (Tools), and the Tools themselves should have good combinability, and the goal of OpenManus is to make the abstraction layer more concise and clear. By providing a rich set of tools and supporting multiple Agents to be flexibly equipped with different combinations of tools, OpenManus is able to easily extend its capabilities in various application scenarios.
Planning capabilities are also critical. OpenManus builds on the planning strengths of Manus and realizes task decomposition through the PlanningTool to effectively address real-world complexities.
OpenManus Workflow: Dynamic Task Assignment and Collaborative Execution
The workflow of OpenManus is clear and efficient. Upon receiving a user request, the system first uses the PlanningTool to generate a plan of linear subtasks and writes it to a markdown file. OpenManus then parses the plan and takes out each subtask in turn. As each subtask is executed, the system dynamically assigns it to the Agent best suited to handle it, each Agent being equipped with a different toolset for different types of tasks.
Dynamic Agent allocation is one of the highlights of OpenManus. This flexible mechanism lets the system select the most suitable Agent according to a task's specific needs and context, improving the efficiency and quality of task processing. Currently, OpenManus uses regular-expression matching to assign tasks to Agents; if a task cannot be matched to a specific Agent, the default configured Agent executes it.
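Based on that description, the dispatch logic can be pictured with a rough sketch; the regex patterns, agent names, and plan format below are invented for illustration and are not the project's actual configuration:

```python
import re

# Hypothetical routing table: regex pattern -> agent name.
AGENT_ROUTES = [
    (re.compile(r"browser|web|search", re.I), "browser_agent"),
    (re.compile(r"code|script|python", re.I), "coder_agent"),
    (re.compile(r"file|read|write", re.I), "file_agent"),
]
DEFAULT_AGENT = "manus_agent"  # fallback when no pattern matches

def parse_plan(markdown_plan: str) -> list:
    """Pull the linear subtasks out of a markdown plan written as '1. ...' lines."""
    return [m.group(1).strip()
            for m in re.finditer(r"^\s*\d+\.\s+(.+)$", markdown_plan, re.M)]

def assign_agent(subtask: str) -> str:
    """Regex matching first; fall back to the default configured agent."""
    for pattern, agent in AGENT_ROUTES:
        if pattern.search(subtask):
            return agent
    return DEFAULT_AGENT

plan = """1. Search the web for recent Agent papers
2. Write a Python script to summarize them
3. Save the summary to a local file"""
for task in parse_plan(plan):
    print(assign_agent(task), "->", task)
```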
In the future, the OpenManus team is also considering introducing a Large Language Model (LLM) to handle task-to-Agent assignment. However, using an LLM for intent recognition and Agent assignment on every task execution would undoubtedly increase computational cost and latency.
Future Directions for OpenManus: Continuous Optimization and Community Building
In order to further improve the performance and user experience of OpenManus, the team plans to work on the following priorities:
- Enhanced planning capabilities: continuously optimize the PlanningTool to handle more complex task decomposition and planning scenarios.
- Standardized evaluation: use industry benchmark suites such as GAIA, TAU-Bench, and SWE-Bench to continually evaluate and optimize OpenManus's performance.
- Extended model adaptation: broaden model support from Claude-3.5 to DeepSeek V2.5 and many more models, optimizing for low-cost application scenarios.
- Containerized deployment: simplify the installation and use of OpenManus, lowering the barrier to entry for users.
- Enriched sample library: add more practical examples and in-depth analyses of success and failure cases to help users better understand and use OpenManus.
- Front-end and back-end development: develop a user-friendly web UI to enhance the interaction experience.
- RAG module integration: integrate a Retrieval-Augmented Generation (RAG) module to give the Agent an external knowledge base, enhancing its knowledge acquisition and reasoning capabilities.
Xinbing Liang said that Manus has done a very good job on product interaction and that there is a lot to learn from it. At present, the results of OpenManus are still relatively limited, and the team has not yet carried out dedicated effect tuning.
The initial goal of OpenManus is to match the results of the original Manus. In the long run, the team hopes to rely on the broad open source community to continuously optimize core capabilities such as Computer Use, Browser Use, and Planning, along with tool-invocation capabilities, driving OpenManus toward higher levels of emergent intelligence.
02 MetaGPT Team: Years of Technical Accumulation, a Three-Hour Manus Replica
Sirui Hong: "In fact, our team has accumulated years of technical experience in automation and Agent frameworks for AI scenarios."
The MetaGPT team has long been committed to researching and open-sourcing Agent technology. Over the past two years it has continuously open-sourced its research results, produced high-quality academic papers and technical reports, and actively contributed to the community. These results include:
- MetaGPT: A pioneering multi-agent meta-programming framework that lays out the core idea of multi-agent collaboration.
- Data Interpreter: A powerful data science agent that demonstrates the great potential of LLM in the field of data analytics.
- AFlow: An automated Agent workflow generation framework that enables automatic exploration and optimization of Agent combinations.
- FACT: Context rewriting technology, which effectively improves the accuracy of multi-fact retrieval.
- SELA: A tree-search-enhanced LLM Agent for automated machine learning that significantly improves AutoML performance.
- Self-Supervised Prompt Optimization: A self-supervised prompt optimization method that improves the efficiency and effectiveness of prompt engineering.
- SPO (https://www.modelscope.cn/studios/AI-ModelScope/SPO): An open source prompt optimization tool for scenarios with few samples or no explicit scoring.
- Atom of Thoughts for Markov LLM Test-Time Scaling: An atomic-thought approach that enhances LLM reasoning in Markov decision processes.
The MetaGPT Framework: A Cornerstone of Multi-Agent Collaboration
Open-sourced in 2023, the MetaGPT framework was a pioneer in the field of multi-agent meta-programming. The MetaGPT team believed that while the large models of the time had demonstrated robustness on general-purpose tasks, effectively solving complex problems in human society still required decomposing a problem into atomic parts and incorporating processes better aligned with how humans actually solve problems.
"You may be familiar with the concept of Standard Operating Procedures (SOPs). By assigning SOPs to different roles and leveraging each role's expertise and tool capabilities, we can significantly improve the performance of large models on complex problems," explains Sirui Hong. The MetaGPT framework is built on this concept, proposing a multi-agent architecture with embedded SOPs and aiming to realize meta-learning, or meta-programming, capabilities for agents.
This approach achieved significant improvements on benchmarks such as HumanEval and MBPP, outperforming the GPT-4 of the time. The MetaGPT team also validated the idea in typical software development scenarios, such as the classic 2048 mini-game and the Snake game, where MetaGPT's overall success rate was significantly higher than that of other open source frameworks of the same period.
Data Interpreter: An Intelligent Assistant in Data Science
Building on the MetaGPT framework and its agent design, the team realized that agents also need stronger planning capability and tool use, especially when solving machine learning or data modeling problems.
On the one hand, machine learning and data modeling processes can often be planned with large-model capabilities, letting the model focus on task execution and implementation. On the other hand, when working with large tabular data, the data cannot all be fed in directly because of large models' context length limits, so the agent must interact with the data through code. Based on these considerations, the MetaGPT team began exploring planning and tool-use capabilities in the second half of 2023 with Data Interpreter.
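As a toy illustration of this code-mediated data access, the agent never reads the raw table; it writes code against it and sees only the compact printed output (the `execute_python` tool below is a hypothetical stand-in for an agent's code-execution tool):

```python
import contextlib
import io

import pandas as pd

df = pd.read_csv("large_table.csv")  # hypothetical file, far too large to inline in a prompt

def execute_python(code: str) -> str:
    """Run agent-written code against the data; only the printed output
    (a compact summary) flows back into the model's context."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {"df": df, "pd": pd})
    return buffer.getvalue()

# The agent inspects the data through code instead of reading it directly:
print(execute_python("print(df.shape); print(df.describe().to_string())"))
```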
While projects such as Devin were attracting widespread attention, the MetaGPT team found that Data Interpreter had reached the level of a junior data analyst on tasks such as data modeling and machine learning. Users only need to hand their data to Data Interpreter, and it can independently complete complex AI tasks from data preprocessing through NLP/CV model training.
SELA: Enhancing Agent Debugging and Feedback Capabilities
To further enhance Data Interpreter's performance, the MetaGPT team felt the need to strengthen agents' debugging capability and their feedback mechanism on experiment results. To this end, the team developed a work called SELA. SELA introduces Monte Carlo Tree Search (MCTS) on top of Data Interpreter, enabling the agent to automatically optimize machine learning tasks through autonomous experiments, explore diverse reasoning paths, and adjust its strategy and solution steps based on execution feedback, significantly improving overall task performance.
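The loop can be sketched generically. Below, `run_experiment` and `mutate` are stand-in stubs for training a candidate ML pipeline and perturbing its configuration; this illustrates the MCTS idea rather than SELA's actual implementation:

```python
import math

class Node:
    """A candidate pipeline configuration in the search tree."""
    def __init__(self, config, parent=None):
        self.config, self.parent = config, parent
        self.children, self.visits, self.value = [], 0, 0.0

    def uct(self, c=1.4):
        if self.visits == 0:          # unvisited children are tried first
            return float("inf")
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def mcts_search(root_config, run_experiment, mutate, iterations=20):
    """run_experiment(config) -> score in [0, 1]; mutate(config) -> new config."""
    root = Node(root_config)
    for _ in range(iterations):
        node = root
        while node.children:                            # selection: descend by UCT
            node = max(node.children, key=Node.uct)
        child = Node(mutate(node.config), parent=node)  # expansion: new experiment
        node.children.append(child)
        score = run_experiment(child.config)            # simulation: run and score it
        while child is not None:                        # backpropagation of feedback
            child.visits += 1
            child.value += score
            child = child.parent
    return max(root.children, key=lambda n: n.value / n.visits).config
```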
Enhanced by SELA, Data Interpreter's capabilities on machine learning tasks were significantly improved, reaching a level comparable to automated machine learning (AutoML) tools and outperforming the best open source projects of the time (e.g., AIDE).
AFlow: Automated Agent Workflow Generation
Meanwhile, the MetaGPT team also explored improving large models' reasoning capability with Monte Carlo Tree Search (MCTS) and developed AFlow. Unlike solutions with fixed SOPs, AFlow automatically searches for the solution flow best suited to each task.
AFlow's innovation lies in how it improves solution quality across different problems: it lets the system explore the optimal combination (topology) of agents based on feedback from the problem, so that the agent combination used to solve a problem becomes dynamic and its scale need not be predetermined.
AFlow explores and optimizes the combinatorial topology of multiple agents by defining a search space of atomized problems and applying Monte Carlo methods. The work achieved state-of-the-art (SOTA) results on all six datasets and was accepted as an Oral at ICLR 2025, a testament to its technical leadership.
FACT: Enhancing Agent's Memory Management Capabilities
The MetaGPT team also noticed that as an agent's problem-solving steps increase, the volume of its memory (Memory) grows as well. Effectively managing an agent's contextual information throughout problem solving therefore becomes a pressing issue.
To this end, the team presented a work called FACT, which improves the accuracy of large models in fact retrieval through a multi-needle retrieval mechanism and shows significant results on question answering (QA) tasks. This work has been accepted by NAACL.
In addition, around September last year, the MetaGPT team explored the SWE-Bench code capability benchmark. They found that on problems such as code repair, Agents must rely on file localization and search as well as computer-use capabilities, which also place higher demands on tool use and planning. Many research efforts have used a multi-agent approach to handle such long chains of complex reasoning. Accordingly, the MetaGPT team added and optimized file localization, file search, and related capabilities for SWE-Bench tasks, and these form part of the basis of the OpenManus code; a look at the OpenManus code reveals that many of its tools relate to code repair and localization.
SPO: A Powerful Tool for Prompt Optimization
SPO is a powerful set of prompt optimization tools. Unlike traditional optimization methods that require large datasets, SPO suits scenarios where accurate ratings are unavailable or data is limited. For example, when writing copy for Xiaohongshu or doing SEO optimization, users may have only a handful of satisfactory samples; SPO can still perform effective prompt optimization under such limited-sample conditions. The tool has been open-sourced and has received good user feedback on ModelScope and Hugging Face.
AOT: Atomic Thinking Fuels Information Reasoning
The AOT (Atom of Thoughts) approach is mainly used for question-answering tasks involving information reasoning and integration, such as combining information from different passages in reading comprehension. The work has received 350,000 views so far and will be integrated into the MetaGPT framework to further enhance its information-processing capabilities.
03 Real-World Challenges for Agents: Anatomy of Ten Core Issues
Q1. Once large-model capabilities improve, can complex problems be fully solved?
Sirui Hong: "It is true that the success rate on many problems rises as large-model capabilities improve, but the problems themselves do not go away." For example, on relatively standardized, single-function code generation problems such as QA, HumanEval, and MBPP, a single model can already perform very well today.
From last year to this year, large models' success rates on these problems have approached practical-application levels. At the same time, however, human society still contains a large number of extremely complex, long-tail problems, including machine learning, code fixing, and problems that require searching for and combining results before they can be presented to users. These areas still need substantial technological innovation to improve large-model performance, especially in addressing the model "hallucination" problem.
Q2. What is the relationship between large-model capabilities and advances in Agent technology?
Xiang Jinyu: "Agent and large models may have a vertical or orthogonal relationship. The enhancement of the framework itself will gain more functionality because of the enhancement of the model capability, and the two are not in conflict."
The Agent framework enables large models to interact with the physical world or the wider environment by extending it with more tools. At the same time, advances in the large models themselves enhance their reasoning and planning capabilities. The two can be used in conjunction with each other or developed independently.
"The relationship is complementary rather than conflicting." Xiang Jinyu summarized.
Q3. What is the current state of development of the Foundation Agent Model?
Xiang Jinyu: "Recently I happen to be following some related research work, although it may not exactly fall into the Foundation Agent Model category."
He mentioned the attempts made by Pan Jiayi's team in the SWE-GYM project, which aims to solve the codebase repair problem. They utilized data generated after running models based on Claude or GPT-4o, and collected trajectory data during Agent operation with the help of frameworks such as Openhands. The trajectory data contains both success and failure cases. They reused the collected trajectory data to train the Qwen open-source model, and observed that the Qwen model's code repair capability was significantly improved after this training. The details of the study have been elaborated in the paper and the research is solid and reliable.
"The current difficulty in generalizing this type of work is that, for example, in SWE-Bench evaluation, we can explicitly determine whether a task is completed correctly, but in real-world application scenarios, it is difficult for us to quantitatively assess the accuracy or quality of task completion in many cases (e.g., writing a novel or a joke)." Xiang Jinyu pointed out, "Just like in the real work scenario, let the interns and senior employees to complete a job at the same time, to rate their performance, in fact, it is difficult to objectively judge, need to be based on a lot of subjective business logic and standards to determine. This automatic design of evaluation feedback under open tasks is also an important direction for our future exploration."
Q4. Does Agents' progress in planning capability depend primarily on the large model itself?
Jinyu Xiang: "Current progress in planning depends on the one hand on improvements in the model's own capabilities, and on the other it cannot be separated from the assistance of external structures, i.e., adding more complex structures at the Agent level to assist planning." For example, early work on Tree of Thoughts (ToT) significantly enhanced model performance during task reasoning by introducing additional structure. Similar research on external structural assistance also exists in the planning domain.
Q5. What are the difficulties in Agents' use of external tools?
Xinbing Liang: "Currently in OpenManus, we are still mainly using some existing open source tools, such as Cloud Computer and Browser. Research by other teams on the use of Browser has shown that these two tools alone can basically accomplish a lot of tasks, and have initially formed the prototype of Manus."
Additionally, on the question of what happens when an Agent wants to use a tool that does not yet exist, Xinbing Liang said the team envisions adding a capability that empowers Agents to create tools on their own. "When an Agent needs a tool to accomplish a task and no suitable tool exists in the current environment, it can create one and use it itself. This will further empower the Agent."
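A heavily simplified sketch of such self-created tools might look like this; the `agent_llm.generate` interface and the registry shape are assumptions, and a real system would need to run the generated code in a sandbox:

```python
def create_tool(agent_llm, task_description: str, registry: dict) -> str:
    """Hypothetical sketch: when no suitable tool exists, ask the LLM to write one,
    then register it into the agent's action space for immediate use."""
    source = agent_llm.generate(
        f"Write a single Python function named `new_tool` that can: {task_description}. "
        "Return only the code."
    )
    namespace = {}
    exec(source, namespace)  # WARNING: in practice this must run in a sandbox
    registry["new_tool"] = namespace["new_tool"]
    return "registered new_tool"
```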
SiRui Hong: "I think the use of tools for large models or Agents is not novel in itself. However, with the gradual increase in the number of tools, the technical difficulty also arises: if there are a large number of tools with similar functions, how can an Agent make accurate decisions, choose the most appropriate tool, and avoid decision errors when solving the same task?"
In addition, if customized tools are used instead of standardized tool interfaces, another problem may arise: tool parameters that are unreasonably defined or insufficiently clear make large models prone to errors when generating tool-call decisions, which in turn undermines the tool's effectiveness. These are the key issues to address in tool use.
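As a concrete illustration of that point, here is the difference between an under-specified and a well-specified tool definition, written in the JSON-schema style used by mainstream function-calling APIs (the tools themselves are made up):

```python
# Under-specified: the model must guess what "query" and "n" mean,
# making wrong tool-call decisions much more likely.
vague_tool = {
    "name": "search",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}, "n": {"type": "integer"}},
    },
}

# Well-specified: described, typed, and constrained parameters reduce call errors.
clear_tool = {
    "name": "web_search",
    "description": "Search the web and return the top results as plain text.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Natural-language search query.",
            },
            "max_results": {
                "type": "integer", "minimum": 1, "maximum": 20,
                "description": "Number of results to return.",
            },
        },
        "required": ["query"],
    },
}
```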
"Another difficulty is that it is not just the selection and use of the tool itself, but the context that may contain a lot of detailed information. For example, when a user opens multiple web pages at the same time, the information and data on these pages (e.g., the time on a particular resume, the start time of an event mentioned in another web page) may be confusing or incorrect when the Agent integrates them to generate the final result. How to ensure that the Agent accurately handles this detailed information when using the tool is also a problem that needs to be focused on in practical applications." Hong Sirui added.
Q6. Will protocols such as MCP become mainstream in terms of tool use?
Liang Xinbing: "The MCP protocol is now becoming more mainstream."
The ability to use the tool actually depends on whether the model itself has a good ability to use the tool. Because some models may not have the ability to use tools, or may be weak in this regard, their effectiveness in using tools will be limited. Therefore, the popularity of tooling protocols is closely related to the strong tooling capabilities of the models themselves.
Q7. What are the advances and difficulties for Agents in handling massive contexts (memory management)?
Sirui Hong: "By now you may be aware of related research such as MemoryGPT or the open source project Mem0, both of which optimize longer contexts and memory management for Agents."
For example, MemoryGPT summarizes contexts beyond a certain length, a very plain but effective idea. Mem0, on the other hand, proactively uses tools in the memory-updating process, involving operations such as memory deletion, updating, and addition.
"Currently, when Agents handle complex, long-horizon tasks (e.g., browsing web pages, whose information can be very long), compressing context into memory while ensuring that critical information is not altered or omitted remains a challenging problem," Sirui Hong notes. "Some early work has shown that memory fades with time or with task steps."
On the other hand, human memory comes in various types: not only memory of semantic information, but also procedural memory generated by tool use, and memory of associations between events. Academia has also optimized separately for these different memory types.
The discussion above concerns memory management within a single Agent. In a multi-agent system, however, memory can be used more skillfully. Besides isolating memories to some extent, one would like to reuse memories generated by other Agents during problem solving to enrich one's own experience on specific tasks. Agents can also evolve by reusing the group's problem-solving experience, eventually forming a kind of collective intelligence.
Xinbing Liang: "The core problem of memory management is cost." If you don't consider memory management, don't do compression and any processing, and use the complete memory directly, the current large-scale models can still be processed, but the problem this brings is not a performance degradation, but the processing time and cost will be significantly increased, which seriously affects the user experience.
Thus, the memory management problem involves optimization at the engineering level. There are already a number of companies or organizations trying to optimize memory management solutions.
"One current approach to solving the memory management problem is to use a multi-intelligence or tool-assisted approach. For example, in frameworks such as OpenManus, it is common to generate a task plan first through a planning tool, decompose a complex task into multiple subtasks, with incomplete sharing of memory between each subtask, and summarize or compress the process after the task has been executed." Liang Xinbing explained.
Q8. What will Agents ultimately compete on in commercialization?
Sirui Hong: "I think the most important thing is to do the best possible job on tasks and outcomes in real scenarios, including personalization features." Many current academic research efforts, whether on SWE-Bench, GAIA, or other Agent test tasks, still show limited task success rates. And if even these relatively narrow task standards are carried into real business scenarios, the current Agent success rate remains quite limited across different users and problems of varying difficulty.
"Therefore, whether for a programming task or a data collection and report generation task, if we can serve a wide variety of user problems and scenarios as well as possible, raise the success rate to a satisfactory level, and truly deliver the action capabilities people expect of Agents today, I believe users will keep using the Agent as an assistant and a tool in their daily lives," Sirui Hong emphasized.
Q9. The current cost of Manus, OpenManus, and similar Agents is high. How can cost be further reduced and efficiency improved?
Siren Hong: "First of all, a large number of application vendors, including ourselves, optimize for Token consumption. Whether it's at the engineering level through caching, or memory compression techniques, the goal is to minimize the context length of each API call, and that's the direction of ongoing optimization at the application level."
"In addition, in the future, it is likely that people will deploy a large number of small models to fine-tune or reinforcement learning based on existing data, focusing on optimizing the ability to use certain specific nodes or tools. By integrating the capabilities of multiple small models, it is expected to complete or even surpass large models. This can lead to significant cost advantages in terms of inference speed, Token consumption and expenses." Siren Hong added.
Q10. How do you assess the business prospects of multi-agent systems?
Sirui Hong: "First of all, we believe that in code generation, both single-Agent and multi-agent systems can be expected to reach commercialization much earlier."
"We have found that a large number of users, average programmers who nonetheless understand basic concepts, have a great need for help from agents or large models when they want to build a personal website or a simple application on their own. Using large models directly may require many rounds of interaction and a tedious debugging process, but with a productized multi-agent system the process is much easier: users may spend only 15 minutes or half an hour, even including subsequent requirement changes, to quickly get a satisfactory website or application."
"Therefore, I think the business prospects of multi-agent systems are clear and strong when it comes to truly, effectively solving users' actual needs, and code generation is a scenario that Agent technology can already handle well. Users' willingness to pay here is also relatively high at present," Sirui Hong concluded.
04 Agent Commercialization: Code Generation Takes the Lead in Breaking Ground
Q1. Can you briefly introduce the multi-agent product MGX?
Sirui Hong: "For people familiar with MetaGPT, MGX is easy to understand: it is a product in which multiple agents collaborate online simultaneously to help users solve problems. Users simply use it like ChatGPT: as soon as a requirement is entered, a lead agent breaks the task down and distributes it to different agents for execution."
"The whole product currently focuses on code generation. For example, if a user wants to create a personal website, a game, or a data analytics application, our agents can accomplish the task very well. During development, users can modify their requirements at any time, such as adjusting the style, typography, or layout of the front-end project, which our agents handle naturally, significantly reducing development costs."
Unlike products such as Manus and OpenManus, MGX has automatic deployment capabilities: during development, the software is deployed automatically, and users can preview and adjust the results in real time. In addition, each agent in MGX has the previously mentioned computer tool-call, browser tool-call, planning, and code-execution capabilities.
"Internally we are also exploring aesthetic assessment of design and data visualization effects, and in the future we may form corresponding Benchmarks to help large models or Agents learn to assess whether generated pages or data dashboards meet user expectations and aesthetic standards," Sirui Hong revealed.
Below are some examples of websites generated by MGX:
Personal website:
- https://alex-portfolio-yhx5c3-v1.mgx.world/
- https://photographer-portfolio-myuf2t-v1.mgx.world
Personal Blog:
- https://personal-blog-v7amdv-v2.mgx.world
- https://cute-cartoon-blog-p58801-v1.mgx.world
Personal business cards:
- https://portfolio-dveerm-v1.mgx.world
- https://emma-anderson-homepage-8rnqm6-v1.mgx.world
Q2. Will MGX DEV follow up with new Agent types?
Siwei Hong: "MGX will continue to add new types of Agents in the future. We are currently experimenting internally with a new type of intelligence called User Agent." When a user's project is deployed, it may fail to run directly or have defects, resulting in blank pages, etc. User Agent will actively detect the effects of project deployment, such as taking screenshots of the page, actively interacting with the webpage, testing the feasibility and executability of the generated software, and then further notifying other intelligences responsible for development to fix it in order to complete the project in a more perfect way. "In addition, we may also internally precipitate Benchmarks for aesthetic assessment of design or data visualization effects, enabling Agent to determine whether the quality and aesthetic performance of a page or data dashboard meets expectations." Hong Siren added.