AI Personal Learning
and practical guidance
CyberKnife Drawing Mirror

OpenManus Open Source Explained, with Insights into the AI Agent Architecture Behind It

Opening: The Manus Fire and the OpenManus Breakdown

One of the big things that has been happening in AI circles lately is that the Manus Manus is a powerful and flexible AI Agent that attracts a lot of attention. Simply put, Manus is like an all-around assistant, whether it's programming, looking up information, handling documents, or surfing the Internet, it can help you get it done.

However, it's not that easy to use Manus, you have to have an invitation code. This has blocked many developers and researchers from using Manus. Just when everyone was at a loss, the open source community stepped in! The MetaGPT team of @mannaandpoem, @XiangJinyu, @MoshiQAQ, and @didiforgithub spent just 3 hours to come up with an open source project called OpenManus. Now you can experience the power of Manus without an invitation code! What's even more exciting is that OpenManus is an open source project, which means you can modify and extend it to suit your needs!


OpenManus: Open source version of Manus-1 by MetaGPT

The emergence of OpenManus not only gives more people the opportunity to experience the charm of AI Agent, but also injects new vitality into the development of AI Agent. For those of us who are involved in technology, OpenManus is not only a good tool, but also an excellent learning resource. By studying its code, we can gain a deeper understanding of the framework design and implementation details of AI Agent.

 

AI Agent Framework: The Design Philosophy of OpenManus

The code structure of OpenManus is very clear and adopts a modular design, which is like building blocks to combine different functional modules together. The benefit of this design is that the code is highly reusable and extensible, and the responsibilities of each module are clear.

The core components of OpenManus include:

OpenManus
├─ Agent (Agent Layer)
│ ├── BaseAgent (Base abstract class)
│ ├── ReActAgent (think-act mode)
│ ├── ToolCallAgent (Tool Calling Ability)
│ ├── PlanningAgent (Planning capability)
│ ├── SWEAgent (Software Engineering Capability)
│ └─ Manus (Generic Agent)
├─ LLM (Language Modeling Layer) │ ├── Memory (Language Modeling Layer)
├─ Memory (Memory Layer)
├─ Tool (Tool Layer)
│ ├── BaseTool (Tool base class)
│ ├─ PlanningTool
│ ├── PythonExecute (Python Execute)
│ ├── GoogleSearch (Search Tool)
│ ├── BrowserUseTool (Browser Tool)
│ └── ... (Other Tools)
├─ Flow (Workflow Layer) │ ├─ BaseFlow (Workflow Layer)
│ ├── BaseFlow (Base Flow)
│ └─ PlanningFlow (Planning Flow)
└─ Prompt (Prompt word layer) │ └─ Prompt (Prompt word layer)

LLM Components: The Brain of the Agent

If we compare an agent to a person, then the LLM (Large Language Model) is the brain of the agent, which is responsible for understanding user commands, generating responses, and making decisions. It is responsible for understanding user commands, generating responses, and making decisions. OpenManus encapsulates the interaction with the language model through the LLM class.

class LLM.
_instances: Dict[str, "LLM"] = {} # Singleton Pattern Implementation
def __init__(
self, config_name: str = "default", llm_config: Optional[LLMSettings] = None
).
if not hasattr(self, "client"): # initialize only once
llm_config = llm_config or config.llm
llm_config = llm_config.get(config_name, llm_config["default"])
self.model = llm_config.model
self.max_tokens = llm_config.max_tokens
self.temperature = llm_config.temperature
self.client = AsyncOpenAI(
api_key=llm_config.api_key, base_url=llm_config.base_url
)

The LLM class provides two core methods:

  • ask:: Send a general dialog request
  • ask_tool:: Send requests with tool calls
async def ask_tool(
self,
messages: List[Union[dict, Message]], system_msgs: Optional[List[Union[dict, Message]]] = None, system_msgs: Optional[List[Union[dict, Message]]] = None, system_msgs.
system_msgs: Optional[List[Union[dict, Message]]] = None, timeout: int = 60, async ask_tool(
system_msgs: Optional[List[Union[dict, Message]]] = None, timeout: int = 60,

tools_choice: Literal["none", "auto", "required"] = "auto", temperature: Optional[float] = None
temperature: Optional[float] = None,
**kwargs.
).
# Formatting messages
if system_msgs.
system_msgs = self.format_messages(system_msgs)
messages = system_msgs + self.format_messages(messages)
messages = self.format_messages(system_msgs)
messages = self.format_messages(messages)
# Send the request
response = await self.client.chat.completions.create(
model=self.model,
messages=messages, temperature=temperature, or self.temperature, or
temperature=temperature or self.temperature, max_tokens=self.temperature, max_tokens=self.temperature
max_tokens=self.max_tokens, tools=tools,
max_tokens=self.max_tokens, tools=tools, tool_choice=tool_choice

timeout=timeout,
**kwargs.
)

Memory component: Agent's memory

The Memory component is like a notebook for the Agent, responsible for recording and managing the Agent's dialog history. With Memory, the Agent can remember what it has said before and maintain the consistency of the conversation.

class Memory(BaseModel).
"""Stores and manages agent's conversation history.""""
messages: List[Message] = Field(default_factory=list)
def add_message(self, message: Union[Message, dict]) -> None.
"""Add a message to memory.""""
if isinstance(message, dict).
message = Message(**message)
self.messages.append(message)
def get_messages(self) -> List[Message].
"""Get all messages in memory.""""
return self.messages

The Memory component is the core component of the Agent, and it is accessed through the BaseAgent's update_memory method to add a new message:

def update_memory(
self.
role: Literal["user", "system", "assistant", "tool"],
content: str, **kwargs, **kwargs, **kwargs
**kwargs.
) -> None.
"""Add a message to the agent's memory.""""
message_map = {
"user": Message.user_message, """Add a message to the agent's memory."""
"user": Message.user_message, "system": Message.system_message, "assistant": Message.
"assistant": Message.assistant_message, "tool": lambda content, message_map
"tool": lambda content, **kw: Message.tool_message(content, **kw),
}
if role not in message_map.
raise ValueError(f "Unsupported message role: {role}")
msg_factory = message_map[role]
msg = msg_factory(content, **kwargs) if role == "tool" else msg_factory(content)
self.memory.add_message(msg)

Tools component: Agent's toolbox

The Tools component is the bridge between the Agent and the outside world, and OpenManus implements a flexible tool system that allows the Agent to invoke a variety of tools to accomplish tasks.

class BaseTool(ABC, BaseModel).
name: str
description: str
parameters: Optional[dict] = None
async def __call__(self, **kwargs) -> Any.
"""Execute the tool with given parameters."""""
return await self.execute(**kwargs)
    @abstractmethod
async def execute(self, **kwargs) -> Any.
"""Execute the tool with given parameters."""""
def to_param(self) -> Dict.
"""Convert tool to function call format.""""
return {
"type": "function",
"function": {
"name": self.name, "description": self.description, "function": {
"description": self.description, "parameters": self.
"parameters": self.parameters, {
}, }
}

The results of the tool's execution are summarized in ToolResult class to represent it:

class ToolResult(BaseModel).
"""Represents the result of a tool execution.""""
output: Any = Field(default=None)
error: Optional[str] = Field(default=None)
system: Optional[str] = Field(default=None)

OpenManus provides a number of built-in tools such as PlanningTool::

class PlanningTool(BaseTool).
"""
A planning tool that allows the agent to create and manage plans for solving complex tasks.
The tool provides functionality for creating plans, updating plan steps, and tracking progress.
"""
name: str = "planning"
description: str = _PLANNING_TOOL_DESCRIPTION
parameters: dict = {
"type": "object", "properties": {
"properties": {
"command": {
"description": "The command to execute. Available commands: create, update, list, get, set_active, mark_step, delete.",
"enum": [
"create",
"create", "update",
"create", "update", "list", "get", "set_active", mark_step, delete.
"get", "set_active", "set_step", "list", "get".


"delete".
], "type": "string".
"type": "string", },, "type": "string", }, "type".
}, .
# Other parameters...
},, "required".
"required": ["command"], }, # Other parameters...
}

Planning component: planning capabilities of the Agent

The Planning component is the key to OpenManus' ability to handle complex tasks. It allows an Agent to create a plan that breaks down a complex task into small, step-by-step tasks that can be completed one by one.

The Planning component consists of two main parts:

  1. PlanningTool: Provides plan creation, update and tracking capabilities.
  2. PlanningAgent: Use PlanningTool to perform task planning and execution.
class PlanningAgent(ToolCallAgent).
"""
An agent that creates and manages plans to solve tasks.
This agent uses a planning tool to create and manage structured plans, and tracks progress through individual steps until task completion.
This agent uses a planning tool to create and manage structured plans, and tracks progress through individual steps until task completion.
"""
name: str = "planning"
description: str = "An agent that creates and manages plans to solve tasks"
system_prompt: str = PLANNING_SYSTEM_PROMPT
next_step_prompt: str = NEXT_STEP_PROMPT
available_tools: ToolCollection = Field(
default_factory=lambda: ToolCollection(PlanningTool(), Terminate())
)
# Step Execution Tracker
step_execution_tracker: Dict[str, Dict] = Field(default_factory=dict)
current_step_index: Optional[int] = None

PlanningAgent The core methodology includes:

async def think(self) -> bool.
"""Decide the next action based on plan status."""""
prompt = (
f "CURRENT PLAN STATUS:n{await self.get_plan()}nn{self.next_step_prompt}""
if self.active_plan_id
else self.next_step_prompt
)
self.messages.append(Message.user_message(prompt))
# Get current step index
self.current_step_index = await self._get_current_step_index()
result = await super().think()
# Associate a tool call with the current step
if result and self.tool_calls:
# ... Associate the logic...
return result

Flow Component: Collaboration Capabilities for Agents

The role of the Flow component is to coordinate multiple Agents working together to accomplish more complex tasks.

class BaseFlow(BaseModel, ABC).
"""Base class for execution flows supporting multiple agents""""
agents: Dict[str, BaseAgent]
tools: Optional[List] = None
primary_agent_key: Optional[str] = None
    @property
def primary_agent(self) -> Optional[BaseAgent].
"""Get the primary agent for the flow""""
return self.agents.get(self.primary_agent_key)
    @abstractmethod
async def execute(self, input_text: str) -> str.
"""Execute the flow with given input"""""

PlanningFlow is a concrete Flow implementation for planning and executing tasks:

class PlanningFlow(BaseFlow).
"""A flow that manages planning and execution of tasks using agents.""""
llm: LLM = Field(default_factory=lambda: LLM())
planning_tool: PlanningTool = Field(default_factory=PlanningTool)
executor_keys: List[str] = Field(default_factory=list)
active_plan_id: str = Field(default_factory=lambda: f "plan_{int(time.time())}")
current_step_index: Optional[int] = None
async def execute(self, input_text: str) -> str.
"""Execute the planning flow with agents."""""
try.
# Create the initial plan
if input_text: await self._create_initial_plan
await self._create_initial_plan(input_text)
# Execute the planning step
while await self._has_next_step():: # Get the current step.
# Get the current step
step_info = await self._get_current_step()
# Select the appropriate executor
executor = self.get_executor(step_info.get("type"))
# Execute the step
result = await self._execute_step(executor, step_info)
# Update step status
await self._update_step_status(step_info["index"], "completed")
# Complete the program
return await self._finalize_plan()
except Exception as e.
# Handling exceptions
return f "Error executing flow: {str(e)}"

Agent Implementation of OpenManus: A Layered Architecture

OpenManus' Agent uses a hierarchical architecture, built layer by layer from basic functionality to specialized applications. The benefits of this design are high code reusability, high scalability, and clear responsibilities at each level.

BaseAgent (abstract base class)
└─ ReActAgent (think-act mode)
└─ ToolCallAgent (Tool Calling Capability)
├─ PlanningAgent (planning capability)
├── SWEAgent (software engineering capability)
└─ Manus (generic agent)

BaseAgent: the base of bases

BaseAgent is the foundation of the entire framework and defines the core properties and methods of an Agent:

class BaseAgent(BaseModel, ABC).
"""Abstract base class for managing agent state and execution.""""
# Core Properties
name: str = Field(... , description="Unique name of the agent")
description: Optional[str] = Field(None, description="Optional agent description")
# Prompt
system_prompt: Optional[str] = Field(None, description="System-level instruction prompt")
next_step_prompt: Optional[str] = Field(None, description="Prompt for determining next action")
# Dependencies
llm: LLM = Field(default_factory=LLM, description="Language model instance")
memory: Memory = Field(default_factory=Memory, description="Agent's memory store")
state: AgentState = Field(default=AgentState.IDLE, description="Current agent state")
# Execution Control
max_steps: int = Field(default=10, description="Maximum steps before termination")
current_step: int = Field(default=0, description="Current step in execution")

ReActAgent: The Thinking Agent

ReActAgent A "think-act" model has been implemented, dividing the Agent's execution process into two phases:

class ReActAgent(BaseAgent, ABC).
    @abstractmethod
async def think(self) -> bool.
"""Process current state and decide next action""""
    @abstractmethod
async def act(self) -> str.
"""Execute decided actions""""
async def step(self) -> str.
"""Execute a single step: think and act.""""
should_act = await self.think()
if not should_act: return "Thinking complete - no action.
return "Thinking complete - no action needed"
return await self.act()

ToolCallAgent: Tool-capable Agent

ToolCallAgent Adds the ability to use tools to Agent:

class ToolCallAgent(ReActAgent).
"""Base agent class for handling tool/function calls with enhanced abstraction"""""
available_tools: ToolCollection = ToolCollection(
CreateChatCompletion(), Terminate()
)
tool_choices: Literal["none", "auto", "required"] = "auto"
async def think(self) -> bool.
# get LLM response and tool selection
response = await self.llm.ask_tool(
messages=self.messages,
system_msgs=[Message.system_message(self.system_prompt)]
if self.system_prompt
messages, system_msgs=[Message.system_message(self.system_prompt)
tools=self.available_tools.to_params(),
tool_choice=self.tool_choices, )
)
self.tool_calls = response.tool_calls
# Handling Responses and Tool Calls
# ...
async def act(self) -> str.
# Execute tool calls
results = []
for command in self.tool_calls:
results = await self.execute_tool(command)
# Add tool response to memory
# ...
results.append(result)
return "nn".join(results)

PlanningAgent: Agent that plans.

PlanningAgent Task planning and execution tracking were realized:

class PlanningAgent(ToolCallAgent).
"""
An agent that creates and manages plans to solve tasks.
This agent uses a planning tool to create and manage structured plans, and tracks progress through individual steps until task completion.
This agent uses a planning tool to create and manage structured plans, and tracks progress through individual steps until task completion.
"""
# Step Execution Tracker
step_execution_tracker: Dict[str, Dict] = Field(default_factory=dict)
current_step_index: Optional[int] = None
async def think(self) -> bool.
"""Decide the next action based on plan status."""""
prompt = (
f "CURRENT PLAN STATUS:n{await self.get_plan()}nn{self.next_step_prompt}""
if self.active_plan_id
else self.next_step_prompt
)
self.messages.append(Message.user_message(prompt))
# Get current step index
self.current_step_index = await self._get_current_step_index()
result = await super().think()
# Associate a tool call with the current step
if result and self.tool_calls:
# ... Associate the logic...
return result

Manus: Omnipotent Agent

Manus is the core Agent of OpenManus, which integrates tools and capabilities to handle a wide variety of tasks:

class Manus(ToolCallAgent).
"""
A versatile general-purpose agent that uses planning to solve various tasks.
This agent extends PlanningAgent with a comprehensive set of tools and capabilities, including Python execution, web browsing, file operations, and information management.
This agent extends PlanningAgent with a comprehensive set of tools and capabilities, including Python execution, web browsing, file operations, and information retrieval
to handle a wide range of user requests.
"""
name: str = "manus"
description: str = "A versatile general-purpose agent"
system_prompt: str = SYSTEM_PROMPT
next_step_prompt: str = NEXT_STEP_PROMPT
available_tools: ToolCollection = Field(
default_factory=lambda: ToolCollection(
PythonExecute(), GoogleSearch(), BrowserUseTool(), FileSaver(), Terminate()
)
)

Prompt: Behavioral Guidelines for Agents

The Prompt plays a vital role in building an Agent system, acting as an instruction manual that tells the Agent what to do.

System Prompt: Defining the Role of an Agent

The System Prompt sets the basic roles and behavioral guidelines for an Agent:

SYSTEM_PROMPT = "You are OpenManus, an all-capable AI assistant, aimed at solving any task presented by the user. You have various tools at your disposal that you can call upon to efficiently complete complex requests. You have various tools at your disposal that you can call upon to efficiently complete complex requests. Whether it's programming, information retrieval, file processing, or web browsing, you can handle it all."

This Prompt tells the Agent that it is an all-purpose AI assistant that can utilize a variety of tools to fulfill the user's request.

Planning Prompt: guides Agent in planning

The Planning Prompt tells the Agent how to break down complex tasks into smaller tasks with an execution plan:

PLANNING_SYSTEM_PROMPT = """
You are an expert Planning Agent tasked with solving complex problems by creating and managing structured plans.
Your job is.
1. Analyze requests to understand the task scope
2. Create clear, actionable plans with the `planning` tool
3. Execute steps using available tools as needed
4. Track progress and adapt plans dynamically
5. Use `finish` to conclude when the task is complete
Available tools will vary by task but may include.
- `planning`: Create, update, and track plans (commands: create, update, mark_step, etc.)
- `finish`: End the task when complete
Break tasks into logical, sequential steps. Think about dependencies and verification methods.
"""

This Prompt tells Agent that it is a planning expert and needs to use the planning tools to create, update and track plans.

Tool Usage Prompt: Tells the Agent how to use the tool.

The Tool Usage Prompt describes in detail the functions and usage scenarios of each tool to help the Agent choose the right tool:

NEXT_STEP_PROMPT = """You can interact with the computer using PythonExecute, save important content and information files through FileSaver, open browsers with BrowserUseTool, and retrieve information using GoogleSearch.
PythonExecute: Execute Python code to interact with the computer system, data processing, automation tasks, etc.
FileSaver: Save files locally, such as txt, py, html, etc.
BrowserUseTool: Open, browseIf you open a local HTML file, you must provide the absolute path to the file.
GoogleSearch: Perform web information retrieval
Based on user needs, proactively select the most appropriate tool or combination of tools. For complex tasks, you can break down the problem and use different tools step by step to solve it. For complex tasks, you can break down the problem and use different tools step by step to solve it. After using each tool, clearly explain the execution results and suggest the next steps.
"After using each tool, clearly explain the execution results and suggest the next steps.

Dynamic Prompt: Making Agents More Flexible

Prompts in OpenManus can not only be static, they can also be generated dynamically. For example, in the PlanningAgent in the Prompt, the system adds the current schedule state to the Prompt:

async def think(self) -> bool.
"""Decide the next action based on plan status."""""
prompt = (
f "CURRENT PLAN STATUS:n{await self.get_plan()}nn{self.next_step_prompt}""
if self.active_plan_id
else self.next_step_prompt
)
self.messages.append(Message.user_message(prompt))

This dynamic Prompt allows the Agent to make more rational decisions based on the current situation.

 

Summarizing: Insights from OpenManus

By analyzing the OpenManus code, we can summarize several key components of the AI Agent framework:

  1. Agent: Hierarchical design from basic to specialized to achieve different levels of competence.
    • BaseAgent: Provides basic state management and execution loops.
    • ReActAgent: Realization of the think-act model.
    • ToolCallAgent: Add tool-calling capabilities.
    • Professional Agent: e.g. PlanningAgent,SWEAgent cap (a poem) ManusThe
  2. LLM: Encapsulate interaction with large language models, providing dialog and tool invocation capabilities.
    • Support for normal dialogs and tool calls.
    • Implement retry mechanism and error handling.
    • Streaming responses are supported.
  3. Memory: Manage dialog history and context.
    • Store and retrieve messages.
    • Maintaining Conversation Context.
  4. Tool: Provide an interface to interact with the outside world.
    • Basic tool abstraction.
    • Multiple specialized tools to achieve this.
    • Tool results processing.
  5. Planning: Enabling task planning and execution tracking.
    • Program creation and management.
    • Step Status Tracking.
    • Dynamic Adjustment Program.
  6. Flow: Manage collaboration across multiple Agents.
    • Tasking.
    • Results Integration.
    • Process Control.
  7. Prompt: Guides the Agent's behavior and decision making.
    • System Prompt defines the role.
    • The Professional Prompt guides decisions.
    • Dynamic Prompt generation.

With its clear design and well-structured code, OpenManus is an excellent example for learning about AI Agent implementations. Its modular design allows developers to easily extend and customize their agents.

OpenManus provides a good starting point for developers who want to learn more about AI Agents or build their own agent systems. By learning about its architecture and implementation, we can better understand how AI Agents work and how they are designed.

CDN1
May not be reproduced without permission:Chief AI Sharing Circle " OpenManus Open Source Explained, with Insights into the AI Agent Architecture Behind It

Chief AI Sharing Circle

Chief AI Sharing Circle specializes in AI learning, providing comprehensive AI learning content, AI tools and hands-on guidance. Our goal is to help users master AI technology and explore the unlimited potential of AI together through high-quality content and practical experience sharing. Whether you are an AI beginner or a senior expert, this is the ideal place for you to gain knowledge, improve your skills and realize innovation.

Contact Us
en_USEnglish