오픈마누스 오픈소스 설명과 그 이면의 AI 에이전트 아키텍처에 대한 인사이트

43.7K 00

오프닝: 마누스 화재와 오픈마누스 고장

최근 AI 업계에서 일어나고 있는 큰 일 중 하나는 Manus 인공지능 에이전트인 마누스는 강력한 기능과 유연한 사용법으로 많은 관심을 받고 있습니다. 간단히 말해, 마누스는 프로그래밍, 정보 검색, 문서 처리, 인터넷 서핑 등 모든 작업을 도와주는 만능 비서 같은 존재입니다.

하지만 마누스를 사용하기 위해서는 초대 코드가 있어야 합니다. 이 때문에 많은 개발자와 연구자들이 Manus를 사용하지 못했습니다. 모두가 고민하고 있을 때 오픈 소스 커뮤니티가 나섰습니다! @mannaandpoem, @XiangJinyu, @MoshiQAQ, @didiforgithub로 구성된 MetaGPT 팀은 단 3시간 만에 OpenManus라는 오픈 소스 프로젝트를 만들어냈습니다. 이제 초대 코드 없이도 Manus의 강력한 기능을 경험할 수 있습니다! 더욱 흥미로운 점은 OpenManus가 오픈 소스 프로젝트이기 때문에 여러분의 필요에 맞게 수정하고 확장할 수 있다는 것입니다!

오픈마누스의 등장은 더 많은 사람들이 AI 에이전트의 매력을 경험할 수 있는 기회를 제공할 뿐만 아니라 AI 에이전트 개발에 새로운 활력을 불어넣고 있습니다. 기술에 종사하는 우리들에게 OpenManus는 좋은 도구일 뿐만 아니라 훌륭한 학습 리소스이기도 합니다. 코드를 공부함으로써 AI 에이전트의 프레임워크 설계와 구현 세부 사항을 더 깊이 이해할 수 있습니다.

AI 에이전트 프레임워크: OpenManus의 설계 철학

OpenManus의 코드 구조는 매우 명확하며 서로 다른 기능 모듈을 결합하는 빌딩 블록과 같은 모듈식 설계를 채택하고 있습니다. 이 설계의 장점은 코드의 재사용성과 확장성이 뛰어나고 각 모듈의 책임이 명확하다는 것입니다.

OpenManus의 핵심 구성 요소는 다음과 같습니다:

OpenManus
├── Agent (代理层)
│   ├── BaseAgent (基础抽象类)
│   ├── ReActAgent (思考-行动模式)
│   ├── ToolCallAgent (工具调用能力)
│   ├── PlanningAgent (规划能力)
│   ├── SWEAgent (软件工程能力)
│   └── Manus (通用代理)
├── LLM (语言模型层)
├── Memory (记忆层)
├── Tool (工具层)
│   ├── BaseTool (工具基类)
│   ├── PlanningTool (规划工具)
│   ├── PythonExecute (Python 执行)
│   ├── GoogleSearch (搜索工具)
│   ├── BrowserUseTool (浏览器工具)
│   └── ... (其他工具)
├── Flow (工作流层)
│   ├── BaseFlow (基础流程)
│   └── PlanningFlow (规划流程)
└── Prompt (提示词层)

LLM 구성 요소: 에이전트의 두뇌

에이전트를 사람에 비유하면 LLM(대규모 언어 모델)은 에이전트의 두뇌로 사용자 명령을 이해하고, 응답을 생성하며, 의사 결정을 내리는 역할을 담당합니다. 사용자 명령을 이해하고, 응답을 생성하고, 의사 결정을 내리는 역할을 합니다. OpenManus는 LLM 클래스를 통해 LLM과의 상호 작용을 캡슐화합니다.

class LLM:
_instances: Dict[str, "LLM"] = {}  # 单例模式实现
def __init__(
self, config_name: str = "default", llm_config: Optional[LLMSettings] = None
):
if not hasattr(self, "client"):  # 只初始化一次
llm_config = llm_config or config.llm
llm_config = llm_config.get(config_name, llm_config["default"])
self.model = llm_config.model
self.max_tokens = llm_config.max_tokens
self.temperature = llm_config.temperature
self.client = AsyncOpenAI(
api_key=llm_config.api_key, base_url=llm_config.base_url
)

LLM 클래스는 두 가지 핵심 메서드를 제공합니다:

ask:: 일반 대화 요청 보내기
ask_tool:: 도구 호출로 요청 보내기

async def ask_tool(
self,
messages: List[Union[dict, Message]],
system_msgs: Optional[List[Union[dict, Message]]] = None,
timeout: int = 60,
tools: Optional[List[dict]] = None,
tool_choice: Literal["none", "auto", "required"] = "auto",
temperature: Optional[float] = None,
**kwargs,
):
# 格式化消息
if system_msgs:
system_msgs = self.format_messages(system_msgs)
messages = system_msgs + self.format_messages(messages)
else:
messages = self.format_messages(messages)
# 发送请求
response = await self.client.chat.completions.create(
model=self.model,
messages=messages,
temperature=temperature or self.temperature,
max_tokens=self.max_tokens,
tools=tools,
tool_choice=tool_choice,
timeout=timeout,
**kwargs,
)

메모리 구성 요소: 상담원의 메모리

메모리 구성 요소는 상담원의 대화 기록을 기록하고 관리하는 역할을 하는 상담원의 노트북과 같은 역할을 합니다. 메모리를 사용하면 상담원이 이전에 말한 내용을 기억하고 대화의 일관성을 유지할 수 있습니다.

class Memory(BaseModel):
"""Stores and manages agent's conversation history."""
messages: List[Message] = Field(default_factory=list)
def add_message(self, message: Union[Message, dict]) -> None:
"""Add a message to memory."""
if isinstance(message, dict):
message = Message(**message)
self.messages.append(message)
def get_messages(self) -> List[Message]:
"""Get all messages in memory."""
return self.messages

메모리 컴포넌트는 에이전트의 핵심 컴포넌트이며, 베이스에이전트의 update_memory 메서드를 사용하여 새 메시지를 추가할 수 있습니다:

def update_memory(
self,
role: Literal["user", "system", "assistant", "tool"],
content: str,
**kwargs,
) -> None:
"""Add a message to the agent's memory."""
message_map = {
"user": Message.user_message,
"system": Message.system_message,
"assistant": Message.assistant_message,
"tool": lambda content, **kw: Message.tool_message(content, **kw),
}
if role not in message_map:
raise ValueError(f"Unsupported message role: {role}")
msg_factory = message_map[role]
msg = msg_factory(content, **kwargs) if role == "tool" else msg_factory(content)
self.memory.add_message(msg)

도구 구성 요소: 상담원의 도구 상자

도구 구성 요소는 에이전트와 외부 세계를 연결하는 다리 역할을 하며, OpenManus는 에이전트가 다양한 도구를 호출하여 작업을 수행할 수 있는 유연한 도구 시스템을 구현합니다.

class BaseTool(ABC, BaseModel):
name: str
description: str
parameters: Optional[dict] = None
async def __call__(self, **kwargs) -> Any:
"""Execute the tool with given parameters."""
return await self.execute(**kwargs)
    @abstractmethod
async def execute(self, **kwargs) -> Any:
"""Execute the tool with given parameters."""
def to_param(self) -> Dict:
"""Convert tool to function call format."""
return {
"type": "function",
"function": {
"name": self.name,
"description": self.description,
"parameters": self.parameters,
},
}

도구 실행 결과는 다음과 같이 표시됩니다. ToolResult 클래스로 표현할 수 있습니다:

class ToolResult(BaseModel):
"""Represents the result of a tool execution."""
output: Any = Field(default=None)
error: Optional[str] = Field(default=None)
system: Optional[str] = Field(default=None)

OpenManus는 다음과 같은 다양한 기본 제공 도구를 제공합니다. PlanningTool::

class PlanningTool(BaseTool):
"""
A planning tool that allows the agent to create and manage plans for solving complex tasks.
The tool provides functionality for creating plans, updating plan steps, and tracking progress.
"""
name: str = "planning"
description: str = _PLANNING_TOOL_DESCRIPTION
parameters: dict = {
"type": "object",
"properties": {
"command": {
"description": "The command to execute. Available commands: create, update, list, get, set_active, mark_step, delete.",
"enum": [
"create",
"update",
"list",
"get",
"set_active",
"mark_step",
"delete",
],
"type": "string",
},
# 其他参数...
},
"required": ["command"],
}

계획 구성 요소: 에이전트의 계획 기능

계획 구성 요소는 복잡한 작업을 처리하는 OpenManus 기능의 핵심입니다. 이를 통해 에이전트는 복잡한 작업을 계획하고 작은 단계별 작업으로 세분화하여 하나씩 완료할 수 있습니다.

계획 구성 요소는 크게 두 부분으로 구성됩니다:

PlanningTool계획 생성, 업데이트 및 추적 기능을 제공합니다.
PlanningAgent사용 PlanningTool 를 사용하여 작업 계획 및 실행을 수행합니다.

class PlanningAgent(ToolCallAgent):
"""
An agent that creates and manages plans to solve tasks.
This agent uses a planning tool to create and manage structured plans,
and tracks progress through individual steps until task completion.
"""
name: str = "planning"
description: str = "An agent that creates and manages plans to solve tasks"
system_prompt: str = PLANNING_SYSTEM_PROMPT
next_step_prompt: str = NEXT_STEP_PROMPT
available_tools: ToolCollection = Field(
default_factory=lambda: ToolCollection(PlanningTool(), Terminate())
)
# 步骤执行跟踪器
step_execution_tracker: Dict[str, Dict] = Field(default_factory=dict)
current_step_index: Optional[int] = None

PlanningAgent 핵심 방법론에는 다음이 포함됩니다:

async def think(self) -> bool:
"""Decide the next action based on plan status."""
prompt = (
f"CURRENT PLAN STATUS:n{await self.get_plan()}nn{self.next_step_prompt}"
if self.active_plan_id
else self.next_step_prompt
)
self.messages.append(Message.user_message(prompt))
# 获取当前步骤索引
self.current_step_index = await self._get_current_step_index()
result = await super().think()
# 关联工具调用与当前步骤
if result and self.tool_calls:
# ...关联逻辑...
return result

플로우 구성 요소: 상담원을 위한 협업 기능

플로우 구성 요소의 역할은 더 복잡한 작업을 수행하기 위해 함께 작업하는 여러 에이전트를 조정하는 것입니다.

class BaseFlow(BaseModel, ABC):
"""Base class for execution flows supporting multiple agents"""
agents: Dict[str, BaseAgent]
tools: Optional[List] = None
primary_agent_key: Optional[str] = None
    @property
def primary_agent(self) -> Optional[BaseAgent]:
"""Get the primary agent for the flow"""
return self.agents.get(self.primary_agent_key)
    @abstractmethod
async def execute(self, input_text: str) -> str:
"""Execute the flow with given input"""

PlanningFlow 는 작업을 계획하고 실행하기 위한 구체적인 흐름 구현입니다:

class PlanningFlow(BaseFlow):
"""A flow that manages planning and execution of tasks using agents."""
llm: LLM = Field(default_factory=lambda: LLM())
planning_tool: PlanningTool = Field(default_factory=PlanningTool)
executor_keys: List[str] = Field(default_factory=list)
active_plan_id: str = Field(default_factory=lambda: f"plan_{int(time.time())}")
current_step_index: Optional[int] = None
async def execute(self, input_text: str) -> str:
"""Execute the planning flow with agents."""
try:
# 创建初始计划
if input_text:
await self._create_initial_plan(input_text)
# 执行计划步骤
while await self._has_next_step():
# 获取当前步骤
step_info = await self._get_current_step()
# 选择合适的执行者
executor = self.get_executor(step_info.get("type"))
# 执行步骤
result = await self._execute_step(executor, step_info)
# 更新步骤状态
await self._update_step_status(step_info["index"], "completed")
# 完成计划
return await self._finalize_plan()
except Exception as e:
# 处理异常
return f"Error executing flow: {str(e)}"

OpenManus의 에이전트 구현: 계층화된 아키텍처

OpenManus의 에이전트는 기본 기능부터 특수 애플리케이션까지 계층 구조로 구축된 계층적 아키텍처를 사용합니다. 이 설계의 장점은 높은 코드 재사용성, 높은 확장성, 각 수준에서 명확한 책임이 있다는 것입니다.

BaseAgent (抽象基类)
└── ReActAgent (思考-行动模式)
└── ToolCallAgent (工具调用能力)
├── PlanningAgent (规划能力)
├── SWEAgent (软件工程能力)
└── Manus (通用代理)

BaseAgent: 베이스의 베이스

BaseAgent 는 전체 프레임워크의 기초이며 에이전트의 핵심 속성과 메서드를 정의합니다:

class BaseAgent(BaseModel, ABC):
"""Abstract base class for managing agent state and execution."""
# 核心属性
name: str = Field(..., description="Unique name of the agent")
description: Optional[str] = Field(None, description="Optional agent description")
# 提示词
system_prompt: Optional[str] = Field(None, description="System-level instruction prompt")
next_step_prompt: Optional[str] = Field(None, description="Prompt for determining next action")
# 依赖组件
llm: LLM = Field(default_factory=LLM, description="Language model instance")
memory: Memory = Field(default_factory=Memory, description="Agent's memory store")
state: AgentState = Field(default=AgentState.IDLE, description="Current agent state")
# 执行控制
max_steps: int = Field(default=10, description="Maximum steps before termination")
current_step: int = Field(default=0, description="Current step in execution")

리액트에이전트: 생각하는 에이전트

ReActAgent 에이전트의 실행 프로세스를 두 단계로 나누어 '생각-행동' 모델을 구현했습니다:

class ReActAgent(BaseAgent, ABC):
    @abstractmethod
async def think(self) -> bool:
"""Process current state and decide next action"""
    @abstractmethod
async def act(self) -> str:
"""Execute decided actions"""
async def step(self) -> str:
"""Execute a single step: think and act."""
should_act = await self.think()
if not should_act:
return "Thinking complete - no action needed"
return await self.act()

ToolCallAgent: 도구 지원 에이전트

ToolCallAgent 상담원에게 툴을 사용할 수 있는 기능을 추가합니다:

class ToolCallAgent(ReActAgent):
"""Base agent class for handling tool/function calls with enhanced abstraction"""
available_tools: ToolCollection = ToolCollection(
CreateChatCompletion(), Terminate()
)
tool_choices: Literal["none", "auto", "required"] = "auto"
async def think(self) -> bool:
# 获取 LLM 响应和工具选择
response = await self.llm.ask_tool(
messages=self.messages,
system_msgs=[Message.system_message(self.system_prompt)]
if self.system_prompt
else None,
tools=self.available_tools.to_params(),
tool_choice=self.tool_choices,
)
self.tool_calls = response.tool_calls
# 处理响应和工具调用
# ...
async def act(self) -> str:
# 执行工具调用
results = []
for command in self.tool_calls:
result = await self.execute_tool(command)
# 添加工具响应到内存
# ...
results.append(result)
return "nn".join(results)

기획 에이전트: 기획하는 에이전트입니다.

PlanningAgent 작업 계획 및 실행 추적이 이루어졌습니다:

class PlanningAgent(ToolCallAgent):
"""
An agent that creates and manages plans to solve tasks.
This agent uses a planning tool to create and manage structured plans,
and tracks progress through individual steps until task completion.
"""
# 步骤执行跟踪器
step_execution_tracker: Dict[str, Dict] = Field(default_factory=dict)
current_step_index: Optional[int] = None
async def think(self) -> bool:
"""Decide the next action based on plan status."""
prompt = (
f"CURRENT PLAN STATUS:n{await self.get_plan()}nn{self.next_step_prompt}"
if self.active_plan_id
else self.next_step_prompt
)
self.messages.append(Message.user_message(prompt))
# 获取当前步骤索引
self.current_step_index = await self._get_current_step_index()
result = await super().think()
# 关联工具调用与当前步骤
if result and self.tool_calls:
# ...关联逻辑...
return result

마누스: 전지전능한 에이전트

Manus 는 다양한 작업을 처리할 수 있는 도구와 기능을 통합한 OpenManus의 핵심 에이전트입니다:

class Manus(ToolCallAgent):
"""
A versatile general-purpose agent that uses planning to solve various tasks.
This agent extends PlanningAgent with a comprehensive set of tools and capabilities,
including Python execution, web browsing, file operations, and information retrieval
to handle a wide range of user requests.
"""
name: str = "manus"
description: str = "A versatile general-purpose agent"
system_prompt: str = SYSTEM_PROMPT
next_step_prompt: str = NEXT_STEP_PROMPT
available_tools: ToolCollection = Field(
default_factory=lambda: ToolCollection(
PythonExecute(), GoogleSearch(), BrowserUseTool(), FileSaver(), Terminate()
)
)

프롬프트: 상담원을 위한 행동 지침

프롬프트는 상담원 시스템 구축에 중요한 역할을 하며, 상담원에게 해야 할 일을 알려주는 사용 설명서 역할을 합니다.

시스템 프롬프트: 상담원의 역할 정의하기

시스템 프롬프트는 상담원의 기본 역할과 행동 지침을 설정합니다:

SYSTEM_PROMPT = "You are OpenManus, an all-capable AI assistant, aimed at solving any task presented by the user. You have various tools at your disposal that you can call upon to efficiently complete complex requests. Whether it's programming, information retrieval, file processing, or web browsing, you can handle it all."

프롬프트는 상담원에게 다양한 도구를 사용하여 사용자의 요청을 이행할 수 있는 다목적 AI 어시스턴트라고 알려줍니다.

계획 프롬프트: 상담원에게 계획을 안내합니다.

계획 프롬프트는 상담원에게 실행 계획을 통해 복잡한 작업을 더 작은 작업으로 세분화하는 방법을 알려줍니다:

PLANNING_SYSTEM_PROMPT = """
You are an expert Planning Agent tasked with solving complex problems by creating and managing structured plans.
Your job is:
1. Analyze requests to understand the task scope
2. Create clear, actionable plans with the `planning` tool
3. Execute steps using available tools as needed
4. Track progress and adapt plans dynamically
5. Use `finish` to conclude when the task is complete
Available tools will vary by task but may include:
- `planning`: Create, update, and track plans (commands: create, update, mark_step, etc.)
- `finish`: End the task when complete
Break tasks into logical, sequential steps. Think about dependencies and verification methods.
"""

이 프롬프트는 상담원에게 기획 전문가이므로 planning 도구를 사용하여 계획을 생성, 업데이트 및 추적할 수 있습니다.

도구 사용 프롬프트: 상담원에게 도구 사용 방법을 알려줍니다.

도구 사용 프롬프트는 상담원이 올바른 도구를 선택할 수 있도록 각 도구의 기능과 사용 시나리오를 자세히 설명합니다:

NEXT_STEP_PROMPT = """You can interact with the computer using PythonExecute, save important content and information files through FileSaver, open browsers with BrowserUseTool, and retrieve information using GoogleSearch.
PythonExecute: Execute Python code to interact with the computer system, data processing, automation tasks, etc.
FileSaver: Save files locally, such as txt, py, html, etc.
BrowserUseTool: Open, browse, and use web browsers.If you open a local HTML file, you must provide the absolute path to the file.
GoogleSearch: Perform web information retrieval
Based on user needs, proactively select the most appropriate tool or combination of tools. For complex tasks, you can break down the problem and use different tools step by step to solve it. After using each tool, clearly explain the execution results and suggest the next steps.
"""

동적 프롬프트: 상담원의 유연성 향상

OpenManus의 프롬프트는 정적일 뿐만 아니라 동적으로도 생성할 수 있습니다. 예를 들어 PlanningAgent 를 선택하면 시스템이 프롬프트에 현재 요금제 상태를 추가합니다:

async def think(self) -> bool:
"""Decide the next action based on plan status."""
prompt = (
f"CURRENT PLAN STATUS:n{await self.get_plan()}nn{self.next_step_prompt}"
if self.active_plan_id
else self.next_step_prompt
)
self.messages.append(Message.user_message(prompt))

이 동적 프롬프트는 상담원이 현재 상황에 따라 보다 합리적인 결정을 내릴 수 있도록 도와줍니다.

요약: OpenManus의 인사이트

OpenManus 코드를 분석하여 AI 에이전트 프레임워크의 몇 가지 주요 구성 요소를 요약할 수 있습니다:

에이전트기본부터 전문가까지 다양한 수준의 역량을 달성할 수 있는 계층적 설계.
- BaseAgent: 기본적인 상태 관리 및 실행 루프를 제공합니다.
- ReActAgent사고-행동 모델의 실현.
- ToolCallAgent도구 호출 기능을 추가합니다.
- 전문 에이전트: 예 PlanningAgent및SWEAgent 노래로 응답 Manus.
LLM대규모 언어 모델과의 상호 작용을 캡슐화하여 대화 및 도구 호출 기능을 제공합니다.
- 일반 대화 및 도구 호출을 지원합니다.
- 재시도 메커니즘 및 오류 처리를 구현합니다.
- 스트리밍 응답이 지원됩니다.
메모리대화 기록 및 컨텍스트를 관리합니다.
- 메시지를 저장하고 검색합니다.
- 대화 컨텍스트 유지.
도구외부 세계와 상호 작용할 수 있는 인터페이스를 제공합니다.
- 기본 도구 추상화.
- 이를 달성하기 위한 여러 전문 도구가 있습니다.
- 도구 결과 처리.
계획작업 계획 및 실행 추적 활성화.
- 계획 작성 및 관리
- 걸음 수 상태 추적.
- 동적 조정 프로그램.
흐름여러 상담원 간의 협업을 관리합니다.
- 작업.
- 결과 통합.
- 프로세스 제어.
프롬프트상담원의 행동과 의사 결정을 안내합니다.
- 시스템 프롬프트는 역할을 정의합니다.
- 전문가 프롬프트가 결정을 안내합니다.
- 동적 프롬프트 생성.