
OpenManus-RL: Fine-tuning Large Models to Enhance Agent Reasoning and Decision Making

Post updated on 2025-03-10 18:56. Some of the content is time-sensitive; please leave a comment if anything no longer works!

General Introduction

OpenManus-RL is an open-source project jointly developed by UIUC-Ulab and the OpenManus team from the MetaGPT community, and is hosted on GitHub. The project uses reinforcement learning (RL) to improve the reasoning and decision-making abilities of large language model (LLM) agents, exploring new tuning methods that build on the experience of models such as Deepseek-R1 and QwQ-32B. The team publishes progress regularly, keeps code, datasets, and test results fully transparent, and supports validating results on benchmarks such as GAIA, AgentBench, WebShop, and OSWorld. The project encourages developers around the world to contribute code, datasets, or compute resources to jointly build an efficient ecosystem for agent development.

With this project, the last missing piece for building an open-source Manus has been filled in; MetaGPT has clearly gone all in. That said, MGX can already cover all of Manus's capabilities, so the open-source reproduction is, to some extent, riding on that existing work.


Feature List

  • Agent environment setup: provides tools to configure agent environments for online RL tuning.
  • Trajectory data collection: connects models such as Deepseek-R1 and QwQ-32B to collect behavioral data on complex tasks.
  • RL tuning support: reinforcement learning methods for customizing agent behavior.
  • Benchmark integration: built-in test environments such as WebShop, GAIA, OSWorld, and AgentBench.
  • Diverse strategies: integrates RL strategies such as Tree-of-Thoughts and Monte Carlo Tree Search.
  • Community collaboration: code, datasets, and other contributions are welcome; significant contributors can become co-authors of the paper.
  • Real-time progress sharing: the RL tuning process and results are shared through regular updates.

Using Help

Installation process

OpenManus-RL is easy to install and suitable for users with basic Python knowledge. Below are the detailed steps:

1. Create the Conda environment

To avoid dependency conflicts, Conda is recommended:

conda create -n openmanus-rl python=3.10
conda activate openmanus-rl
  • Prerequisite: Conda must be installed; it can be downloaded from the Anaconda official website.
  • After activation, the terminal prompt shows (openmanus-rl).

2. Clone the project

Make sure Git is installed (check with git --version). If it is not installed, download it from git-scm.com, then clone the repository:

git clone https://github.com/OpenManus/OpenManus-RL.git
cd OpenManus-RL
  • Download the code and go to the project directory.

3. Install the dependencies

Execute it in the project root directory:

pip install -r requirements.txt
  • If downloads are slow, use a mirror (e.g., the Tsinghua PyPI mirror):
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
  • Visualization tools require an additional install (a minimal plotting sketch follows below):
pip install matplotlib numpy
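
If you want to plot training progress yourself, here is a minimal matplotlib sketch. The reward values are dummy data, since the exact log format written by OpenManus-RL is not specified here; it only shows the plotting pattern.

# Minimal sketch: plot a reward curve with matplotlib (dummy data, not actual training logs).
import matplotlib.pyplot as plt
import numpy as np

steps = np.arange(0, 500, 10)
rewards = 1 - np.exp(-steps / 150) + np.random.normal(0, 0.05, size=steps.shape)  # fake curve

plt.plot(steps, rewards)
plt.xlabel("training step")
plt.ylabel("mean reward")
plt.title("GRPO training reward (dummy data)")
plt.savefig("reward_curve.png")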

4. Configure models and datasets

  • Supervised Fine Tuning (SFT): Specify the model and dataset:
python -m openmanus_rl.sft --model_name_or_path Qwen/Qwen2.5-1.5B-Instruct --dataset_name CharlieDreemur/OpenManus-RL
  • Reinforcement learning tuning (GRPO): configure the reward functions (an illustrative reward sketch follows below):
python -m openmanus_rl.grpo --model_name_or_path Qwen/Qwen2.5-1.5B-Instruct --dataset_name CharlieDreemur/OpenManus-RL-GRPO --reward_funcs accuracy format tag_count
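
To make the reward_funcs idea concrete, here is a rough Python sketch of what rule-based rewards such as format and accuracy typically check on a model completion. The function signatures, tag names, and checks are illustrative assumptions; the project's actual reward implementations may differ.

# Illustrative rule-based rewards (assumptions, not the project's actual implementations).
import re

def format_reward(completion: str) -> float:
    """1.0 if the completion wraps reasoning and answer in the expected tags."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.search(pattern, completion, re.DOTALL) else 0.0

def accuracy_reward(completion: str, reference: str) -> float:
    """1.0 if the text inside <answer> matches the reference answer."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    answer = match.group(1).strip() if match else ""
    return 1.0 if answer == reference.strip() else 0.0

completion = "<think>price is 2*3=6</think> <answer>6</answer>"
print(format_reward(completion), accuracy_reward(completion, "6"))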

5. Run the project

  • Run SFT on a single GPU:
python -m openmanus_rl.sft --output_dir data/sft-output
  • Run GRPO on multiple GPUs (requires the zero3.yaml config):
accelerate launch --config_file=configs/accelerate_configs/zero3.yaml openmanus_rl/grpo.py --output_dir data/grpo-output

Main feature workflows

Agent environment setup

  • Procedure:
    1. Run python -m openmanus_rl.sft to generate the base environment.
    2. Modify the configuration files (e.g., task objectives or reward functions); a sketch of a task-specific reward follows this list.
    3. Execute python -m openmanus_rl.grpo to start tuning.
  • Usage scenario: customizing an agent environment for a specific task (e.g., shopping decisions).
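
As an illustration of what a task-specific reward might look like, here is a minimal Python sketch for a WebShop-style shopping task. The function name, the episode fields, and the way OpenManus-RL actually registers rewards are assumptions for illustration, not the project's real API.

# Hypothetical reward for a WebShop-style shopping task (illustration only).
# Assumed episode fields: purchased_item and target_item, each a dict of attributes.

def shopping_reward(episode: dict) -> float:
    """1.0 for buying the exact target item, partial credit for matching attributes."""
    purchased = episode.get("purchased_item")
    target = episode.get("target_item", {})
    if not purchased or not target:
        return 0.0
    matched = sum(1 for key, value in target.items() if purchased.get(key) == value)
    return matched / len(target)

# Example: two of three requested attributes match -> reward 0.67
example = {
    "target_item": {"category": "shoes", "color": "black", "size": "42"},
    "purchased_item": {"category": "shoes", "color": "black", "size": "43"},
}
print(round(shopping_reward(example), 2))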

Data collection and testing

  • Procedure:
    1. Configure the model (e.g., Deepseek-R1):
python -m openmanus_rl.grpo --model_name_or_path Deepseek-R1
    2. Run the test with --benchmark GAIA; the results are saved to the data/ directory (a trajectory-inspection sketch follows this list).
  • Usage scenario: analyzing agent performance on complex tasks.
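
For a quick look at collected data, the following sketch reads one episode file and prints its steps. The file path and the record schema (a list of steps with an action and a reward) are assumptions for illustration; the files actually written by OpenManus-RL may be organized differently.

# Hypothetical inspection of a collected trajectory (file name and schema are assumptions).
import json

with open("data/trajectories/example_episode.json", encoding="utf-8") as f:  # assumed path
    episode = json.load(f)

# Assumed structure: a list of steps, each with the agent's action and a reward.
for i, step in enumerate(episode.get("steps", [])):
    print(f"step {i}: action={step.get('action')!r} reward={step.get('reward')}")
print("total reward:", sum(s.get("reward", 0.0) for s in episode.get("steps", [])))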

RL tuning operations

  • Procedure:
    1. Run GRPO mode:
python -m openmanus_rl.grpo --reward_funcs accuracy
    2. Watch the training logs; the model is saved to data/grpo-output (a loading sketch follows this list).
  • Usage scenario: optimizing agent behavior, e.g., improving the purchase success rate in WebShop.
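
Once training finishes, the saved checkpoint can be loaded back for a quick smoke test. The sketch below assumes data/grpo-output contains a standard Hugging Face checkpoint (which is what transformers-based trainers usually write); adjust the path and prompt as needed.

# Load the fine-tuned checkpoint for a quick generation test (assumes a HF-format checkpoint).
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "data/grpo-output"  # the --output_dir used during training
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

prompt = "You are a web-shopping agent. Find black running shoes under $50."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))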

Community Contributions

  • Procedure:
    1. Fork the project to your own GitHub account.
    2. Modify locally and commit:
git add .
git commit -m "Optimize RL strategy"
git push origin main
    3. Submit a Pull Request, or contact kunlunz2@illinois.edu by email.
  • Usage scenario: contribute new algorithms or datasets and take part in core development.

Feature Highlights

RL Tuning Support

  • How it works: run GRPO with a specified reward function (e.g., accuracy); the training process shows a real-time log, and the model is saved to the specified directory on completion.
  • Effect: agents can adapt their behavior to the task, e.g., improving multimodal task performance in OSWorld.

Benchmarking Integration

  • How it works: run python -m openmanus_rl.grpo --benchmark AgentBench; the system automatically generates a report covering success rates, response times, and other metrics (a minimal aggregation sketch follows below).
  • Effect: provides quantitative metrics to help developers compare model performance.
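
To show what such a report boils down to, here is a minimal Python sketch that aggregates per-episode records into the two headline metrics. The record fields are assumptions for illustration, not the project's actual report format.

# Hypothetical aggregation of per-episode benchmark records into headline metrics.
from statistics import mean

# Assumed record fields: whether the episode succeeded and how long the agent took.
records = [
    {"success": True, "response_time_s": 3.2},
    {"success": False, "response_time_s": 5.9},
    {"success": True, "response_time_s": 2.7},
]

success_rate = mean(1.0 if r["success"] else 0.0 for r in records)
avg_time = mean(r["response_time_s"] for r in records)
print(f"success rate: {success_rate:.1%}, average response time: {avg_time:.1f}s")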

Diverse strategies

  • How it works: select the strategy (e.g., Tree-of-Thoughts) in the configuration file, then run the tuning command to test its effect (a conceptual sketch follows below).
  • Effect: improves agents' reasoning ability on long-horizon planning tasks.
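
To give an intuition for what a Tree-of-Thoughts style strategy does, the sketch below runs a tiny best-first search over candidate next steps with stand-in proposal and scoring functions. It is a conceptual illustration only, not OpenManus-RL's actual implementation; in practice the proposals and scores would come from the LLM.

# Conceptual Tree-of-Thoughts style search: expand candidate next steps,
# score them, and keep only the most promising branches (illustration only).
import heapq

def propose_steps(state: str) -> list[str]:
    # Stand-in for the LLM proposing candidate next reasoning steps.
    return [f"{state} -> option{i}" for i in range(3)]

def score(state: str) -> float:
    # Stand-in for the LLM (or a reward model) rating how promising a state is.
    return -len(state)  # shorter paths rated higher, purely for illustration

def tree_of_thoughts(root: str, depth: int = 3, beam: int = 2) -> str:
    frontier = [root]
    for _ in range(depth):
        candidates = [step for state in frontier for step in propose_steps(state)]
        # Keep only the `beam` highest-scoring candidates for the next level.
        frontier = heapq.nlargest(beam, candidates, key=score)
    return max(frontier, key=score)

print(tree_of_thoughts("task: plan a purchase"))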

OpenManus-RL helps users get started quickly with the features above. The project also provides a community group (see "Community Group" on GitHub), which you can join to communicate with developers and get the latest updates.

