General Introduction
OpenManus-RL is an open source project jointly developed by UIUC-Ulab and the OpenManus team from the MetaGPT community, and it is hosted on GitHub. The project improves the reasoning and decision-making capabilities of large language model (LLM) agents through reinforcement learning (RL), exploring new tuning methods that build on the experience of models such as Deepseek-R1 and QwQ-32B. The team publishes progress regularly, with full transparency of code, datasets, and test results, and supports validating results on benchmarks such as GAIA, AgentBench, WebShop, and OSWorld. The project encourages developers around the world to contribute code, datasets, or compute resources to jointly build an efficient ecosystem for agent development.
With this, the last missing piece of an open-source Manus has been filled in, and MetaGPT is clearly going all in. That said, MGX can already cover essentially all of Manus's capabilities, so this open-source reproduction is riding on that foundation.
Function List
- Agent environment construction: provides environment configuration tools for online RL tuning of agents.
- Trajectory data collection: connects models such as Deepseek-R1 and QwQ-32B to collect behavioral data on complex tasks.
- RL tuning support: reinforcement learning methods for customizing agent behavior.
- Benchmark integration: built-in test environments including WebShop, GAIA, OSWorld, and AgentBench.
- Diverse strategies: integrates RL strategies such as Tree-of-Thoughts and Monte Carlo Tree Search.
- Community collaboration: contributions of code, datasets, and more are welcome; significant contributors can become co-authors of the paper.
- Real-time progress sharing: the RL tuning process and results are shared through regular live updates.
Usage Guide
Installation process
OpenManus-RL is easy to install and suitable for users with basic Python knowledge. Below are the detailed steps:
1. Create a Conda environment
To avoid dependency conflicts, using Conda is recommended:
conda create -n openmanus-rl python=3.10
conda activate openmanus-rl
- Prerequisite: Conda must be installed; it can be downloaded from the Anaconda official website.
- After activation, the terminal prompt shows (openmanus-rl).
2. Clone the project
Make sure Git is installed (check with git --version; if it is not installed, download it from git-scm.com):
git clone https://github.com/OpenManus/OpenManus-RL.git
cd OpenManus-RL
- Download the code and go to the project directory.
3. Install dependencies
Run the following in the project root directory:
pip install -r requirements.txt
- If downloads are slow, use a mirror (e.g., the Tsinghua mirror in mainland China):
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
- Visualization tools require additional installation:
pip install matplotlib numpy
4. Configure models and datasets
- Supervised fine-tuning (SFT): specify the model and dataset:
python -m openmanus_rl.sft --model_name_or_path Qwen/Qwen2.5-1.5B-Instruct --dataset_name CharlieDreemur/OpenManus-RL
- Reinforcement learning tuning (GRPO): configure the reward functions:
python -m openmanus_rl.grpo --model_name_or_path Qwen/Qwen2.5-1.5B-Instruct --dataset_name CharlieDreemur/OpenManus-RL-GRPO --reward_funcs accuracy format tag_count
- The datasets can be obtained from Hugging Face (a quick way to inspect them locally is sketched below).
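Before launching a run, it can help to take a quick look at the training data. The sketch below loads the two datasets named above with the Hugging Face datasets library; it assumes the library is installed and that a train split exists, and the printed fields are simply whatever the dataset contains:

```python
# Quick look at the OpenManus-RL datasets pulled from Hugging Face.
# Assumes `pip install datasets` and that a "train" split exists.
from datasets import load_dataset

sft_data = load_dataset("CharlieDreemur/OpenManus-RL", split="train")
grpo_data = load_dataset("CharlieDreemur/OpenManus-RL-GRPO", split="train")

print(sft_data)               # row count and column names
print(sft_data[0])            # one example record
print(grpo_data.column_names) # fields used for GRPO tuning
```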
5. Run the project
- Run SFT on a single GPU (a smoke test for the resulting checkpoint is sketched after this step):
python -m openmanus_rl.sft --output_dir data/sft-output
- Run GRPO on multiple GPUs (requires the zero3.yaml configuration):
accelerate launch --config_file=configs/accelerate_configs/zero3.yaml openmanus_rl/grpo.py --output_dir data/grpo-output
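After either run finishes, you may want to sanity-check the checkpoint that was written. The sketch below is a minimal smoke test, assuming the output directory (data/sft-output in the single-GPU example) contains a standard Hugging Face-format checkpoint and that transformers and torch are installed:

```python
# Minimal smoke test: load the fine-tuned checkpoint and generate one reply.
# Assumes data/sft-output contains a standard Hugging Face-format checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "data/sft-output"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt)

prompt = "List three steps for buying a laptop online."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```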
Main feature workflows
Agent environment construction
- Procedure:
  - Run python -m openmanus_rl.sft to generate the base environment.
  - Modify the configuration files (e.g., task objectives or reward functions).
  - Run python -m openmanus_rl.grpo to start tuning.
- Usage scenario: customizing an agent environment for a specific task (e.g., shopping decisions).
Data collection and testing
- Procedure:
  - Configure the model (e.g., Deepseek-R1):
    python -m openmanus_rl.grpo --model_name_or_path Deepseek-R1
  - Run a test by adding --benchmark GAIA; the results are saved to the data/ directory (see the sketch after this list for inspecting them).
- Usage scenario: analyzing agent performance on complex tasks.
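The exact files a benchmark run writes depend on the benchmark, so the simplest way to inspect the output is to list the data/ directory. A minimal sketch (file names and formats are whatever the run actually produced):

```python
# List whatever the benchmark run wrote under data/.
from pathlib import Path

for path in sorted(Path("data").rglob("*")):
    if path.is_file():
        print(path, f"({path.stat().st_size} bytes)")
```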
RL tuning operations
- Procedure:
  - Run GRPO mode:
    python -m openmanus_rl.grpo --reward_funcs accuracy
  - Watch the training logs; the model is saved to data/grpo-output (an accuracy-style reward is sketched after this list).
- Usage scenario: optimizing agent behavior, e.g., boosting purchase success rates in WebShop.
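To make the reward flag above concrete, here is a rough illustration of what an accuracy-style reward can look like: each completion scores 1.0 when its extracted final answer matches the reference and 0.0 otherwise. The function name and signature are assumptions for illustration, not the project's actual implementation:

```python
# Illustrative accuracy-style reward: 1.0 if the completion's final answer
# matches the reference, else 0.0. Hypothetical signature; the real reward
# functions inside openmanus_rl may differ.
import re
from typing import List

def accuracy_reward(completions: List[str], answers: List[str]) -> List[float]:
    rewards = []
    for completion, answer in zip(completions, answers):
        # Treat the last number in the completion as its final answer.
        numbers = re.findall(r"-?\d+\.?\d*", completion)
        predicted = numbers[-1] if numbers else None
        rewards.append(1.0 if predicted == answer.strip() else 0.0)
    return rewards

print(accuracy_reward(["... so the total cost is 42"], ["42"]))  # [1.0]
```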
Community Contributions
- Procedure:
- Fork the project to a personal GitHub account.
- Make changes locally and commit:
git add .
git commit -m "Optimize RL strategy"
git push origin main
- Submit a pull request, or get in touch by email at kunlunz2@illinois.edu.
- Usage scenario: contribute new algorithms or datasets and participate in core development.
Featured Functions
RL Tuning Support
- How it works: run GRPO with a specified reward function (e.g. accuracy); training displays a real-time log, and the model is saved to the specified directory on completion.
- Effect: agents adapt their behavior to the task, e.g., improving multimodal task performance in OSWorld.
Benchmarking Integration
- How it works: run python -m openmanus_rl.grpo --benchmark AgentBench; the system automatically generates a report with success rates, response times, and other metrics (an example of this kind of aggregation is sketched below).
- Effect: provides quantitative metrics that help developers compare model performance.
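As an illustration of the kind of report described above, the sketch below aggregates a success rate and a mean response time from a few episode records. The record format here is hypothetical; the actual report generated by the benchmark run may be laid out differently:

```python
# Aggregate illustrative benchmark metrics (hypothetical record format).
episodes = [
    {"task": "agentbench-001", "success": True,  "response_time_s": 3.2},
    {"task": "agentbench-002", "success": False, "response_time_s": 5.7},
    {"task": "agentbench-003", "success": True,  "response_time_s": 2.9},
]

success_rate = sum(e["success"] for e in episodes) / len(episodes)
mean_latency = sum(e["response_time_s"] for e in episodes) / len(episodes)
print(f"success rate: {success_rate:.1%}, mean response time: {mean_latency:.2f}s")
```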
Diverse strategies
- How it works: select the strategy in the configuration file (e.g. Tree-of-Thoughts), then run the tuning command to test its effect (a conceptual sketch follows below).
- Effect: enhances agents' reasoning ability on long-horizon planning tasks.
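As a rough illustration of the Tree-of-Thoughts idea: instead of committing to a single chain of reasoning, the agent proposes several candidate thoughts at each step, scores them, and keeps only the most promising few before expanding again. The sketch below is a generic beam-style search with placeholder expand/score functions, not the project's implementation; in a real system both would be backed by an LLM call and a value heuristic:

```python
# Conceptual Tree-of-Thoughts sketch: expand candidate thoughts, score them,
# keep the top-k, repeat. expand() and score() are placeholders.
from typing import List

def expand(thought: str) -> List[str]:
    # Placeholder: an LLM would propose follow-up thoughts here.
    return [f"{thought} -> option {i}" for i in range(3)]

def score(thought: str) -> float:
    # Placeholder: estimate how promising a partial reasoning path is.
    return -len(thought)  # e.g., prefer shorter paths

def tree_of_thoughts(root: str, depth: int = 3, beam_width: int = 2) -> str:
    frontier = [root]
    for _ in range(depth):
        candidates = [child for t in frontier for child in expand(t)]
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]  # keep the most promising thoughts
    return frontier[0]

print(tree_of_thoughts("Plan the WebShop purchase"))
```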
With the features above, OpenManus-RL helps users get started quickly. The project also runs a community group (see "Community Group" on GitHub) that you can join to exchange ideas with the developers and get the latest updates.