
OpenManus-RL: Fine-tuning Large Models to Enhance Agent Reasoning and Decision Making

Post updated on 2025-03-10 18:56. Some of the content is time-sensitive; please leave a comment if anything no longer works!

General Introduction

OpenManus-RL is an open-source project jointly developed by UIUC-Ulab and the OpenManus team from the MetaGPT community, and is hosted on GitHub. The project uses reinforcement learning (RL) to improve the reasoning and decision-making abilities of large language model (LLM) agents, exploring new tuning methods that build on the experience of models such as Deepseek-R1 and QwQ-32B. The team publishes progress regularly, keeps code, datasets, and test results fully transparent, and supports validating results on benchmarks such as GAIA, AgentBench, WebShop, and OSWorld. The project encourages developers around the world to contribute code, datasets, or compute resources to jointly build an efficient ecosystem for agent development.

With this project, the last missing piece for building an open-source Manus has been filled in; MetaGPT has clearly gone all in. That said, MGX can already cover all of Manus's capabilities, so the open-source reproduction is, to some extent, riding on that existing work.


Feature List

  • Agent environment setup: provides tools to configure agent environments for online RL tuning.
  • Trajectory data collection: connects models such as Deepseek-R1 and QwQ-32B to collect behavioral data on complex tasks.
  • RL tuning support: reinforcement learning methods for customizing agent behavior.
  • Benchmark integration: built-in test environments such as WebShop, GAIA, OSWorld, and AgentBench.
  • Diverse strategies: integrates RL strategies such as Tree-of-Thoughts and Monte Carlo Tree Search.
  • Community collaboration: code, datasets, and other contributions are welcome; significant contributors can become co-authors of the paper.
  • Real-time progress sharing: the RL tuning process and results are shared through regular updates.

Using Help

Installation process

OpenManus-RL is easy to install and suitable for users with basic Python knowledge. Below are the detailed steps:

1. Create the Conda environment

To avoid dependency conflicts, Conda is recommended:

conda create -n openmanus-rl python=3.10
conda activate openmanus-rl
  • Prerequisite: Conda must be installed; it can be downloaded from the Anaconda official website.
  • After activation, the terminal prompt shows (openmanus-rl).

2. Clone the project

Make sure Git is installed (check with git --version). If it is not installed, download it from git-scm.com, then clone the repository:

git clone https://github.com/OpenManus/OpenManus-RL.git
cd OpenManus-RL
  • Download the code and go to the project directory.

3. Install the dependencies

Execute it in the project root directory:

pip install -r requirements.txt
  • If downloads are slow, use a mirror (e.g., the Tsinghua PyPI mirror):
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
  • Visualization tools require an additional install (a minimal plotting sketch follows below):
pip install matplotlib numpy
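
If you want to plot training progress yourself, here is a minimal matplotlib sketch. The reward values are dummy data, since the exact log format written by OpenManus-RL is not specified here; it only shows the plotting pattern.

# Minimal sketch: plot a reward curve with matplotlib (dummy data, not actual training logs).
import matplotlib.pyplot as plt
import numpy as np

steps = np.arange(0, 500, 10)
rewards = 1 - np.exp(-steps / 150) + np.random.normal(0, 0.05, size=steps.shape)  # fake curve

plt.plot(steps, rewards)
plt.xlabel("training step")
plt.ylabel("mean reward")
plt.title("GRPO training reward (dummy data)")
plt.savefig("reward_curve.png")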

4. Configure models and datasets

  • Supervised Fine Tuning (SFT): Specify the model and dataset:
python -m openmanus_rl.sft --model_name_or_path Qwen/Qwen2.5-1.5B-Instruct --dataset_name CharlieDreemur/OpenManus-RL
  • Reinforcement learning tuning (GRPO): configure the reward functions (an illustrative reward sketch follows below):
python -m openmanus_rl.grpo --model_name_or_path Qwen/Qwen2.5-1.5B-Instruct --dataset_name CharlieDreemur/OpenManus-RL-GRPO --reward_funcs accuracy format tag_count
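
To make the reward_funcs idea concrete, here is a rough Python sketch of what rule-based rewards such as format and accuracy typically check on a model completion. The function signatures, tag names, and checks are illustrative assumptions; the project's actual reward implementations may differ.

# Illustrative rule-based rewards (assumptions, not the project's actual implementations).
import re

def format_reward(completion: str) -> float:
    """1.0 if the completion wraps reasoning and answer in the expected tags."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.search(pattern, completion, re.DOTALL) else 0.0

def accuracy_reward(completion: str, reference: str) -> float:
    """1.0 if the text inside <answer> matches the reference answer."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    answer = match.group(1).strip() if match else ""
    return 1.0 if answer == reference.strip() else 0.0

completion = "<think>price is 2*3=6</think> <answer>6</answer>"
print(format_reward(completion), accuracy_reward(completion, "6"))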

5. Run the project

  • Run SFT on a single GPU:
python -m openmanus_rl.sft --output_dir data/sft-output
  • Run GRPO on multiple GPUs (requires the zero3.yaml config):
accelerate launch --config_file=configs/accelerate_configs/zero3.yaml openmanus_rl/grpo.py --output_dir data/grpo-output

Main feature workflows

Agent environment setup

  • Procedure:
    1. Run python -m openmanus_rl.sft to generate the base environment.
    2. Modify the configuration files (e.g., task objectives or reward functions); a sketch of a task-specific reward follows this list.
    3. Execute python -m openmanus_rl.grpo to start tuning.
  • Usage scenario: customizing an agent environment for a specific task (e.g., shopping decisions).
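
As an illustration of what a task-specific reward might look like, here is a minimal Python sketch for a WebShop-style shopping task. The function name, the episode fields, and the way OpenManus-RL actually registers rewards are assumptions for illustration, not the project's real API.

# Hypothetical reward for a WebShop-style shopping task (illustration only).
# Assumed episode fields: purchased_item and target_item, each a dict of attributes.

def shopping_reward(episode: dict) -> float:
    """1.0 for buying the exact target item, partial credit for matching attributes."""
    purchased = episode.get("purchased_item")
    target = episode.get("target_item", {})
    if not purchased or not target:
        return 0.0
    matched = sum(1 for key, value in target.items() if purchased.get(key) == value)
    return matched / len(target)

# Example: two of three requested attributes match -> reward 0.67
example = {
    "target_item": {"category": "shoes", "color": "black", "size": "42"},
    "purchased_item": {"category": "shoes", "color": "black", "size": "43"},
}
print(round(shopping_reward(example), 2))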

Data collection and testing

  • Procedure:
    1. Configure the model (e.g., Deepseek-R1):
python -m openmanus_rl.grpo --model_name_or_path Deepseek-R1
    2. Run the test with --benchmark GAIA; the results are saved to the data/ directory (a trajectory-inspection sketch follows this list).
  • Usage scenario: analyzing agent performance on complex tasks.
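
For a quick look at collected data, the following sketch reads one episode file and prints its steps. The file path and the record schema (a list of steps with an action and a reward) are assumptions for illustration; the files actually written by OpenManus-RL may be organized differently.

# Hypothetical inspection of a collected trajectory (file name and schema are assumptions).
import json

with open("data/trajectories/example_episode.json", encoding="utf-8") as f:  # assumed path
    episode = json.load(f)

# Assumed structure: a list of steps, each with the agent's action and a reward.
for i, step in enumerate(episode.get("steps", [])):
    print(f"step {i}: action={step.get('action')!r} reward={step.get('reward')}")
print("total reward:", sum(s.get("reward", 0.0) for s in episode.get("steps", [])))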

RL tuning operations

  • Procedure:
    1. Run GRPO mode:
python -m openmanus_rl.grpo --reward_funcs accuracy
    2. Watch the training logs; the model is saved to data/grpo-output (a loading sketch follows this list).
  • Usage scenario: optimizing agent behavior, e.g., improving the purchase success rate in WebShop.
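
Once training finishes, the saved checkpoint can be loaded back for a quick smoke test. The sketch below assumes data/grpo-output contains a standard Hugging Face checkpoint (which is what transformers-based trainers usually write); adjust the path and prompt as needed.

# Load the fine-tuned checkpoint for a quick generation test (assumes a HF-format checkpoint).
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "data/grpo-output"  # the --output_dir used during training
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

prompt = "You are a web-shopping agent. Find black running shoes under $50."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))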

Community Contributions

  • Procedure:
    1. Fork the project to your own GitHub account.
    2. Modify locally and commit:
git add .
git commit -m "Optimize RL strategy"
git push origin main
    3. Submit a Pull Request, or contact kunlunz2@illinois.edu by email.
  • Usage scenario: contribute new algorithms or datasets and take part in core development.

Feature Highlights

RL Tuning Support

  • How it works: run GRPO with a specified reward function (e.g., accuracy); the training process shows a real-time log, and the model is saved to the specified directory on completion.
  • Effect: agents can adapt their behavior to the task, e.g., improving multimodal task performance in OSWorld.

Benchmarking Integration

  • How it works: run python -m openmanus_rl.grpo --benchmark AgentBench; the system automatically generates a report covering success rates, response times, and other metrics (a minimal aggregation sketch follows below).
  • Effect: provides quantitative metrics to help developers compare model performance.
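
To show what such a report boils down to, here is a minimal Python sketch that aggregates per-episode records into the two headline metrics. The record fields are assumptions for illustration, not the project's actual report format.

# Hypothetical aggregation of per-episode benchmark records into headline metrics.
from statistics import mean

# Assumed record fields: whether the episode succeeded and how long the agent took.
records = [
    {"success": True, "response_time_s": 3.2},
    {"success": False, "response_time_s": 5.9},
    {"success": True, "response_time_s": 2.7},
]

success_rate = mean(1.0 if r["success"] else 0.0 for r in records)
avg_time = mean(r["response_time_s"] for r in records)
print(f"success rate: {success_rate:.1%}, average response time: {avg_time:.1f}s")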

Diverse strategies

  • How it works: select the strategy (e.g., Tree-of-Thoughts) in the configuration file, then run the tuning command to test its effect (a conceptual sketch follows below).
  • Effect: improves agents' reasoning ability on long-horizon planning tasks.
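
To give an intuition for what a Tree-of-Thoughts style strategy does, the sketch below runs a tiny best-first search over candidate next steps with stand-in proposal and scoring functions. It is a conceptual illustration only, not OpenManus-RL's actual implementation; in practice the proposals and scores would come from the LLM.

# Conceptual Tree-of-Thoughts style search: expand candidate next steps,
# score them, and keep only the most promising branches (illustration only).
import heapq

def propose_steps(state: str) -> list[str]:
    # Stand-in for the LLM proposing candidate next reasoning steps.
    return [f"{state} -> option{i}" for i in range(3)]

def score(state: str) -> float:
    # Stand-in for the LLM (or a reward model) rating how promising a state is.
    return -len(state)  # shorter paths rated higher, purely for illustration

def tree_of_thoughts(root: str, depth: int = 3, beam: int = 2) -> str:
    frontier = [root]
    for _ in range(depth):
        candidates = [step for state in frontier for step in propose_steps(state)]
        # Keep only the `beam` highest-scoring candidates for the next level.
        frontier = heapq.nlargest(beam, candidates, key=score)
    return max(frontier, key=score)

print(tree_of_thoughts("task: plan a purchase"))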

OpenManus-RL helps users get started quickly with the features above. The project also provides a community group (see "Community Group" on GitHub), which you can join to communicate with developers and get the latest updates.

