OWL: An automation tool for multi-agent collaboration on real-world tasks

General Introduction

OWL (Optimized Workforce Learning) is an open source framework developed by the CAMEL-AI team for optimizing multi-agent collaboration to automate real-world tasks. Built on the CAMEL-AI architecture, OWL improves the naturalness, efficiency, and robustness of task processing through dynamic agent interactions. On the GAIA benchmark, OWL achieved an average score of 58.18, ranking first among open source frameworks. The project was officially open-sourced on March 7, 2025, and the code is hosted on GitHub (https://github.com/camel-ai/owl) with detailed documentation and examples. It aims to bridge AI research and real-world applications, serving both academic exploration and task automation scenarios.

The saddest thing about Chinese-language information sources is that they rarely introduce CAMEL-AI or AgentGPT and instead focus on products like Manus. OWL is very interesting. Commercializing some products advances the technology; commercializing others does not.

Feature List

Real-time information retrieval: Accesses up-to-date information via Wikipedia, Google Search, and other online resources.
Multimodal processing: Processes video, image, and audio data, either online or local.
Browser automation: Built on the Playwright framework; simulates browser actions such as scrolling, clicking, typing, and downloading.
Document parsing: Extracts the contents of Word, Excel, PDF, and PowerPoint files and converts them to text or Markdown.
Code execution: Writes and runs Python code through an interpreter to accomplish tasks.
Multi-agent collaboration: Multiple AI agents interact dynamically to collaborate on complex tasks.

Usage Guide

Installation

OWL is an open source project, so users need to download the source code from GitHub and configure the runtime environment. The detailed installation steps are as follows:

Cloning the repository
Enter the following commands in the terminal to get the OWL source code:

git clone https://github.com/camel-ai/owl.git
cd owl

Setting up the environment

Conda (recommended):

conda create -n owl python=3.11
conda activate owl

Alternatively, use venv:
```
python -m venv owl_env
```
- Activation on Windows:
```
owl_env\Scripts\activate
```
- Activation on Unix or macOS:
```
source owl_env/bin/activate
```

Installing dependencies
After activating the environment, run the following command to install the dependencies:

python -m pip install -r requirements.txt
playwright install

Note: The playwright install command installs the browser components required for browser automation.

Configuring Environment Variables
OWL requires API keys to use external services (e.g. OpenAI models). Configure them as follows:

Copy the template file:
```
cp .env_template .env
```
Edit the .env file and fill in the API key, for example:
```
OPENAI_API_KEY=your_openai_key
```
Obtaining keys: Refer to the service registration URLs listed in owl/.env_template.
Additional model support: See the CAMEL models documentation (https://docs.camel-ai.org/key_modules/models.html).
Note: OpenAI models are officially recommended for best performance; other models may perform poorly on complex tasks.
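
Before running OWL, it can help to confirm that the .env file holds a real key rather than the template placeholder. The following is a minimal sketch using only the Python standard library. The helper names read_env_file and missing_keys are hypothetical and not part of OWL, and the parser only handles simple KEY=VALUE lines.

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything without an '=' sign.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(env, required=("OPENAI_API_KEY",),
                 placeholders=("", "your_openai_key")):
    """Return the required keys that are absent or still hold a placeholder."""
    return [key for key in required if env.get(key, "") in placeholders]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        print("Missing or placeholder keys:", missing_keys(read_env_file(env_path)))
    else:
        print("No .env file found; copy the template first.")
```

An empty list means every required key has a non-placeholder value. Extend the required tuple if other services from the template are used.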

Verify Installation
Run the following command to test the environment:

python owl/run.py

If the console prints the task output without errors, the installation was successful.
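
If run.py fails at startup, checking the prerequisites named in this guide (Python 3.11, Git, and the Playwright package) can narrow down the cause. The sketch below is one way to do that with the standard library. The preflight function is a hypothetical helper, not something OWL ships.

```python
import importlib.util
import shutil
import sys


def preflight(min_python=(3, 11), modules=("playwright",), tools=("git",)):
    """Return a dict mapping each prerequisite to True (found) or False."""
    report = {"python": sys.version_info >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[f"module:{name}"] = importlib.util.find_spec(name) is not None
    for name in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[f"tool:{name}"] = shutil.which(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in preflight().items():
        print(f"{item}: {'ok' if ok else 'MISSING'}")
```

Run it inside the activated environment so that the module check reflects the packages installed there.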

Main Feature Workflows

1. Running the basic example

OWL provides a minimal example script, run.py, which can be run directly to try the framework:

Enter the following in the terminal:

python owl/run.py

Output: The console will display the results of running the default task.

2. Custom tasks

Users can modify the run.py script to run custom tasks:

Edit the script: Open run.py and modify the task description, for example:

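# Inside owl/run.py: construct_society, run_society, and logger are
# expected to be already defined or imported in that script.
question = "Look up Apple's latest stock price."
society = construct_society(question)
answer, chat_history, token_count = run_society(society)
logger.success(f"Answer: {answer}")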

Run the script:
```
python owl/run.py
```
View results: The console will output the stock price information.
Other sample tasks:
- "Analyze the sentiment of recent tweets on climate change."
- "Help me debug this Python code: [code content]"
- "Summarize the main points of this research paper: [Paper URL]."
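
To try several task descriptions in one run, the construct/run pattern from run.py can be wrapped in a loop. The sketch below passes the two functions in as parameters so it runs without OWL or an API key. The run_tasks helper and the fake_* stand-ins are hypothetical, and the only assumption about OWL is the three-value return shown in the run.py excerpt above.

```python
def run_tasks(questions, construct_society, run_society):
    """Run each task description in turn and collect answers and token counts."""
    results = []
    for question in questions:
        society = construct_society(question)
        # run_society is expected to return (answer, chat_history, token_count),
        # matching the unpacking used in the run.py excerpt.
        answer, _chat_history, token_count = run_society(society)
        results.append({"question": question, "answer": answer, "tokens": token_count})
    return results


# Stand-in functions so the sketch runs without OWL; replace them with the
# real construct_society and run_society when working inside run.py.
def fake_construct_society(question):
    return {"task": question}


def fake_run_society(society):
    return f"(stub answer for: {society['task']})", [], 0


if __name__ == "__main__":
    tasks = [
        "Analyze the sentiment of recent tweets on climate change.",
        "Summarize the main points of this research paper: [Paper URL].",
    ]
    for row in run_tasks(tasks, fake_construct_society, fake_run_society):
        print(row["question"], "->", row["answer"])
```

Passing the functions as arguments keeps the loop independent of how run.py imports them.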

3. Browser automation

OWL supports browser interaction via Playwright, such as crawling web pages:

Sample script: Create a file (e.g. web_task.py):

from owl.agents import BrowserAgent

# Create the browser agent and open the target page
agent = BrowserAgent()
agent.navigate("https://example.com")

# Read the page content and print it
content = agent.get_content()
print(content)

Run the script:
```
python web_task.py
```
Result: The script outputs the text content of the web page.
Supported operations: Scrolling, clicking, typing, downloading, and so on. Refer to the official documentation for the specific APIs.
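
For readers curious about what the underlying Playwright calls look like, the sketch below uses Playwright's own synchronous Python API directly rather than any OWL wrapper. It is not a description of how OWL drives the browser internally. The helper name fetch_page_text is hypothetical, and the code assumes playwright install has already been run.

```python
def fetch_page_text(url, headless=True):
    """Open a page in Chromium via Playwright and return its visible body text."""
    # Imported lazily so this module can be loaded even where Playwright
    # is not installed; the import only happens when the function is called.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        try:
            page = browser.new_page()
            page.goto(url)
            # inner_text returns the rendered text of the matched element.
            return page.inner_text("body")
        finally:
            # Close the browser even if navigation fails.
            browser.close()
```

Calling fetch_page_text("https://example.com") returns the page text as a string. Passing headless=False opens a visible browser window instead.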

4. Document parsing and multimodal processing

Parse a document: Place a local file (e.g. sample.pdf) in the owl directory and run the following code:
```
from owl.utils import parse_document
text = parse_document("sample.pdf")
print(text)
```
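
A parser call fails in less obvious ways when the path is wrong or the format is unsupported, so a small check beforehand can give clearer errors. This is a sketch only: document_kind and check_document are hypothetical helpers, and the extension list is an assumption based on the four formats named in the feature list (Word, Excel, PDF, PowerPoint).

```python
from pathlib import Path

# Assumed extensions for the formats named in the feature list;
# legacy .doc/.xls/.ppt files are deliberately not covered here.
SUPPORTED = {".docx": "Word", ".xlsx": "Excel", ".pdf": "PDF", ".pptx": "PowerPoint"}


def document_kind(path):
    """Return the format name for a supported file extension, or None."""
    return SUPPORTED.get(Path(path).suffix.lower())


def check_document(path):
    """Raise a clear error before handing a bad path to a parser."""
    p = Path(path)
    kind = document_kind(p)
    if kind is None:
        raise ValueError(f"Unsupported file type: {p.suffix or '(none)'}")
    if not p.is_file():
        raise FileNotFoundError(f"File not found: {p}")
    return kind
```

Calling check_document("sample.pdf") before parsing returns "PDF" when the file exists and raises otherwise.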

Process video: OWL supports analyzing local or online video, for example:

from owl.multimodal import process_video
result = process_video("https://example.com/video.mp4")
print(result)

Featured Functions

Real-time information retrieval

Procedure: Specify the source of information in the task description, for example:

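# As in run.py, construct_society and run_society must already be in scope.
question = "Get the latest definition of artificial intelligence from Wikipedia."
society = construct_society(question)
answer, chat_history, token_count = run_society(society)
print(answer)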

Result: Returns the latest content from Wikipedia.

GAIA Benchmark Replication

Run the test: Reproduce the GAIA results using the provided script:
```
python run_gaia_roleplaying.py
```
View results: The script outputs the score for each task, which can be used to verify OWL's performance on the benchmark (average score 58.18).
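
To compare a run against the reported average, per-task results need to be aggregated. The sketch below assumes the score is the percentage of tasks answered correctly and that each result is a record with a difficulty level and a correct flag. Both the record format and the accuracy_by_level helper are hypothetical, since the script's actual output format is not described here.

```python
from collections import defaultdict


def accuracy_by_level(records):
    """Return ({level: percent_correct}, overall_percent) for per-task records."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for rec in records:
        totals[rec["level"]] += 1
        correct[rec["level"]] += int(rec["correct"])
    per_level = {lvl: 100.0 * correct[lvl] / totals[lvl] for lvl in sorted(totals)}
    # Guard against an empty result list to avoid dividing by zero.
    overall = 100.0 * sum(correct.values()) / sum(totals.values()) if totals else 0.0
    return per_level, overall


if __name__ == "__main__":
    # Toy records for illustration only; these are not OWL's actual results.
    toy = [
        {"level": 1, "correct": True},
        {"level": 1, "correct": True},
        {"level": 2, "correct": False},
        {"level": 2, "correct": True},
    ]
    per_level, overall = accuracy_by_level(toy)
    print(per_level, overall)
```

The overall figure weights every task equally, so it can differ from a plain mean of the per-level percentages when levels contain different numbers of tasks.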

Usage Notes

Git and Python 3.11 or later must be installed on the system.
For large-scale tasks, high-performance hardware and a stable network connection are recommended.
If the Chrome window is blank while the console shows output, this is normal: the window is only used when a task requires browser interaction.