Optexity: an open source project to train AI to perform web operations with human demonstrations

🚀 Invitation to Experience: China's First AI IDE Intelligent Programming Software Trae Chinese version downloadThe DeepSeek-R1 and Doubao-pro are available for unlimited use!

General Introduction

Optexity is an open source project on GitHub, developed by the Optexity team. Its core is to use human demonstration data to train AI to complete computer tasks, especially web page operations. The project includes three code libraries: ComputerGYM, AgentAI and Playwright, which allow users to record operations, process data and train models so that AI can learn tasks such as clicking buttons or filling out forms. All code is free and can be downloaded and modified by users. Self-exploration, software documentation and YouTube video training will be supported in the future.

Optexity: an open-source project to train AI to perform web operations with human demonstrations-1

Function List

Supports recording of human action demonstrations to train AI for web tasks.
Provides task environments such as MiniWoB++, including click and form operations.
Process demo data to generate formats for training.
Support for Gemini, vLLM and other models, can be fine-tuned with LLaMA-Factory.
Open source code is available for download for easy customization of features.
Integrate with Playwright to enhance web automation capabilities.

Using Help

Installation process

To use Optexity, you need to prepare your environment first. Here are the steps:

Download Code
Enter it in the terminal:

mkdir optexity
cd optexity
git clone https://github.com/Optexity/ComputerGYM.git
git clone https://github.com/Optexity/AgentAI.git
git clone https://github.com/Optexity/playwright.git

This will download three code libraries.

Configuration environment
Create an environment with Conda:

conda create -n optexity python=3.10 nodejs
conda activate optexity

Installation of dependencies
Install ComputerGYM and AgentAI:

pip install -e ComputerGYM
pip install -e AgentAI

Install Playwright again:

cd playwright
git checkout playwright_optexity
npm install
npm run build
playwright install
cd ..

Main Functions

Recorded Demo

establish demonstration_config.yamlreference demonstration_config_example.yamlWrite down the goal of the task (e.g., "click the button").
Run the recording:

./ComputerGYM/computergym/demonstrations/demonstrate.sh ComputerGYM/computergym/demonstrations/demonstration_config.yaml

The system records your mouse and keyboard actions.

Processing data

Record post-processing data:

python ComputerGYM/computergym/demonstrations/process_demonstration.py --yaml ComputerGYM/computergym/demonstrations/demonstration_config.yaml --seed 5

This will convert the operation to an AI-readable format.

Generate training data

Generate training files with AgentAI:

python AgentAI/agentai/sft/prepare_training_data.py --agent_config AgentAI/agentai/train_configs/hubspot_agent.yaml

The file is saved in the train_data folder, adapted to LLaMA-Factory.

training model

Trained with LLaMA-Factory, see its documentation. After training the model is deployed in http://localhost:8000The

Testing AI

Test AI effects, such as changing currencies at HubSpot:

python AgentAI/agentai/main.py --url "https://app.hubspot.com" --port 8000 --log_to_console --goal "change currency to SGD" --storage_state cache_dir/auth.json --model vllm

The result is displayed in the terminal.

Featured Function Operation

Human Demonstration Training

The highlight of Optexity is teaching AI with human actions. you record an action once, and the AI learns to repeat it. It's easy to record and process, so even novices can use it.

Testing the original model

I'd like to try it directly. Gemini Model? Run:

EXPORT GEMINI_API_KEY=<你的密钥>
python AgentAI/agentai/main.py --url "https://app.hubspot.com" --port 8000 --log_to_console --goal "change currency to SGD" --storage_state cache_dir/auth.json --model gemini

The key can be found in the https://aistudio.google.com/apikey Get it for free.

MiniWoB++ Integration

MiniWoB++ provides tasks such as clicks and forms. At runtime, the AI attempts to complete the goal and the terminal displays the success rate.

Open Source Extensions

All three codebases are open source. You can change the code to add features, like new tasks, or tweak the Playwright logic, and submitting it to GitHub makes it official.

Summary of the operation process

Install the code base and environment.
Record presentations and process data.
Generate training data and train the model.
Test the AI and adjust the parameters.

The steps are clear and you can get started in a few minutes.

application scenario

AI Research
Researchers used it to test AI performance on web tasks.
web automation
Developers automate repetitive actions with AI.
Educational Practices
Students use it to learn the AI training process.

QA

Need a programming foundation?
Requires a bit of Python and terminal knowledge, but the tutorials are detailed and easy to follow.
What is LLaMA-Factory for?
It is the fine-tuning tool that converts demo data to training format.
Do I have to train with a demo?
Not necessary, you can test the original model directly, but demo training works better.

Optexity: an open-source project to train AI to perform web actions with human demonstrations