General Introduction
Optexity is an open-source project on GitHub, developed by the Optexity team. Its core idea is to train AI on human demonstration data so that it can complete computer tasks, especially web page operations. The project consists of three repositories, ComputerGYM, AgentAI and Playwright, which let users record operations, process the data and train models so that the AI learns tasks such as clicking buttons or filling out forms. All code is free to download and modify. Support for self-exploration, software documentation and YouTube video training is planned for the future.
Function List
- Records human action demonstrations to train AI on web tasks.
- Provides task environments such as MiniWoB++, including click and form tasks.
- Processes demonstration data into formats suitable for training.
- Supports Gemini, vLLM and other models; models can be fine-tuned with LLaMA-Factory.
- Offers open-source code that can be downloaded and customized.
- Integrates with Playwright to enhance web automation.
Usage Guide
Installation process
To use Optexity, you need to prepare your environment first. Here are the steps:
- Download the code
Run the following in the terminal:
mkdir optexity
cd optexity
git clone https://github.com/Optexity/ComputerGYM.git
git clone https://github.com/Optexity/AgentAI.git
git clone https://github.com/Optexity/playwright.git
This downloads the three repositories.
- Configure the environment
Create an environment with Conda:
conda create -n optexity python=3.10 nodejs
conda activate optexity
- Install dependencies
Install ComputerGYM and AgentAI:
pip install -e ComputerGYM
pip install -e AgentAI
Then install Playwright:
cd playwright
git checkout playwright_optexity
npm install
npm run build
playwright install
cd ..
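Before running the steps above, it can help to confirm that the command-line tools they call are actually on your PATH. The following is a minimal sketch, not part of Optexity: the tool list is inferred from the commands in this section, and the helper name find_missing_tools is hypothetical. Note that playwright only appears on PATH after the install step has completed.

```python
import shutil

# Command-line tools that the installation commands in this guide invoke.
# This list is an assumption derived from the steps above, not an official list.
REQUIRED_TOOLS = ["git", "conda", "node", "npm", "playwright"]


def find_missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be located on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

The `which` parameter exists only so the lookup can be swapped out when testing; in normal use the default is enough.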
Main Functions
Recording a Demo
- Create demonstration_config.yaml, using demonstration_config_example.yaml as a reference, and write down the goal of the task (e.g., "click the button").
- Run the recording:
./ComputerGYM/computergym/demonstrations/demonstrate.sh ComputerGYM/computergym/demonstrations/demonstration_config.yaml
The system records your mouse and keyboard actions.
Processing data
After recording, process the data:
python ComputerGYM/computergym/demonstrations/process_demonstration.py --yaml ComputerGYM/computergym/demonstrations/demonstration_config.yaml --seed 5
This converts the recorded operations into an AI-readable format.
Generate training data
Generate training files with AgentAI:
python AgentAI/agentai/sft/prepare_training_data.py --agent_config AgentAI/agentai/train_configs/hubspot_agent.yaml
The files are saved in the train_data folder, in a format suited to LLaMA-Factory.
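To give a feel for what LLaMA-Factory-style training data can look like, here is a small sketch that builds records in the Alpaca layout (instruction, input, output), one of the dataset formats LLaMA-Factory accepts. The exact files written by prepare_training_data.py are not described here, so the field contents, the step descriptions and the helper to_alpaca_record are all hypothetical illustrations.

```python
import json


def to_alpaca_record(goal, observation, action):
    """Build one supervised fine-tuning record in the Alpaca layout."""
    return {
        "instruction": f"Goal: {goal}",
        "input": observation,
        "output": action,
    }


# Hypothetical demonstration steps: (what the page shows, what the human did).
# Real recordings will contain different and richer information.
steps = [
    ("button 'Settings' is visible", "click('Settings')"),
    ("field 'Currency' is visible", "select('Currency', 'SGD')"),
]

records = [
    to_alpaca_record("change currency to SGD", observation, action)
    for observation, action in steps
]

print(json.dumps(records, indent=2))
```

Each demonstration step becomes one training example, so the model learns to map a goal plus the current page state to the next action.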
Training the model
Train the model with LLaMA-Factory; see its documentation for details. After training, the model is served at http://localhost:8000.
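Before testing, you may want to check that something is actually listening on port 8000. The sketch below assumes the model is served through vLLM's OpenAI-compatible server, which exposes a /v1/models listing; if your deployment differs, the path and response shape will differ too. The function names are hypothetical.

```python
import json
import urllib.request


def models_url(host="localhost", port=8000):
    """URL of the model listing, assuming an OpenAI-compatible server."""
    return f"http://{host}:{port}/v1/models"


def extract_model_ids(payload):
    """Pull model ids out of an OpenAI-style model listing."""
    if not isinstance(payload, dict):
        return []
    return [
        entry["id"]
        for entry in payload.get("data", [])
        if isinstance(entry, dict) and "id" in entry
    ]


def list_served_models(url, timeout=5.0):
    """Return the served model ids, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = json.load(response)
    except (OSError, ValueError):
        # OSError covers connection errors and timeouts; ValueError covers bad JSON.
        return None
    return extract_model_ids(payload)


if __name__ == "__main__":
    served = list_served_models(models_url())
    if served is None:
        print("No server reachable on port 8000.")
    else:
        print("Served models:", served)
```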
Testing AI
Test the trained AI, for example by changing the currency in HubSpot:
python AgentAI/agentai/main.py --url "https://app.hubspot.com" --port 8000 --log_to_console --goal "change currency to SGD" --storage_state cache_dir/auth.json --model vllm
The result is displayed in the terminal.
Featured Functions
Human Demonstration Training
The highlight of Optexity is teaching AI with human actions. You record an action once, and the AI learns to repeat it. Recording and processing are simple, so even novices can use it.
Testing the original model
Want to try the Gemini model directly? Run:
export GEMINI_API_KEY=<your_key>
python AgentAI/agentai/main.py --url "https://app.hubspot.com" --port 8000 --log_to_console --goal "change currency to SGD" --storage_state cache_dir/auth.json --model gemini
A key can be obtained for free at https://aistudio.google.com/apikey.
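The two test commands in this guide differ only in the --model value, and the Gemini variant also needs GEMINI_API_KEY in the environment. As a convenience, the sketch below assembles the command from the flags shown above and checks for the key. The helper names are hypothetical; the flags themselves are copied from the commands in this guide.

```python
import os
import shlex


def build_agent_command(model, goal,
                        url="https://app.hubspot.com",
                        port=8000,
                        storage_state="cache_dir/auth.json"):
    """Assemble the AgentAI test command as an argument list."""
    return [
        "python", "AgentAI/agentai/main.py",
        "--url", url,
        "--port", str(port),
        "--log_to_console",
        "--goal", goal,
        "--storage_state", storage_state,
        "--model", model,
    ]


def has_gemini_key(env=None):
    """True if GEMINI_API_KEY is set to a non-empty value."""
    env = os.environ if env is None else env
    return bool(env.get("GEMINI_API_KEY"))


if __name__ == "__main__":
    command = build_agent_command("gemini", "change currency to SGD")
    if not has_gemini_key():
        print("GEMINI_API_KEY is not set; export it before running.")
    # shlex.join quotes the goal so the line can be pasted into a shell.
    print(shlex.join(command))
```

Building the command as a list rather than a string avoids quoting problems when the goal contains spaces.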
MiniWoB++ Integration
MiniWoB++ provides tasks such as clicks and forms. At runtime, the AI attempts to complete the goal and the terminal displays the success rate.
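The success rate is simply the fraction of attempted episodes in which the goal was reached. As a minimal illustration, assuming each episode yields a true/false outcome (the outcome list below is made up, and this is not Optexity's own reporting code):

```python
def success_rate(outcomes):
    """Fraction of episodes that succeeded, as a value between 0 and 1."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("at least one episode outcome is required")
    return sum(1 for succeeded in outcomes if succeeded) / len(outcomes)


# Hypothetical outcomes for four episodes of a click task.
episode_outcomes = [True, False, True, True]
print(f"Success rate: {success_rate(episode_outcomes):.0%}")
```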
Open Source Extensions
All three codebases are open source. You can modify the code to add features such as new tasks, or tweak the Playwright logic, and contribute your changes back on GitHub.
Summary of the operation process
- Install the code base and environment.
- Record demonstrations and process data.
- Generate training data and train the model.
- Test the AI and adjust the parameters.
The steps are clear and you can get started in a few minutes.
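The record, process and prepare steps can be chained in a small driver script. The sketch below only reuses the commands and paths shown earlier in this guide; the function names are hypothetical, and it prints the commands by default rather than running them, so nothing is executed until you pass dry_run=False.

```python
import shlex
import subprocess

DEMO_DIR = "ComputerGYM/computergym/demonstrations"
DEMO_CONFIG = f"{DEMO_DIR}/demonstration_config.yaml"
AGENT_CONFIG = "AgentAI/agentai/train_configs/hubspot_agent.yaml"


def pipeline_steps(config=DEMO_CONFIG, seed=5, agent_config=AGENT_CONFIG):
    """Return the record, process and prepare commands as argument lists."""
    return [
        [f"./{DEMO_DIR}/demonstrate.sh", config],
        ["python", f"{DEMO_DIR}/process_demonstration.py",
         "--yaml", config, "--seed", str(seed)],
        ["python", "AgentAI/agentai/sft/prepare_training_data.py",
         "--agent_config", agent_config],
    ]


def run_pipeline(steps, dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order."""
    for command in steps:
        print("$", shlex.join(command))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_pipeline(pipeline_steps(), dry_run=True)
```

Run it from the optexity directory created during installation, since all paths are relative to it. Training with LLaMA-Factory and the final test remain separate manual steps.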
Application Scenarios
- AI research
Researchers use it to test AI performance on web tasks.
- Web automation
Developers automate repetitive actions with AI.
- Educational practice
Students use it to learn the AI training process.
QA
- Do I need a programming background?
A little Python and terminal knowledge is required, but the tutorials are detailed and easy to follow.
- What is LLaMA-Factory for?
It is the fine-tuning tool that trains the model on the training data generated from the demos.
- Do I have to train with demos?
No. You can test the original model directly, but training on demos works better.