General Introduction
TankWork is an open-source desktop agent framework that lets AI perceive and control your computer through computer vision and system-level interaction. Agents can control the computer directly from voice and text commands, process screen content in real time, and provide continuous audiovisual feedback and action logs. TankWork is particularly well suited to developers and researchers who want to build autonomous desktop agents that can understand, analyze, and interact with computer interfaces.
Feature List
- Direct computer control: Execute operations via voice and text commands
- Computer vision analysis: Real-time screen content processing
- Voice interaction: Natural-language voice interaction via ElevenLabs
- Customizable agents: Configuring personalities and skills
- Real-time feedback: Audiovisual updates and logging
Usage Guide
Installation Process
- Installation prerequisites:
- Install Anaconda (recommended for dependency management)
- Access to a terminal/command prompt
- Clone the repository:
git clone https://github.com/AgentTankOS/tankwork.git
cd tankwork
- Install dependencies:
pip install --upgrade pip setuptools wheel
pip install -r requirements.txt
- Configure the environment:
- In the project root directory, create the .env file:
cp .env.example .env
- Add the API keys and settings to the .env file:
GEMINI_API_KEY=your_api_key
OPENAI_API_KEY=your_api_key
ELEVENLABS_API_KEY=your_api_key
ANTHROPIC_API_KEY=your_api_key
ELEVENLABS_MODEL=eleven_flash_v2_5
COMPUTER_USE_IMPLEMENTATION=tank
COMPUTER_USE_MODEL=claude-3-5-sonnet-20241022
COMPUTER_USE_MODEL_PROVIDER=anthropic
NARRATIVE_LOGGER_NAME=ComputerUse.Tank
NARRATIVE_MODEL=gpt-4o
NARRATIVE_TEMPERATURE=0.6
NARRATIVE_MAX_TOKENS=250
LOG_LEVEL=INFO
- Launch the application:
python main.py
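Before launching, it can help to confirm that the .env file no longer contains the your_api_key placeholders. The following is a minimal sketch using only the Python standard library. It is not part of TankWork: the function names parse_env, missing_keys, and check_env_file are hypothetical, and treating all four API keys as required is an assumption, since the project may only need the keys for the providers you actually use.

```python
from pathlib import Path

# API keys listed in the .env settings above.
# Assumption: all four are treated as required for this check.
REQUIRED_KEYS = (
    "GEMINI_API_KEY",
    "OPENAI_API_KEY",
    "ELEVENLABS_API_KEY",
    "ANTHROPIC_API_KEY",
)


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS, placeholder="your_api_key"):
    """Return required keys that are absent, empty, or still the placeholder."""
    return [key for key in required if values.get(key, "") in ("", placeholder)]


def check_env_file(path=".env"):
    """Return the keys in the given .env file that still need a real value."""
    return missing_keys(parse_env(Path(path).read_text()))
```

Calling check_env_file() from the project root returns a list of keys that still need a value; an empty list means every key checked here has been filled in.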
Usage Process
- Computer control mode:
- Command-based computer control via text input or voice commands.
- For example, you can say "open browser" or type "open browser" to start the browser.
- Computer vision analysis:
- Processes screen content in real time, recognizing and responding to changes on the screen.
- For example, the agent can automatically perform a preset action when a specific image appears on the screen.
- Voice interaction:
- Interact with the agent through natural-language speech, using ElevenLabs for voice.
- For example, you can ask the agent about the current weather conditions and the agent will reply by voice.
- Customizable agents:
- Configure the agent's personality and skills to meet specific needs.
- For example, you can set the agent to perform a specific task at a specific time, such as opening the mail client at 8:00 a.m. every day.
- Real-time feedback:
- The agent provides real-time audio and visual updates and operation logs so that the user can follow the current operation status.
- For example, when the agent executes a command, it informs the user of the result of the operation by voice.
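To make the command flow above concrete ("open browser" typed or spoken, followed by feedback and a log entry), here is a small illustrative sketch. It is not TankWork's implementation or API: CommandRouter and open_browser are hypothetical names, and the exact-phrase lookup stands in for the model-based interpretation a real agent would perform. The handler only reports the action instead of touching the desktop.

```python
from typing import Callable, Dict, Optional


class CommandRouter:
    """Maps normalized command phrases to handlers (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[], str]] = {}
        self.log: list = []  # simple action log, one entry per dispatched command

    @staticmethod
    def normalize(text: str) -> str:
        # Lowercase and collapse whitespace so "Open  Browser" matches "open browser".
        return " ".join(text.lower().split())

    def register(self, phrase: str, handler: Callable[[], str]) -> None:
        self._handlers[self.normalize(phrase)] = handler

    def dispatch(self, text: str) -> Optional[str]:
        """Run the handler for a command and record the outcome in the log."""
        phrase = self.normalize(text)
        handler = self._handlers.get(phrase)
        if handler is None:
            self.log.append(f"unknown command: {phrase}")
            return None
        result = handler()
        self.log.append(f"{phrase} -> {result}")
        return result


def open_browser() -> str:
    # A real handler might call webbrowser.open(...); this one only reports the action.
    return "browser opened"


router = CommandRouter()
router.register("open browser", open_browser)
```

With this setup, router.dispatch("Open Browser") runs the registered handler and appends an entry to router.log, while an unrecognized command returns None and is logged as unknown. Text input and transcribed voice input could both be passed to dispatch in the same way.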
With these steps, you can install TankWork and use the features described above to control and manage your computer.