TankWork: an agent that operates computers via voice and text commands and provides real-time voice feedback

General Introduction

TankWork is an open-source desktop agent framework that lets AI perceive and control your computer through computer vision and system-level interaction. Agents built with it take voice and text commands, process screen content in real time, and provide continuous audio-visual feedback and action logs. TankWork is aimed particularly at developers and researchers who want to build autonomous desktop agents that can understand, analyze, and interact with computer interfaces.
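
The perceive, decide, and act cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not TankWork's actual code: `AgentLoop`, `Action`, and the `capture`, `decide`, and `act` callables are hypothetical stand-ins for a screenshot function, a call to the computer-use model, and an OS-level input backend.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """A single UI action the agent wants to perform (hypothetical schema)."""
    kind: str         # e.g. "click", "type", "key"
    target: str = ""  # what to click, or the text/key to send


@dataclass
class AgentLoop:
    """Hypothetical perceive -> decide -> act loop; not TankWork's real API."""
    capture: Callable[[], bytes]                      # perceive: return a screenshot
    decide: Callable[[str, bytes], Optional[Action]]  # model: next action, or None when done
    act: Callable[[Action], None]                     # execute the action on the OS
    log: List[str] = field(default_factory=list)      # action log used for feedback

    def run(self, goal: str, max_steps: int = 10) -> List[str]:
        for step in range(max_steps):
            screen = self.capture()             # 1. perceive the current screen
            action = self.decide(goal, screen)  # 2. ask the model what to do next
            if action is None:                  # model reports the goal is reached
                self.log.append(f"step {step}: done")
                break
            self.act(action)                    # 3. act on the computer
            self.log.append(f"step {step}: {action.kind} {action.target}".rstrip())
        return self.log
```

In a real agent, `decide` would send the goal and screenshot to the configured model and `act` would drive the mouse and keyboard; injecting them as callables keeps the loop testable with fakes. For instance, a `decide` that returns one click and then `None` produces a two-entry log.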

Feature List

  • Direct computer control: execute operations via voice and text commands
  • Computer vision analysis: real-time processing of screen content
  • Voice interaction: natural-language voice interaction via ElevenLabs
  • Customizable agents: configurable personalities and skills
  • Real-time feedback: audio-visual updates and action logs
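
The article does not say how TankWork detects that the screen has changed. One simple, commonly used approach is frame differencing: compare consecutive screenshots and react only when enough pixels have changed. The sketch below assumes frames are already captured and flattened into equal-length lists of grayscale values (0 to 255); `changed_fraction`, `screen_changed`, and the default thresholds are hypothetical.

```python
from typing import Sequence


def changed_fraction(prev: Sequence[int], curr: Sequence[int], threshold: int = 16) -> float:
    """Fraction of pixels whose grayscale value moved by more than `threshold`."""
    if len(prev) != len(curr):
        raise ValueError("frames must have the same size")
    if not curr:
        return 0.0
    changed = sum(1 for a, b in zip(prev, curr) if abs(a - b) > threshold)
    return changed / len(curr)


def screen_changed(prev: Sequence[int], curr: Sequence[int],
                   threshold: int = 16, min_fraction: float = 0.01) -> bool:
    """True when at least `min_fraction` of the pixels changed noticeably."""
    return changed_fraction(prev, curr, threshold) >= min_fraction
```

Gating on a cheap pixel comparison like this avoids sending an unchanged screenshot to a vision model on every tick; the thresholds would need tuning for real displays.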

How to Use

Installation

  1. Prerequisites:
    • Install Anaconda (recommended for dependency management)
    • Access to a terminal or command prompt
  2. Clone the repository:
     git clone https://github.com/AgentTankOS/tankwork.git
     cd tankwork
  3. Install dependencies:
     pip install --upgrade pip setuptools wheel
     pip install -r requirements.txt
  4. Configure the environment:
    • In the project root directory, create a .env file from the provided template:
      cp .env.example .env
    • Add your API keys and settings to .env:
      GEMINI_API_KEY=your_api_key
      OPENAI_API_KEY=your_api_key
      ELEVENLABS_API_KEY=your_api_key
      ANTHROPIC_API_KEY=your_api_key
      ELEVENLABS_MODEL=eleven_flash_v2_5
      COMPUTER_USE_IMPLEMENTATION=tank
      COMPUTER_USE_MODEL=claude-3-5-sonnet-20241022
      COMPUTER_USE_MODEL_PROVIDER=anthropic
      NARRATIVE_LOGGER_NAME=ComputerUse.Tank
      NARRATIVE_MODEL=gpt-4o
      NARRATIVE_TEMPERATURE=0.6
      NARRATIVE_MAX_TOKENS=250
      LOG_LEVEL=INFO
  5. Launch the application:
     python main.py
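
Before launching, it can help to confirm that .env contains real values rather than the `your_api_key` placeholders. The stdlib-only sketch below parses simple `KEY=VALUE` lines, reports missing keys, and sets up a logger from `LOG_LEVEL` and `NARRATIVE_LOGGER_NAME`. The helper names and the choice of which keys count as required are assumptions for illustration; TankWork's own startup code may load .env differently.

```python
import logging
from pathlib import Path
from typing import Dict, Iterable, List

PLACEHOLDER = "your_api_key"
# Assumption: with the example settings, the Anthropic (computer use),
# OpenAI (narration) and ElevenLabs (voice) keys are the ones actually needed.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "ELEVENLABS_API_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: Dict[str, str], required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Required keys that are absent, empty, or still set to the placeholder."""
    return [k for k in required if values.get(k, "") in ("", PLACEHOLDER)]


def check_env_file(path: Path, required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Read `path` (if it exists) and return the keys that still need values."""
    values = parse_env(path.read_text(encoding="utf-8")) if path.exists() else {}
    return missing_keys(values, required)


def configure_logging(values: Dict[str, str]) -> logging.Logger:
    """Create the narrative logger using LOG_LEVEL (defaults to INFO)."""
    level = getattr(logging, values.get("LOG_LEVEL", "INFO").upper(), logging.INFO)
    logging.basicConfig(level=level)
    logger = logging.getLogger(values.get("NARRATIVE_LOGGER_NAME", "ComputerUse.Tank"))
    logger.setLevel(level)
    return logger
```

Run `check_env_file(Path(".env"))` from the project root; an empty list means every assumed-required key has a non-placeholder value.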

Usage

  1. Computer control mode:
    • Control the computer with text input or voice commands.
    • For example, say or type "open browser" to launch the browser.
  2. Computer vision analysis:
    • Processes screen content in real time, recognizing and responding to on-screen changes.
    • For example, the agent can perform a preset action automatically when a specific image appears on the screen.
  3. Voice interaction:
    • Talk to the agent by voice, using the voice capabilities provided by ElevenLabs.
    • For example, you can ask the agent about the current weather and it will answer by voice.
  4. Customizable agents:
    • Configure the agent's personality and skills to fit specific needs.
    • For example, you can have the agent perform a task at a set time, such as opening the mail client at 8:00 a.m. every day.
  5. Real-time feedback:
    • The agent provides real-time audio and visual updates and operation logs so you can follow what it is doing.
    • For example, after executing a command, the agent reports the result by voice.
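
The article does not describe how TankWork routes commands or schedules tasks internally. The sketch below is a hypothetical illustration of two ideas from the list above: mapping a typed or transcribed phrase such as "open browser" to a handler whose return value serves as feedback, and computing the next run time of a daily task such as the 8:00 a.m. mail-client example. `CommandRouter` and `next_daily_run` are invented names, not part of TankWork.

```python
import datetime as dt
import string
from typing import Callable, Dict


class CommandRouter:
    """Hypothetical router from normalized text commands to handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[], str]] = {}

    @staticmethod
    def normalize(text: str) -> str:
        # Lowercase, drop punctuation, and collapse whitespace so that
        # "  Open Browser! " and "open browser" match the same handler.
        cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
        return " ".join(cleaned.split())

    def register(self, phrase: str, handler: Callable[[], str]) -> None:
        self._handlers[self.normalize(phrase)] = handler

    def dispatch(self, text: str) -> str:
        handler = self._handlers.get(self.normalize(text))
        if handler is None:
            return f"Unknown command: {text!r}"
        # The returned string is what would be spoken or logged as feedback.
        return handler()


def next_daily_run(now: dt.datetime, hour: int, minute: int = 0) -> dt.datetime:
    """Next occurrence of hour:minute strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += dt.timedelta(days=1)
    return candidate
```

A real handler for "open browser" could call the standard library's `webbrowser.open(url)` and return a confirmation string, which the real-time feedback step would then speak or log.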

Following these steps, you can install TankWork and use its features to control and manage your computer.

May not be reproduced without permission: Chief AI Sharing Circle » TankWork: an intelligent body that operates computers via voice and text and provides real-time voice feedback