
Agent TARS: An Open Source Agent That Uses Vision and Commands to Operate Computers

General Introduction

Agent TARS is a multimodal AI agent open-sourced by ByteDance. Its core capability is helping users complete complex computer tasks by visually understanding web content and combining that with command line and file system operations. Unlike traditional tools that require manual operation, it can automate browser tasks, edit files, and run commands. The website offers desktop application downloads and technical documentation for developers and users who want to automate their workflows. It is currently in a technical preview phase and primarily supports macOS. Agent TARS aims to make computer operations smarter and more efficient. The project is based on the UI-TARS Desktop browser wrapper and is positioned against Manus.



 

Feature List

  • Browser Automation: Automate searching, clicking, form filling, etc. by visually recognizing web page elements.
  • Command Line Integration: Supports running system commands directly to execute scripts or manage background tasks.
  • File System Operations: Read, edit, or generate files to process data or save results.
  • Task Planning and Execution: Break down complex tasks and complete them step by step automatically, supporting in-depth research or repetitive work.
  • Multimodal Interaction: Combine image, text, and code input to adapt to different types of tasks.
  • Tool Extension: Integrate search, document editing, and Model Context Protocol (MCP) to enhance functional flexibility.
  • Desktop Application Support: Provide an interface to show the operation process, which is convenient for users to view and adjust in real time.

 

Usage Guide

Using Agent TARS involves two parts: installation and operation. The detailed steps below will get you started quickly.

Installation Process

  1. Download the Desktop Application
    Open the official website https://agent-tars.com/ and click the "Download" button to go to the GitHub release page (https://github.com/bytedance/UI-TARS-desktop/releases). Select the latest version (e.g. AgentTARS-macOS-latest.dmg) and download it. The file is a few tens of MB, so the download takes roughly 1 to 5 minutes depending on network speed.
  2. Install on macOS
    Once the download is complete, double-click the .dmg file and the installation window will pop up. Drag the Agent TARS icon to the Applications folder. The installation takes only a few seconds. Once complete, find Agent TARS in Applications and open it.
  3. Set Up Permissions
    The first time you start the app, macOS will prompt you to grant Accessibility access. Open "System Settings > Privacy & Security > Accessibility", find Agent TARS, and turn it on. This allows it to control the screen and keyboard.
  4. Configure Models and APIs
    After opening the app, click the Settings button in the lower left corner to enter the configuration page. You need to set the model provider (e.g. Azure OpenAI) and the API key. Specific steps:

    • Select the provider in the "Model Config".
    • Enter your API key (you need to get it from the provider yourself).
    • If you're using Azure, you'll also need to fill in the apiVersion, deploymentName, and endpoint.
      After saving, the app automatically connects to the model.
  5. Optional Search Configuration
    If you need the web search function, go to "Search Config", select the search provider and enter the API key. Save when done.
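
The settings above are entered in the app's interface, but it can help to think of them as one small record that must be complete before saving. The sketch below is only an illustration under assumptions: the field names apiVersion, deploymentName, and endpoint come from the text, while "provider", "apiKey", and the helper missing_fields are hypothetical names and do not reflect the app's actual configuration format.

```python
# Hypothetical representation of the Azure OpenAI settings described above.
# apiVersion, deploymentName and endpoint are named in the text; "provider",
# "apiKey" and this whole structure are illustrative, not the app's schema.
REQUIRED_AZURE_FIELDS = ("apiKey", "apiVersion", "deploymentName", "endpoint")


def missing_fields(config: dict) -> list:
    """Return the required Azure fields that are absent or blank."""
    return [
        name
        for name in REQUIRED_AZURE_FIELDS
        if not str(config.get(name) or "").strip()
    ]


example_config = {
    "provider": "Azure OpenAI",
    "apiKey": "<your-api-key>",
    "apiVersion": "<api-version>",
    "deploymentName": "<deployment-name>",
    "endpoint": "",  # left blank on purpose to show the check
}

print(missing_fields(example_config))  # prints ['endpoint']
```

If a model fails to connect after saving, a blank or mistyped value in one of these fields is a reasonable first thing to check.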

Workflow

Once installed, Agent TARS presents a simple main interface with an input box and an action display area. The main functions are used as follows.

Browser Automation

  • Steps: Enter a task in the input box, such as "Search for the latest AI news and save the headlines". Press Enter and Agent TARS will open the built-in browser, run the search, and extract the headlines.
  • Display: The right window shows browser actions in real time, such as opening and scrolling web pages.
  • Result: When finished, it saves the headlines as a text file, with the path displayed at the bottom of the interface.

Command Line Integration

  • Steps: Enter an instruction such as "List files in current folder" (on macOS this maps to ls -l, the counterpart of dir on Windows). Press Enter and Agent TARS calls the terminal to execute it.
  • Display: The command output appears at the bottom of the interface for easy viewing.
  • Advanced Usage: You can enter more complex requests, such as "check system memory and log", and it will run the corresponding commands and save the result.
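
To make the "List files in current folder" example concrete, here is a minimal sketch of running the same listing command from Python. It only illustrates the kind of command involved (ls -l on macOS, dir on Windows); the function name list_files is hypothetical, and this is not how Agent TARS itself invokes the terminal.

```python
import os
import subprocess


def list_files(folder: str) -> str:
    """Run the platform's directory-listing command and return its output."""
    if os.name == "nt":
        # dir is a cmd.exe built-in, so it is run through cmd /c.
        command = ["cmd", "/c", "dir"]
    else:
        command = ["ls", "-l"]
    result = subprocess.run(
        command, cwd=folder, capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    print(list_files(os.getcwd()))
```

Passing check=True makes a failing command raise an error instead of returning empty output silently, which is usually the safer default when output is going to be saved.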

File System Operations

  • Steps: Type "Create a new file test.txt and write 'hello'". Press Enter and Agent TARS creates the file and writes the contents.
  • Display: The operation is shown in the interface, and you can click the path to view the file once it completes.
  • Edit file: Type "open test.txt and add 'world'" and it will modify the file automatically.
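
The two requests above amount to a create-then-append sequence. The sketch below shows that sequence in plain Python as an illustration of the end result; the helper names create_file and append_text are hypothetical, and the code says nothing about how Agent TARS performs the operation internally.

```python
import tempfile
from pathlib import Path


def create_file(path: Path, text: str) -> None:
    """Create (or overwrite) the file and write the initial text."""
    path.write_text(text, encoding="utf-8")


def append_text(path: Path, text: str) -> None:
    """Add text to the end of the file, leaving existing content in place."""
    with path.open("a", encoding="utf-8") as handle:
        handle.write(text)


if __name__ == "__main__":
    # A temporary folder keeps the demo from touching real files.
    with tempfile.TemporaryDirectory() as folder:
        target = Path(folder) / "test.txt"
        create_file(target, "hello")
        append_text(target, " world")
        print(target.read_text(encoding="utf-8"))  # prints: hello world
```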

Task Planning and Execution

  • Steps: Enter a complex task, such as "Research features of the latest version of Python and organize documentation." Agent TARS breaks the task down into searching for material, extracting information, and generating documentation.
  • Display: The right window shows each step of the operation, such as opening a web page and copying text.
  • Result: It generates an organized document and saves it to the specified path.
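
The search, extract, and document breakdown described above can be pictured as an ordered list of steps, each consuming the previous step's output. The following is a toy sketch of that idea only: the Step class, run_plan, and the placeholder actions are all hypothetical and are not Agent TARS's actual planner.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[str], str]  # receives the previous output, returns its own


def run_plan(task: str, steps: List[Step]) -> List[str]:
    """Run the steps in order, feeding each one the previous result."""
    log = []
    data = task
    for step in steps:
        data = step.action(data)
        log.append(f"{step.name}: {data}")
    return log


# Placeholder actions; a real agent would call a browser, a model,
# or the file system at these points.
plan = [
    Step("search", lambda task: f"results for '{task}'"),
    Step("extract", lambda results: f"notes from {results}"),
    Step("document", lambda notes: f"report built from {notes}"),
]

for line in run_plan("latest Python features", plan):
    print(line)
```

Keeping a per-step log mirrors what the right-hand window shows during a run: each intermediate result stays visible, so a wrong turn can be spotted before the final document is written.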

Human-Machine Collaboration

  • Real-time adjustments: During task execution, you can add instructions in the input box, such as "add another example paragraph". Agent TARS will adjust its operation according to the new input.
  • Share the results: Click the "Share" button and select "Local HTML" to generate a log file, or configure a remote server URL for uploading and sharing.

Caveats

  • Environment requirements: Currently only macOS is supported; Windows and Linux versions have not yet been released.
  • Network connection: A stable network is needed to reach the model and search services.
  • Troubleshooting: If a function doesn't work (e.g. search fails), check whether the API key is correct, or join the Discord community for help (link on the official website).

With these steps, you can easily use Agent TARS for everything from simple file manipulation to complex research tasks.

 

Application Scenarios

  1. Web Automation
    Use Agent TARS to browse the web automatically and extract news or product information. For example, type "collect recent tech news headlines" and it will search and save the results for market research or information gathering.
  2. Task Management
    Plan complex projects. Given a task such as "make travel plans", it searches for flights and hotels and organizes the results into a document. Suited to personal-assistant or project-management use.
  3. Code Assist
    Enter "Generate Python script to check file size" and Agent TARS will write and save the code, letting developers produce small tools quickly.
  4. Data Analysis
    Handle real-time data with a request such as "Analyze stock data on a web page and save a table". It extracts the data and generates files, which suits financial or market analysis.
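
For the Code Assist scenario above, a request like "Generate Python script to check file size" could plausibly yield something along these lines. This is a hand-written sketch of what such a script might look like, not actual Agent TARS output; the function names are illustrative.

```python
import os
import sys


def file_size_bytes(path: str) -> int:
    """Return the size of a file in bytes."""
    return os.path.getsize(path)


def human_readable(size: int) -> str:
    """Format a byte count using binary units (1 KB = 1024 bytes)."""
    value = float(size)
    for unit in ("B", "KB", "MB"):
        if value < 1024:
            return f"{value:.1f} {unit}"
        value /= 1024
    return f"{value:.1f} GB"


if __name__ == "__main__":
    # Usage: python check_size.py file1 [file2 ...]
    for name in sys.argv[1:]:
        print(name, human_readable(file_size_bytes(name)))
```

Reviewing a generated script like this before running it is worthwhile, since the agent can also execute commands on your machine.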

 

QA

  1. Is Agent TARS free?
    Yes, it is an open source project and follows the Apache 2.0 license. The code and application are free to download and use from GitHub.
  2. Does it support Windows?
    Currently only macOS is supported. Windows and Linux versions are still in development, so keep an eye on GitHub for updates.
  3. Is programming knowledge required?
    No. It is operated in natural language and is accessible to the average user, although programming knowledge helps you make better use of the command line functionality.
  4. How do I fix the search function not working?
    Check that the API key in the "Search Config" is correct, or that the network connection is working. You can also join the Discord community to give feedback.
May not be reproduced without permission: Chief AI Sharing Circle » Agent TARS: An Open Source Agent That Uses Vision and Commands to Operate Computers