AI Personal Learning
and practical guidance

UI-TARS Desktop: Desktop Intelligentsia Application for Controlling Computers Using Natural Language

General Introduction

UI-TARS Desktop is a graphical interface agent application based on UI-TARS (Visual Language Model) developed by ByteDance. The application allows users to control computers through natural language for more intuitive and efficient human-computer interactions.UI-TARS Desktop supports cross-platform operation, is compatible with both Windows and macOS systems, and provides real-time feedback and status displays. Users can complete operations such as screenshots, visual recognition, and precise mouse and keyboard control through simple voice commands, greatly enhancing the convenience and intelligence of computer operations.

UI-TARS Desktop: Desktop Intelligentsia Application for Computer Control Using Natural Language-1


 

Function List

  • Natural language control: control of computer operations through voice commands
  • Screenshot and Visual Recognition: Supports screenshot and image recognition functions
  • Precise Mouse and Keyboard Control: Enables high-precision mouse and keyboard operation
  • Cross-platform support: compatible with Windows and macOS systems
  • Real-time feedback and status display: Provides real-time feedback and status updates on operations

 

Using Help

Installation process

MacOS

  1. Download the latest version of the UI-TARS Desktop application.
  2. Drag the UI-TARS application to the Applications folder.
  3. Enable UI-TARS permissions in macOS system settings:
    • System Settings -> Privacy & Security -> Accessibility
    • System Settings -> Privacy & Security -> Screen Recording
  4. Open the UI-TARS application, which can be used in the terminal if the application is damaged sudo xattr -dr com.apple.quarantine /Applications/UI\ TARS.app The Fix.

Windows (computer)

  1. Download the latest version of the UI-TARS Desktop application.
  2. Run the application and follow the prompts to complete the installation.

Guidelines for use

  1. After opening the UI-TARS application, the user is presented with the main interface.
  2. In the main interface, users can perform various operations through voice commands, such as getting weather information and sending tweets.
  3. The application supports Visual Language Models (VLMs) deployed by HuggingFace (in the cloud) and Ollama (locally), and it is recommended to use the HuggingFace inference endpoint for rapid deployment.
  4. Users can refer to the provided GUI model deployment guide for model deployment.

Main function operation flow

natural language control

  1. In the main interface, tap the microphone icon to start voice input.
  2. Say commands, such as "Open your browser and search for weather."
  3. The application will perform the corresponding operation according to the instruction and display the result on the interface.

Screenshots and Visual Recognition

  1. In the main interface, select the "Screenshot" function.
  2. Use the mouse to select the area you want to take a screenshot of.
  3. The app will automatically recognize the content of the screenshot and display the recognition result.

Precise mouse and keyboard control

  1. In the main interface, select "Mouse Control" or "Keyboard Control" function.
  2. Use voice commands or manually enter commands to control mouse movement and keyboard input.
  3. The application will perform the appropriate actions according to the instructions and provide real-time feedback.
May not be reproduced without permission:Chief AI Sharing Circle " UI-TARS Desktop: Desktop Intelligentsia Application for Controlling Computers Using Natural Language

Chief AI Sharing Circle

Chief AI Sharing Circle specializes in AI learning, providing comprehensive AI learning content, AI tools and hands-on guidance. Our goal is to help users master AI technology and explore the unlimited potential of AI together through high-quality content and practical experience sharing. Whether you are an AI beginner or a senior expert, this is the ideal place for you to gain knowledge, improve your skills and realize innovation.

Contact Us
en_USEnglish