Desktop Automation Agents

44 articles in total
Agent S: An Open-Source Agent Framework That Operates Computers Like a Human

Overview: Agent S is an open-source framework developed by Simular AI that lets agents operate computers the way humans do, through the graphical user interface (GUI). It combines multimodal large language models with experience-based learning to carry out tasks such as browsing the web, editing documents, using software...
4 months ago
RunRabbit: Using Voice and Text to Direct an Agent That Operates Your Computer

Overview: RunRabbit is an AI-based tool that lets users control their browser and accomplish a variety of tasks through simple voice or text commands. Its standout feature is that it understands what the user needs and then operates web pages automatically, for example searching for information, filling out forms or performing repetitive tasks...
4 months ago
Agent TARS: An Open-Source Agent That Operates Computers Using Vision and Commands

Overview: Agent TARS is a multimodal AI agent open-sourced by ByteDance. Its core capability is understanding web content visually and combining that with command-line and file-system operations to help users complete complex computer tasks. Rather than requiring manual operation like traditional tools, it can self...
5 months ago
TankWork: An Agent That Operates Computers via Voice and Text and Provides Real-Time Voice Feedback

Overview: TankWork is an open-source desktop agent framework that lets AI perceive and control your computer through computer vision and system-level interaction. Agents built on it can control the computer directly through voice and text commands, process on-screen content in real time, and provide continuous audio visual...
7 months ago
Browser Use Web UI: An Open-Source Framework for Running AI Agents That Browse and Automatically Operate Web Pages

Overview: Browser Use Web UI is an open-source project that gives AI agents a graphical interface for browser interaction. The project is built on top of the browser-use core framework, with its interface built using Gradio ...
2 months ago
NeoAI: An Open-Source Project That Lets AI Take Over Remote Operation of Your Computer and Control It with Natural Language

Overview: NeoAI is an open-source AI assistant that lets users control and manage their computers through natural-language conversation. Without writing any code, users can find files, automate tasks, manage devices and more simply by talking in everyday language. NeoAI...
7 months ago
CogAgent: Zhipu AI's Open-Source Visual Language Model for Automating Graphical User Interfaces

Overview: CogAgent is an open-source visual language model developed by Tsinghua University's Data Mining Research Group (THUDM), aimed at automating the operation of graphical user interfaces (GUIs) across platforms. The model is based on CogVLM (GLM-4V-9B) and supports both Chinese and English...
8 months ago
Browser-Use: Building Smart Web Automation Tools That Let AI Agents Operate Browsers with Ease

Overview: Browser-Use is an open-source web automation tool designed specifically to let large language models (LLMs) interact with websites naturally. It provides a flexible framework that supports a wide range of mainstream language models, including GPT-4, Claud...
8 months ago
GLM-PC (Zhipu Niuniu) Released for Closed-Beta Download: An AI That Can Actually Control Your Computer

GLM-PC (Niuniu) Introduction: GLM-PC is a desktop application based on the CogAgent model that can carry out complex tasks quickly from natural-language commands. It can plan tasks and understand interfaces, and it autonomously completes a variety of computer operations according to user instructions. Usage notes...
8 months ago
AppAgent: Automating Smartphone Operation with Multimodal Agents

Overview: AppAgent is a multimodal agent framework built on large language models (LLMs) for operating smartphone applications. It mimics human interactions such as taps and swipes through a simplified action space, which removes the need for back-end system access and extends its use across different app...
8 months ago