AI Personal Learning
and practical guidance
讯飞绘镜

LangGraph CUA: LangGraph-based AI Intelligence for Controlling Computer Operations

General Introduction

LangGraph CUA is an open source project developed by the LangChain team. It is based on the LangGraph framework, allowing developers to use Python to build AI intelligences that can directly operate computers. The core of this tool is the Computer Use Agent (CUA), which can simulate human behavior on a computer, such as clicking, entering text or browsing the web. It supports memory functions, human-computer collaboration and real-time output, making it suitable for automating repetitive tasks or developing intelligent assistants. The project's code is open for developers to download, modify and use freely, making it particularly suitable for technology enthusiasts interested in AI automation.

 

Function List

  • Supports AI control of computer operations via text and voice, such as opening software, typing text, or clicking buttons.
  • Provides short-term and long-term memory functions to remember previous operations and conversation contents.
  • Built-in human-computer collaboration mode allows the user to step in and adjust the AI's behavior at any time.
  • Support real-time streaming output, the operation process can be displayed step by step.
  • Integration with Scrapybara to run AI agents and access web pages on virtual machines.
  • Allows developers to customize tools and configurations to flexibly extend functionality.

 

Using Help

LangGraph CUA is not complicated to install and use, but requires some basic Python environment and API configuration. Here are the detailed steps to get you started.

Installation process

  1. Preparing the environment
    Make sure your computer has Python 3.8 or above. This can be checked with the command:
python --version

If not, download and install it from https://www.python.org.

  1. cloning project
    Download the code locally by typing the following command in the terminal:
git clone https://github.com/langchain-ai/langgraph-cua-py.git

Once the download is complete, go to the project folder:

cd langgraph-cua-py
  1. Installation of dependencies
    The project requires some Python libraries, which are installed with this command:
pip install -r requirements.txt

If you encounter permission problems, you can add --user::

pip install -r requirements.txt --user
  1. Configuring API Keys
    LangGraph CUA needs API keys for OpenAI and Scrapybara. First register an account to get the key, and then set the environment variables in the terminal:
export OPENAI_API_KEY=<你的OpenAI密钥>
export SCRAPYBARA_API_KEY=<你的Scrapybara密钥>

interchangeability <你的OpenAI密钥> cap (a poem) <你的Scrapybara密钥> Windows users can use the set substitute for exportThe

  1. Verify Installation
    Run a simple test to make sure the environment is OK. Go to the project directory and run it:
python -m langgraph_cua

If no errors are reported, the installation was successful.

How to use the main features

The core of LangGraph CUA is to create an AI agent to operate the computer. Here's how it works.

Creating an AI Agent

Import and configure the agent in a Python file, for example:

from langgraph_cua import create_cua
cua_graph = create_cua()

This will generate a default AI agent. You can add parameters if you want to use a specific VM instance:

cua_graph = create_cua(auth_state_id="你的认证ID")

Operate the computer

The agent can control the computer with commands. For example, tell it to open a browser:

cua_graph.invoke({"command": "open browser"})

Or enter text:

cua_graph.invoke({"command": "type", "text": "你好,世界"})

These commands are executed directly on the computer.

Using the Memory Function

The agent can remember previous actions. For example, let it open Notepad first:

cua_graph.invoke({"command": "open notepad"})

Then enter the content:

cua_graph.invoke({"command": "type", "text": "这是测试"})

The next time it is called, it will know that Notepad is open and continue the operation directly.

human-machine collaboration

If you want to adjust it manually, you can enable the human-machine collaboration mode. Add parameters at runtime:

cua_graph.invoke({"command": "click", "x": 100, "y": 200}, human_in_loop=True)

At this point in the execution, the program will pause and wait for you to confirm or modify the coordinates.

real time output

You can use streaming output if you want to see every step of the operation:

for step in cua_graph.stream({"command": "search web", "query": "天气"}):
print(step)

It will show the search process step by step.

Featured Function Operation

Integrating Scrapybara

Scrapybara enables the agent to run on a virtual machine, suitable for handling web tasks. Configure it to make sure the API key is correct and then run it:

cua_graph.invoke({"command": "browse", "url": "https://example.com"})

The agent will open the web page and operate it in the virtual machine.

Customization Tools

You can add your own tools. For example, define a calculator tool:

def calculator(a, b):
return a + b
cua_graph = create_cua(tools=[calculator])

Then call:

cua_graph.invoke({"command": "calculate", "a": 5, "b": 3})

The result will return 8.


These steps and code will get you up to speed quickly with LangGraph CUA for both simple tasks and complex customizations.

 

application scenario

  1. automated office work
    Use the AI agent to batch process files, such as opening Excel, entering data and saving it, eliminating repetitive operations.
  2. Web Data Crawling
    Let agents visit websites and extract information, such as automatically collecting news headlines or price data.
  3. Intelligent Assistant Development
    Create an assistant that listens to voice commands, such as "open email" or "search for documents," and executes them directly.
  4. Education and training
    Demonstrate how AI can simulate a human operating a computer during instruction to help students understand the principles of automation.

 

QA

  1. Do you need any programming fundamentals?
    Basic Python knowledge is required, such as being able to use the command line and write simple code. If you don't know how, you can learn the basic syntax first.
  2. What if I don't have an API key?
    Go to the official OpenAI (https://openai.com) and Scrapybara websites to register for an account and request a key. Free credits may be limited, so we recommend looking at pricing.
  3. Can I not use a virtual machine?
    Yes, but a virtual machine with Scrapybara is more secure, isolating the operating environment and avoiding impacting the local computer.
  4. Does it support Chinese commands?
    Support. As long as the AI model understands Chinese, typing Chinese commands is just as effective.
May not be reproduced without permission:Chief AI Sharing Circle " LangGraph CUA: LangGraph-based AI Intelligence for Controlling Computer Operations
en_USEnglish