Skyvern: Automating Browser-Based Workflows with LLM and Computer Vision

General Introduction

Skyvern is a tool for automating browser workflows using large language models (LLMs) and computer vision. It exposes a simple API endpoint for automating manual actions across a large number of websites, replacing automation solutions that are fragile or unreliable. Skyvern can operate on websites it has never seen before, automatically mapping visual elements to the actions needed to complete a workflow without any custom code.

Skyvern is similar to BabyAGI and AutoGPT, with the added ability to see and interact with web pages: multiple agents fully automate the process of reasoning about a task objective and acting on it.

Skyvern Online Experience: https://www.skyvern.com/

Skyvern Feature List

Automating browser workflows: Automates a variety of in-browser tasks using LLMs and computer vision.
API endpoint: Provides a simple API for easy integration and invocation.
No custom code required: Adapts to each site without site-specific scripts.
Resistant to web layout changes: Does not rely on fixed XPaths or selectors, so it copes with changes in page layout.
Large-scale application: A single workflow can be applied to multiple sites.
Intelligent interaction: Uses LLM reasoning to handle complex interaction scenarios.

Usage Guide

Installation Process

Prepare the environment:
- Make sure Python 3.11 or later is installed.
- Install the Poetry dependency management tool.
- Install the PostgreSQL database.
- Install Node.js.
- Alternatively, Docker is supported for one-click deployment.

Clone the source code:

git clone https://github.com/skyvern-ai/skyvern.git
cd skyvern

Install dependencies:
```
./setup.sh
```
Configure environment variables:
- Edit the .env file and fill in the required API keys and configuration parameters.
Start the services:
```
docker-compose up -d
```

Visualizing Tasks

How to understand the information displayed by Skyvern

Viewing Results

Skyvern comes with a visualization tool to help you understand how your tasks are performing. First, navigate to the Task History page and click on any task to view it.

Actions

Each action performed by Skyvern can be viewed in the Action Viewer and is accompanied by a screenshot of the screen state after the action is performed.

Recordings

Each Skyvern task contains a recording of the entire operation (end-to-end). To view the recording, click the Recordings tab.

Task Parameters

Task parameters are the inputs you provide to Skyvern, which include URLs, extraction rules, and any other relevant information.

Diagnostic Log

The Diagnostics tab contains the information Skyvern uses for processing, including annotated screenshots, action screenshots, element trees, prompts, action lists, page HTML, and raw LLM requests.

Workflows

Workflows: Linking Multiple Tasks Together

A workflow links multiple blocks together. Imagine invoking multiple tasks in succession, applying conditional logic, extracting data to CSV, and so on. All of these ideas are intended to be supported by the workflow functionality.

All workflows began as YAML definitions, but the new version provides a graphical interface in which multiple components can be linked together to produce a defined output.

Supported Modules

TaskBlock: The core block, in which Skyvern navigates websites to take actions and/or extract information.
ForLoopBlock
CodeBlock
TextPromptBlock
DownloadToS3Block
UploadToS3Block
SendEmailBlock
FileParserBlock

Task Block Inputs

URL (usually required). The starting point for the Skyvern agent, ideally the target website you wish to automate.
- In the workflow screen, if this input is left blank, the block continues from where the previous block stopped. The purpose of this field is to set or reset the agent's starting point.
- If you logged in to a site in the first task block, you may want to leave the URL blank in the second block so that it continues from the logged-in state.
Navigation goal (usually required). A detailed description of where Skyvern should navigate and what actions it should perform. A clear navigation goal is a single objective broken down into steps; avoid providing multiple goals. Use "COMPLETE" to specify when the goal is complete, or "TERMINATE" to specify when to abort it.
- The navigation goal is not used for loading URLs; asking Skyvern to "Visit Site A" in this field will not have the desired effect.
- When a task terminates, Skyvern explains why it stopped navigating.
- You can omit this field if you only want Skyvern to extract data without navigating elsewhere.
Data extraction goal (optional). Describes any data Skyvern should extract and return in addition to its navigation and actions. A good data extraction goal specifies exactly what data Skyvern should return to the user.
- Note that data extraction only takes place after Skyvern has completed navigation.
Extracted information schema (optional). If you have a data extraction goal, you may need the data in a specific format for internal use. This field accepts a JSON-format specification describing how the returned data should be structured.
Max steps override (optional). Some users want to cap costs by limiting the number of steps a task may take.
Max retries (optional). The number of retries allowed if a step fails.
Complete on download (optional). Allows Skyvern to complete the task once a file has been downloaded.
File suffix (optional). An identifier appended to downloaded files.
TOTP URL and TOTP identifier (optional). If you have an internal system that stores TOTP codes for 2FA, this URL calls that store. The identifier associates a code with the task, which matters when you run multiple tasks at the same time. If you want to set up 2FA retrieval in a workflow, please contact us.
Parameters (optional). Parameters are custom placeholders specified per run. They can be workflow parameters, passed in via an API call, or output parameters taken from a previous task block. If specified, Skyvern uses them to aid navigation, fill out forms, or otherwise influence its actions on the site.
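
The guidance above can be summarized in a small sketch. `TaskBlockInputs` and `lint` are hypothetical names chosen for illustration and are not Skyvern classes; the checks only encode the rules described in this section (a blank URL continues from the previous block, a navigation goal should say when to COMPLETE or TERMINATE, and a block with no navigation goal should at least extract data).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaskBlockInputs:
    """Hypothetical container for the task block inputs described above."""
    url: Optional[str] = None                   # blank: continue from the previous block
    navigation_goal: Optional[str] = None       # may be omitted for extraction-only blocks
    data_extraction_goal: Optional[str] = None
    max_steps: Optional[int] = None
    parameters: dict = field(default_factory=dict)


def lint(block: TaskBlockInputs, is_first_block: bool) -> list:
    """Return human-readable warnings; an empty list means no issue was found."""
    warnings = []
    if is_first_block and not block.url:
        warnings.append("first block has no URL and no previous block to continue from")
    if not block.navigation_goal and not block.data_extraction_goal:
        warnings.append("block has neither a navigation goal nor a data extraction goal")
    goal = block.navigation_goal or ""
    if goal and "COMPLETE" not in goal and "TERMINATE" not in goal:
        warnings.append("navigation goal does not say when to COMPLETE or TERMINATE")
    return warnings


login = TaskBlockInputs(
    url="https://example.com/login",
    navigation_goal="Log in with the provided credentials. COMPLETE when the dashboard is visible.",
    parameters={"username": "demo"},
)
# The URL is left blank so this block continues from the logged-in state.
follow_up = TaskBlockInputs(
    navigation_goal="Open the invoices page. COMPLETE when the invoice list is visible.",
    data_extraction_goal="Return the invoice numbers and totals.",
)

print(lint(login, is_first_block=True))
print(lint(follow_up, is_first_block=False))
```

Both example blocks pass the checks, so each call prints an empty list.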

Task API Usage Flow (Example)

The full documentation of the Tasks API is available at

Creating tasks:

Create a task through the API endpoint, specifying the target URL and the navigation goal.

Example Request:

{
"url": "https://example.com",
"navigation_goal": "Fill out the form and submit it",
"data_extraction_goal": "Extract the confirmation message shown after submission"
}
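
As a rough sketch, a request body like the one above could be assembled in Python using only the standard library. The endpoint string below is a placeholder, not a real Skyvern path, and any authentication header is left to the caller through the hypothetical `extra_headers` argument; take the real values from the Tasks API documentation. The request is built but deliberately not sent.

```python
import json
import urllib.request


def build_task_request(endpoint, url, navigation_goal,
                       data_extraction_goal=None, extra_headers=None):
    """Build (but do not send) a POST request carrying a task definition."""
    payload = {"url": url, "navigation_goal": navigation_goal}
    if data_extraction_goal is not None:
        payload["data_extraction_goal"] = data_extraction_goal
    headers = {"Content-Type": "application/json"}
    headers.update(extra_headers or {})
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Placeholder endpoint (an assumption): replace it with the documented one.
request = build_task_request(
    "http://localhost:8000/PLACEHOLDER-TASKS-ENDPOINT",
    url="https://example.com",
    navigation_goal="Fill out the form and submit it",
    data_extraction_goal="Extract the confirmation message shown after submission",
)

print(request.get_method())
print(json.loads(request.data))
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```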

Monitoring tasks:
- Use Skyvern's real-time monitoring to see how tasks are performing.
- Open http://localhost:8080 in a browser to view operations in real time.
Data extraction:
- Specify the data extraction schema and format, and Skyvern will extract and return the data automatically.
- Example request:
```
{
"url": "https://example.com/data",
"data_extraction_schema": {
"name": "string",
"email": "string",
"phone": "string"
}
}
```
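
The schema above maps field names to type names. The following sketch shows one way a caller might check a returned record against such a flat schema on the client side. `TYPE_MAP` and `check_record` are hypothetical helpers, not part of Skyvern, and the shape of the real response may differ from the `record` used here.

```python
# Hypothetical mapping from schema type names to Python types.
TYPE_MAP = {"string": str, "number": (int, float), "boolean": bool}


def check_record(record, schema):
    """Return a list of problems; an empty list means the record matches."""
    problems = []
    for field_name, type_name in schema.items():
        if field_name not in record:
            problems.append(f"missing field: {field_name}")
        elif not isinstance(record[field_name], TYPE_MAP[type_name]):
            problems.append(f"wrong type for {field_name}: expected {type_name}")
    return problems


schema = {"name": "string", "email": "string", "phone": "string"}
record = {"name": "Ada", "email": "ada@example.com", "phone": 5550100}

print(check_record(record, schema))
# prints ['wrong type for phone: expected string']
```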
File download:
- Specify a file download goal, and Skyvern will download the files and provide download links.
- Example request:
```
{
"url": "https://example.com/files",
"file_download_goal": "Download all PDF files"
}
```

Frequently Asked Questions

How is authentication handled? Skyvern supports multiple authentication methods, including password manager integration and two-factor authentication (2FA). When creating a task, provide the credentials in navigation_payload.
How are complex multi-step workflows handled? Skyvern supports chaining multiple tasks into a workflow, executing each task in sequence through the API to complete complex processes.
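
Sequential execution of chained tasks can be sketched as follows. The `run_task` callable stands in for whatever client code submits a task and waits for its result, and the `"status"` values are assumptions for illustration rather than Skyvern's actual response format.

```python
def run_in_sequence(tasks, run_task):
    """Run tasks one after another, stopping at the first one that does not complete."""
    results = []
    for task in tasks:
        result = run_task(task)
        results.append(result)
        if result.get("status") != "completed":
            break  # later tasks depend on earlier ones, so do not continue
    return results


def stub_run_task(task):
    """Stand-in runner for the example: a task fails if it has no navigation goal."""
    status = "completed" if task.get("navigation_goal") else "failed"
    return {"status": status, "task": task}


tasks = [
    {"url": "https://example.com/login", "navigation_goal": "Log in. COMPLETE when logged in."},
    {"navigation_goal": ""},  # fails in the stub, so the chain stops here
    {"navigation_goal": "Download the report. COMPLETE when the file is saved."},
]

results = run_in_sequence(tasks, stub_run_task)
print([r["status"] for r in results])
# prints ['completed', 'failed']
```

Stopping at the first failure reflects the idea that each task in a chain builds on the browser state left by the previous one.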
