AI Personal Learning
and practical guidance

Skyvern: Automating Browser-Based Workflows with LLM and Computer Vision

General Introduction

Skyvern is a tool for automating browser workflows using large language models (LLMs) and computer vision. It exposes a simple API endpoint that automates manual actions across a large number of websites and can replace fragile or unreliable automation solutions. Skyvern can operate on websites it has never seen before, mapping visual elements to the actions needed to complete a workflow without any custom code.

Skyvern is somewhat like BabyAGI and AutoGPT with vision capabilities added: multiple agents cooperate to reason about a task objective and act on it automatically.



Try Skyvern online: https://www.skyvern.com/

 

Skyvern Feature List

  • Browser workflow automation: Automates a variety of tasks in the browser using LLMs and computer vision.
  • API endpoint: Provides a simple API for easy integration and invocation.
  • No custom code required: Adapts to each site without site-specific scripts.
  • Resistant to layout changes: Does not rely on fixed XPaths or selectors, so it copes with changes in page layout.
  • Large-scale application: A single workflow can be applied to many sites.
  • Intelligent interaction: Uses LLM reasoning to handle complex interaction scenarios.

 

Usage Guide

Installation

  1. Prepare the environment:
    • Make sure Python 3.11 or later is installed.
    • Install the Poetry dependency management tool.
    • Install the PostgreSQL database.
    • Install Node.js.
    • Docker one-click deployment is also supported.
  2. Clone the source code:
    git clone https://github.com/skyvern-ai/skyvern.git
    cd skyvern
    
  3. Install the dependencies:
    ./setup.sh
    
  4. Configure the environment variables:
    • Edit the .env file and fill in the required API keys and configuration parameters.
  5. Start the services:
    docker-compose up -d
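
Before running the steps above, it can help to confirm that the prerequisites are actually on the machine. The following is a minimal sketch, not part of Skyvern itself: the executable names (for example `psql` for the PostgreSQL client) are assumptions about a typical installation and may differ on your system.

```python
import shutil
import sys

# Executables the installation steps rely on. These binary names are
# assumptions (e.g. PostgreSQL's client is usually installed as "psql").
REQUIRED_TOOLS = ["git", "poetry", "psql", "node"]


def check_prerequisites(min_python=(3, 11), tools=REQUIRED_TOOLS, which=shutil.which):
    """Return a dict mapping each requirement name to True (found) or False."""
    report = {"python": sys.version_info[:2] >= min_python}
    for tool in tools:
        # shutil.which returns the executable's path, or None if it is not on PATH.
        report[tool] = which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

If Docker is used for deployment instead, the same function can be called with `tools=["docker"]`.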
    

Task Visualization

How to understand the information displayed by Skyvern

Visualization of results

Skyvern comes with a visualization tool to help you understand how your tasks are performing. First, navigate to the Task History page and click on any task to view it.


 

Actions

Each action performed by Skyvern can be viewed in the Action Viewer and is accompanied by a screenshot of the screen state after the action is performed.


 

Recordings

Each Skyvern task contains a recording of the entire operation (end-to-end). To view the recording, click the Recordings tab.


 

Task parameters

Task parameters are the inputs you provide to Skyvern, which include URLs, extraction rules, and any other relevant information.


 

Diagnostics

The Diagnostics tab contains the information Skyvern uses for processing, including annotated screenshots, action screenshots, element trees, prompts, action lists, page HTML, and raw LLM requests.


 

Workflows

Workflow - linking multiple tasks together

A workflow chains multiple blocks together. Imagine invoking several tasks in succession, applying conditional logic, extracting data to CSV, and so on. All of these ideas will be supported by the workflow functionality.

All workflows start out as YAML definitions, but the new version also provides a graphical interface in which multiple components can be linked together to produce a defined output.


 

Supported Blocks

  1. TaskBlock: The core block, in which Skyvern navigates a website to take actions and/or extract information.
  2. ForLoopBlock
  3. CodeBlock
  4. TextPromptBlock
  5. DownloadToS3Block
  6. UploadToS3Block
  7. SendEmailBlock
  8. FileParserBlock

 

Task block inputs

  1. URL (usually required). The starting point for the Skyvern agent, ideally the target website you want to automate.
    • In the workflow screen, if this input is left blank, the block continues where the previous node stopped. The purpose of this field is to set or reset the agent's starting point.
    • If you logged in to a site in the first task block, you may want to leave the URL blank in the second block so that it continues from the logged-in state.
  2. Navigation goal (usually required). A detailed description of where Skyvern should navigate and what actions it should perform. A clear navigation goal is a single objective broken down into steps. Avoid providing multiple goals. Use "COMPLETE" to state when the goal is complete and "TERMINATE" to state when it should be aborted.
    • The navigation goal is not used for loading URLs; asking Skyvern to "Visit Site A" in this field will not have the desired effect.
    • When Skyvern terminates, it explains why it stopped navigating.
    • You can omit this field if you only want Skyvern to extract data without navigating anywhere else.
  3. Data extraction goal (optional). Describes any data Skyvern should extract and return in addition to navigating and acting. A good data extraction goal states exactly what data Skyvern should return to the user.
    • Note that data extraction only takes place after Skyvern has completed navigation.
  4. Extracted information schema (optional). If you have a data extraction goal, you may need the data in a specific format for internal purposes. The navigation payload accepts a JSON-format specification of how the returned data should be structured.
  5. Max steps override (optional). Lets you limit cost by capping the number of steps in a task.
  6. Max retries (optional). The number of retries allowed when a step fails.
  7. Complete on download (optional). Lets Skyvern complete the task once a file has been downloaded.
  8. File suffix (optional). An identifier appended to downloaded files.
  9. TOTP URL and TOTP identifier (optional). If you have an internal system that stores TOTP codes for 2FA, this URL is called to retrieve them. The identifier associates a code with the task, which matters when you run several tasks at the same time. If you want to set up 2FA retrieval in a workflow, please contact us.
  10. Parameters (optional). Parameters are custom placeholders specified for a run. They can be workflow parameters, passed in via an API call, or output parameters taken from a previous task block. If specified, Skyvern uses these parameters to aid navigation, fill out forms, or otherwise influence its actions on the site.
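
The rules for task block inputs can be sketched as a small helper. This is an illustration only, not Skyvern's own validation: `build_task_inputs` is a hypothetical function, and `max_steps` and `max_retries` are assumed key names; the other keys follow the example requests in this article.

```python
def build_task_inputs(url=None, navigation_goal=None, data_extraction_goal=None,
                      data_extraction_schema=None, max_steps=None, max_retries=None):
    """Assemble task block inputs and collect warnings based on the rules above."""
    # A block needs something to do: navigate, extract data, or both.
    if not navigation_goal and not data_extraction_goal:
        raise ValueError("Provide a navigation goal, a data extraction goal, or both.")

    warnings = []
    if url is None:
        # A blank URL continues where the previous block stopped.
        warnings.append("url is blank: continuing from the previous block")
    if navigation_goal and not any(w in navigation_goal for w in ("COMPLETE", "TERMINATE")):
        warnings.append("navigation_goal never says when to COMPLETE or TERMINATE")
    if data_extraction_schema is not None and not data_extraction_goal:
        warnings.append("data_extraction_schema given without a data_extraction_goal")

    candidate = {
        "url": url,
        "navigation_goal": navigation_goal,
        "data_extraction_goal": data_extraction_goal,
        "data_extraction_schema": data_extraction_schema,
        "max_steps": max_steps,        # assumed key name
        "max_retries": max_retries,    # assumed key name
    }
    # Leave out anything that was not set.
    inputs = {key: value for key, value in candidate.items() if value is not None}
    return inputs, warnings


# First block logs in; second block leaves the URL blank to stay logged in.
login, login_warnings = build_task_inputs(
    url="https://example.com/login",
    navigation_goal="Log in with the provided credentials. COMPLETE when the "
                    "dashboard is visible. TERMINATE if the login is rejected.",
)
follow_up, follow_up_warnings = build_task_inputs(
    data_extraction_goal="Extract the account balance shown on the dashboard",
)
```

Splitting the login and the extraction into two blocks mirrors the advice above: one objective per navigation goal, with the blank URL carrying the session forward.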

 

 

 

Task API Usage Flow (Example)

The full documentation of the Tasks API is available at

  1. Create a task:
    • Create a task through the API endpoint, specifying the target URL and the navigation goal.
    • Example request:
      {
      "url": "https://example.com",
      "navigation_goal": "Fill out the form and submit it",
      "data_extraction_goal": "Extract the confirmation message after submission"
      }
  2. Monitor the task:
    • Use Skyvern's real-time monitoring feature to see how the task is progressing.
    • Open http://localhost:8080 in a browser to watch operations in real time.
  3. Extract data:
    • Specify the data extraction schema and format, and Skyvern will automatically extract and return the data.
    • Example request:
      {
      "url": "https://example.com/data",
      "data_extraction_schema": {
      "name": "string",
      "email": "string",
      "phone": "string"
      }
      }
  4. Download files:
    • Specify the file download goal, and Skyvern will automatically download the files and provide a download link.
    • Example request:
      {
      "url": "https://example.com/files",
      "file_download_goal": "Download all PDF files"
      }
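
The example requests above can be assembled and inspected from Python before anything is sent. This is a sketch under stated assumptions: the base URL `http://localhost:8000`, the `/api/v1/tasks` path, and the `x-api-key` header are assumptions not confirmed by this article, so check the Tasks API documentation for the actual values. The request is only built here, not sent.

```python
import json
import urllib.request


def build_task_request(payload, base_url="http://localhost:8000", api_key="YOUR_API_KEY"):
    """Build (but do not send) an HTTP request that would create a task."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/tasks",        # assumed endpoint path
        data=body,
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,              # assumed authentication header
        },
        method="POST",
    )


# Payload taken from the "Create a task" example above.
payload = {
    "url": "https://example.com",
    "navigation_goal": "Fill out the form and submit it",
    "data_extraction_goal": "Extract the confirmation message after submission",
}

request = build_task_request(payload)
print(request.get_method(), request.full_url)
print(json.dumps(payload, indent=2))

# To actually send it once the endpoint and key are confirmed:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes it easy to verify the JSON body, for example to catch a stray character after a comma, before it reaches the server.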

 

Frequently Asked Questions

  • How is authentication handled? Skyvern supports multiple authentication methods, including password manager integration and two-factor authentication (2FA). When creating a task, provide the credentials in navigation_payload.
  • How are complex multi-step workflows handled? Skyvern supports chaining multiple tasks into a workflow, executing each task in sequence through the API endpoints to complete a complex process.
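
Running tasks one after another can be sketched as a simple loop. This is an illustration, not Skyvern's workflow engine: `run_workflow` and `fake_run_task` are hypothetical, and the `"status": "completed"` result shape is an assumption. In practice `run_task` would be a function that calls the API and waits for the task to finish.

```python
def run_workflow(task_payloads, run_task):
    """Run tasks in order, stopping at the first one that does not complete."""
    results = []
    for payload in task_payloads:
        result = run_task(payload)
        results.append(result)
        # Assumed result shape: a dict with a "status" key.
        if result.get("status") != "completed":
            break
    return results


def fake_run_task(payload):
    """Stand-in for a real API call; fails when the navigation goal is empty."""
    status = "completed" if payload.get("navigation_goal") else "failed"
    return {"status": status, "url": payload.get("url")}


tasks = [
    {"url": "https://example.com/login", "navigation_goal": "Log in. COMPLETE when logged in."},
    {"url": None, "navigation_goal": ""},  # this step fails in the fake runner
    {"url": None, "navigation_goal": "Download the report. COMPLETE when downloaded."},
]
results = run_workflow(tasks, fake_run_task)
print([r["status"] for r in results])
```

Stopping at the first failed task keeps later steps from running against a page that is not in the expected state.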
May not be reproduced without permission:Chief AI Sharing Circle " Skyvern: Automating Browser-Based Workflows with LLM and Computer Vision
