
Playwright MCP: Browser Automation MCP Service from Microsoft

General Introduction

Playwright MCP is an open source tool developed by Microsoft and hosted on GitHub. It lets AI models control a browser directly through the Model Context Protocol (MCP), performing operations such as opening web pages, clicking elements, and entering text. The tool is built on the Playwright framework and supports Chromium, Firefox, and WebKit. Its core strengths are that it is fast and lightweight and that it produces structured data without relying on screenshots or vision models. Playwright MCP is particularly well suited to AI applications that need to interact with web pages, such as automated testing or data extraction. The official documentation was current as of March 2025, and the project is active and popular with developers.

Similarly named project: MCP Playwright, an MCP service that provides browser automation operations



 

Function List

  • Browser control support: ability to open web pages, navigate pages, click on elements, etc.
  • Generate structured data: Output data via accessibility snapshots without screenshots.
  • Two modes are provided: the default Snapshot Mode and Vision Mode.
  • Screenshot and Save: You can take a screenshot of a page or save it as a PDF file.
  • Input and operation: Supports text input, key presses, drag and drop, and similar actions.
  • Compatible with headless mode: you can run the browser in the background without displaying the interface.

 

Usage Guide

Playwright MCP is simple to install and use. The following sections describe how to install and operate the tool, including the features specific to each of its two modes.

Installation process

  1. Prepare the environment
    Install Node.js first (the latest LTS version is recommended, e.g. v22). Check the installed version with:
node -v

If Node.js is not installed, download it from the official Node.js website and install it.

  2. Install Playwright MCP
    Run the following command in the terminal:
npm install -g @playwright/mcp

Or run the latest version directly without a global install:

npx @playwright/mcp@latest
  3. Start the server
    Enter the following command to start it:
npx @playwright/mcp@latest

The default is headed mode (the browser window is shown). To use headless mode, add the --headless parameter:

npx @playwright/mcp@latest --headless
  4. Configure the AI client
    If your AI tool supports MCP (e.g. some large model clients), edit its configuration file. For example:
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--headless"]
    }
  }
}

Once the file is saved, the AI can drive the browser through MCP.

  5. Configuration for environments without a display
    On a Linux system without a display, you can use client-server mode. Start the Playwright server on a machine that has a display:
npx playwright run-server

The output shows a WebSocket address, such as ws://localhost:port/. Add that address to the MCP configuration:

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"],
      "env": {
        "PLAYWRIGHT_WS_ENDPOINT": "ws://localhost:port/"
      }
    }
  }
}
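
The two configuration fragments above can also be generated from a script, which helps when a client configuration already lists other servers. The following is a minimal Python sketch, not part of Playwright MCP itself: the helper names and the "other-server" entry are hypothetical, and the endpoint keeps the ws://localhost:port/ placeholder from the text, which must be replaced with the address the server actually prints.

```python
import json


def playwright_mcp_entry(headless=False, ws_endpoint=None):
    """Build the "playwright" entry for an MCP client configuration."""
    args = ["@playwright/mcp@latest"]
    if headless:
        args.append("--headless")
    entry = {"command": "npx", "args": args}
    if ws_endpoint is not None:
        # Variable name taken from the client-server example in this article.
        entry["env"] = {"PLAYWRIGHT_WS_ENDPOINT": ws_endpoint}
    return entry


def merge_into_config(config, entry, name="playwright"):
    """Return a copy of config with entry stored under mcpServers[name]."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing configuration that already declares another server.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}

local_config = merge_into_config(existing, playwright_mcp_entry(headless=True))
remote_config = merge_into_config({}, playwright_mcp_entry(ws_endpoint="ws://localhost:port/"))

print(json.dumps(local_config, indent=2))
print(json.dumps(remote_config, indent=2))
```

Where the resulting JSON has to be saved depends on the AI client, so the sketch only prints it.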

How to use the main features

Playwright MCP has two modes: snapshot mode and vision mode. They are described separately below.

Snapshot mode (default)

This mode works from accessibility snapshots, which makes it fast and stable. Commonly used tools are listed below:

  1. Open a page
  • Instruction: browser_navigate "https://example.com"
  • Action: The browser opens the specified URL.
  • Output: Returns the page load status.
  2. Click an element
  • Instruction: browser_click "Login button" "ref123"
  • Action: Clicks the element labeled ref123 in the snapshot (both the element description and the reference are required).
  • Note: References come from the snapshot data.
  3. Type text
  • Instruction: browser_type "Username input box" "ref456" "myuser" true
  • Action: Types "myuser" into the input box referenced by ref456, then presses Enter (true indicates submission).
  4. Save as PDF
  • Instruction: browser_save_as_pdf
  • Action: Saves the current page as a PDF file.
  5. Wait
  • Instruction: browser_wait 5
  • Action: Waits 5 seconds (maximum 10 seconds).
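
The instructions above are shorthand. On the wire, an MCP client sends each tool invocation as a JSON-RPC 2.0 request with the method tools/call. The Python sketch below shows what those requests might look like. The argument key names (url, element, ref, text, submit, time) are assumptions inferred from the positional values in the list and should be checked against the tool schemas the server advertises. The tool_call helper is hypothetical.

```python
import json
from itertools import count

# Monotonically increasing JSON-RPC request ids.
_request_ids = count(1)


def tool_call(name, **arguments):
    """Build an MCP tools/call request for the named tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names below are assumed, not confirmed by this article.
navigate = tool_call("browser_navigate", url="https://example.com")
click = tool_call("browser_click", element="Login button", ref="ref123")
type_user = tool_call(
    "browser_type",
    element="Username input box",
    ref="ref456",
    text="myuser",
    submit=True,  # press Enter after typing
)
wait = tool_call("browser_wait", time=5)

for request in (navigate, click, type_user, wait):
    print(json.dumps(request))
```

In practice the AI client builds and sends these requests itself; the sketch only makes the structure visible.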

Vision mode

Start the server with the --vision parameter:

npx @playwright/mcp@latest --vision

This mode works from screenshots and coordinates and suits vision models. Commonly used tools:

  1. Take a screenshot
    • Instruction: browser_screenshot
    • Action: Generates a screenshot of the current page.
  2. Click at coordinates
    • Instruction: browser_click 100 200
    • Action: Clicks at coordinates (100, 200).
  3. Drag and drop
    • Instruction: browser_drag 50 50 150 150
    • Action: Drags from (50, 50) to (150, 150).
  4. Type text
    • Instruction: browser_type "hello" true
    • Action: Types "hello" and presses Enter.
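
Because vision mode works from raw coordinates, a click that lands outside the visible page is an easy mistake to make. The Python sketch below validates coordinates against a viewport size before building the tool arguments. It is illustrative only: the 1280x720 viewport, the helper names, and the argument keys (x, y, startX, startY, endX, endY) are assumptions, not something this article or the server guarantees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewport:
    width: int
    height: int

    def contains(self, x, y):
        """True if (x, y) lies inside the viewport."""
        return 0 <= x < self.width and 0 <= y < self.height


def _check(viewport, x, y):
    if not viewport.contains(x, y):
        raise ValueError(
            f"({x}, {y}) is outside the {viewport.width}x{viewport.height} viewport"
        )


def click_args(viewport, x, y):
    """Arguments for a coordinate click (key names assumed)."""
    _check(viewport, x, y)
    return {"x": x, "y": y}


def drag_args(viewport, start, end):
    """Arguments for a coordinate drag (key names assumed)."""
    _check(viewport, *start)
    _check(viewport, *end)
    return {"startX": start[0], "startY": start[1], "endX": end[0], "endY": end[1]}


viewport = Viewport(width=1280, height=720)  # hypothetical screenshot size
click = click_args(viewport, 100, 200)
drag = drag_args(viewport, (50, 50), (150, 150))
print(click)
print(drag)
```

Rejecting out-of-range coordinates early gives a clearer error than a click that silently misses its target.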

Example of operation flow

Suppose you want to log in to a website:

  1. Start the server:
npx @playwright/mcp@latest --headless
  2. Open the login page:
  • Instruction: browser_navigate "https://example.com/login"
  3. Enter the user name and password (snapshot mode):
  • Instruction: browser_type "Username" "ref1" "myuser" false
  • Instruction: browser_type "Password" "ref2" "mypassword" true
  4. Click Login (vision mode):
  • Switch modes: restart the server with the --vision parameter.
  • Instruction: browser_click 300 400
  5. Check the result:
  • Instruction: browser_snapshot (snapshot mode) or browser_screenshot (vision mode).
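
The ordering of the flow can be sketched as a small Python function. This is a hedged illustration, not a real client: send is a hypothetical callable standing in for whatever transport the MCP client uses, the argument key names are assumptions, and ref1 and ref2 are placeholders for references read from a snapshot. The sketch stays in snapshot mode throughout, since submitting the password field with true already presses Enter.

```python
def login_flow(send, url, username, password):
    """Run the login steps in order through send(tool_name, arguments)."""
    send("browser_navigate", {"url": url})
    # Element references only exist after a snapshot has been taken.
    send("browser_snapshot", {})
    send("browser_type", {"element": "Username", "ref": "ref1",
                          "text": username, "submit": False})
    # submit=True presses Enter, which submits the form.
    send("browser_type", {"element": "Password", "ref": "ref2",
                          "text": password, "submit": True})
    # A final snapshot lets the caller inspect the resulting page.
    return send("browser_snapshot", {})


# Stand-in transport that records calls instead of talking to a server.
calls = []


def recording_send(tool_name, arguments):
    calls.append((tool_name, arguments))
    return {"tool": tool_name, "ok": True}


result = login_flow(recording_send, "https://example.com/login", "myuser", "mypassword")
print([name for name, _ in calls])
```

Passing the transport in as a parameter keeps the sequence testable without a running browser.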

Notes

  • Snapshot mode is more reliable than vision mode, but it requires element references.
  • Vision mode suits AI models that work with screen coordinates.
  • Headless mode is good for batch tasks, while headed mode makes debugging easier.

 

Application scenarios

  1. Web navigation and form filling
    The AI automatically opens web pages, fills out forms, and submits them, which suits batch registration or login testing.
  2. Data extraction
    Extract structured data, such as prices or reviews, from dynamic web pages.
  3. Automated testing
    Check that page functions, such as button clicks or page navigation, work properly.
  4. Intelligent agent interaction
    Let the AI operate the browser to complete complex tasks, such as online shopping.

 

QA

  1. What is the difference between snapshot mode and vision mode?
    Snapshot mode works from structured data and is fast and stable; vision mode uses screenshots and coordinates and suits vision-capable AI models.
  2. What browsers are supported?
    Chromium, Firefox, and WebKit are supported.
  3. Do I need to write code?
    No. Send a simple instruction and the AI operates the browser.
May not be reproduced without permission: Chief AI Sharing Circle » Playwright MCP: Browser Automation MCP Service from Microsoft