General Introduction
Playwright MCP is an open-source tool developed by Microsoft and hosted on GitHub. It lets AI models control browsers directly through the Model Context Protocol (MCP), performing operations such as opening web pages, clicking elements, and entering text. The tool is built on the Playwright framework and supports the Chromium, Firefox, and WebKit browsers. Its core strengths are speed, a lightweight footprint, and structured output that does not rely on screenshots or vision models. Playwright MCP is particularly well suited to AI applications that need to interact with web pages, such as automated testing or data extraction. The official documentation is current as of March 2025, and the project is active and popular with developers.
Similarly named project: MCP Playwright, a separate MCP service that also provides browser automation.
Feature List
- Browser control: open web pages, navigate between pages, click elements, and more.
- Structured data generation: outputs data via accessibility snapshots, no screenshots required.
- Two modes: the default Snapshot Mode and Vision Mode.
- Screenshots and saving: capture a page screenshot or save the page as a PDF.
- Input and interaction: supports typing text, pressing keys, drag and drop, and more.
- Headless mode support: run the browser in the background without showing a window.
Usage Guide
Playwright MCP is simple to install and use. The following describes installation and operation in detail, including the specifics of both modes.
Installation process
- Prepare the environment
Install Node.js first (the latest LTS version, e.g. v22, is recommended). Check the installed version with:
node -v
If you don't have it, visit the official Node.js website to download and install it.
- Install Playwright MCP
Run the following command in the terminal:
npm install -g @playwright/mcp
Or run the latest version directly with npx:
npx @playwright/mcp@latest
- Start the server
Enter the command to start:
npx @playwright/mcp@latest
The default is headed mode (the browser window is shown). To run headless, add the flag:
npx @playwright/mcp@latest --headless
- Configure the AI client
If your AI tool supports MCP (e.g. some large-model clients), edit its MCP configuration file. For example:
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--headless"]
    }
  }
}
Once saved, the AI client can drive the browser through MCP.
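Under the hood, MCP clients talk to the server using JSON-RPC 2.0 messages, so a configured AI client ends up sending tool-call requests on your behalf. The sketch below is a minimal, illustrative Python construction of such a request for browser_navigate; the request id and the "url" argument name are assumptions for illustration, not the server's exact schema.

```python
import json

def make_tool_call(request_id, tool, arguments):
    """Build a JSON-RPC 2.0 'tools/call' request, the message shape MCP uses."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# Ask the server to open a page (argument name "url" is illustrative).
msg = make_tool_call(1, "browser_navigate", {"url": "https://example.com"})
print(msg)
```

In practice the AI client builds and sends these messages for you over the server's stdio; you never write them by hand.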
- Configuration for environments without a display
In a display-less Linux environment, you can use client-server mode. First run the server on a machine that has a display:
npx playwright run-server
The output will show a WebSocket address such as ws://localhost:port/. Then add it to the MCP configuration:
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"],
      "env": {
        "PLAYWRIGHT_WS_ENDPOINT": "ws://localhost:port/"
      }
    }
  }
}
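A common stumbling block with this setup is a malformed endpoint value. The Python sketch below shows a simple sanity check on the value passed through PLAYWRIGHT_WS_ENDPOINT; the actual port comes from whatever run-server printed, so the 4444 fallback here is purely an illustrative placeholder.

```python
import os
from urllib.parse import urlparse

def looks_like_ws_endpoint(value):
    """Check that a value has a WebSocket scheme (ws/wss) and a host."""
    parsed = urlparse(value)
    return parsed.scheme in ("ws", "wss") and bool(parsed.hostname)

# Fall back to an illustrative placeholder if the variable is not set;
# in real use, copy the exact address printed by `npx playwright run-server`.
endpoint = os.environ.get("PLAYWRIGHT_WS_ENDPOINT", "ws://localhost:4444/")
print(looks_like_ws_endpoint(endpoint))
```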
How to use the main features
Playwright MCP offers two modes: Snapshot Mode and Vision Mode. They are described separately below.
Snapshot Mode (default)
This mode operates on accessibility snapshots, which is fast and stable. Commonly used tools:
- Open a page
- Command:
browser_navigate "https://example.com"
- Action: the browser opens the specified URL.
- Output: returns the page load status.
- Click an element
- Command:
browser_click "Login button" "ref123"
- Action: clicks the element labeled ref123 in the snapshot (both an element description and a reference are required).
- Note: references come from the snapshot data.
- Type text
- Command:
browser_type "Username field" "ref456" "myuser" true
- Action: types "myuser" into the ref456 input box, then presses Enter (true indicates submit).
- Save as PDF
- Command:
browser_save_as_pdf
- Action: saves the current page as a PDF file.
- Wait
- Command:
browser_wait 5
- Action: waits 5 seconds (maximum 10 seconds).
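The refs used in the commands above come from the accessibility snapshot the server returns. The exact snapshot schema is defined by the tool; the Python sketch below is only a simplified illustration of how a client could look up an element by its ref in a tree-shaped snapshot.

```python
def find_by_ref(node, ref):
    """Depth-first search of a snapshot tree for the node with a given ref."""
    if node.get("ref") == ref:
        return node
    for child in node.get("children", []):
        found = find_by_ref(child, ref)
        if found is not None:
            return found
    return None

# Simplified, illustrative snapshot fragment; not the tool's exact schema.
snapshot = {
    "role": "WebArea",
    "children": [
        {"role": "button", "name": "Login button", "ref": "ref123"},
        {"role": "textbox", "name": "Username field", "ref": "ref456"},
    ],
}
print(find_by_ref(snapshot, "ref123")["name"])  # Login button
```

This is why Snapshot Mode needs no screenshots: the element description and ref together identify the target unambiguously.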
Vision Mode
Start the server with the --vision flag:
npx @playwright/mcp@latest --vision
This mode operates on screenshots and coordinates and is suited to vision-capable models. Commonly used tools:
- Take a screenshot
- Command:
browser_screenshot
- Action: generates a screenshot of the current page.
- Click at coordinates
- Command:
browser_click 100 200
- Action: clicks at coordinates (100, 200).
- Drag and drop
- Command:
browser_drag 50 50 150 150
- Action: drags from (50, 50) to (150, 150).
- Type text
- Command:
browser_type "hello" true
- Action: types "hello" and presses Enter.
Example workflow
Suppose you want to log in to a website:
- Start the server:
npx @playwright/mcp@latest --headless
- Open the login page:
- Command:
browser_navigate "https://example.com/login"
- Enter the username and password (Snapshot Mode):
- Command:
browser_type "Username" "ref1" "myuser" false
- Command:
browser_type "Password" "ref2" "mypassword" true
- Click Login (Vision Mode):
- Switch modes by restarting the server with --vision
- Command:
browser_click 300 400
- Check the result:
- Command:
browser_snapshot (Snapshot Mode) or browser_screenshot (Vision Mode)
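The login steps above can be condensed into a single ordered list of tool calls, in the sequence an MCP client would issue them. This is only a sketch: the dictionary keys ("element", "ref", "text", "submit") are illustrative assumptions, since the real argument names come from the server's tool schema, and the refs are the placeholder values used earlier.

```python
# The Snapshot Mode login flow as an ordered list of (tool, arguments) calls.
# Argument names and refs are illustrative, not the server's exact schema.
login_flow = [
    ("browser_navigate", {"url": "https://example.com/login"}),
    ("browser_type", {"element": "Username", "ref": "ref1",
                      "text": "myuser", "submit": False}),
    ("browser_type", {"element": "Password", "ref": "ref2",
                      "text": "mypassword", "submit": True}),
    ("browser_snapshot", {}),  # check the result
]

for tool, args in login_flow:
    print(tool)
```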
Notes
- Snapshot Mode is more reliable than Vision Mode, but requires element references.
- Vision Mode suits AI models that work with coordinates.
- Headless mode is good for batch tasks; headed mode is easier for debugging.
Application Scenarios
- Web navigation and form filling: the AI automatically opens web pages, fills out forms, and submits them; suitable for batch registration or login testing.
- Data extraction: grab structured data from dynamic web pages, such as prices or reviews.
- Automated testing: check that pages function properly, such as button clicks or page navigation.
- Intelligent agent interaction: let AI operate the browser to accomplish complex tasks, such as online shopping.
FAQ
- What is the difference between Snapshot Mode and Vision Mode? Snapshot Mode operates on structured data and is fast and stable; Vision Mode uses screenshots and coordinates and suits vision-capable AI.
- Which browsers are supported? Chromium, Firefox, and WebKit.
- Do I need to write code? No. Just send simple commands and the AI performs the operations.