General Introduction
Playwright MCP is an open-source tool developed by Microsoft and hosted on GitHub. It lets AI models control browsers directly through the Model Context Protocol (MCP), performing operations such as opening web pages, clicking elements, and entering text. The tool is built on the Playwright framework and supports the Chromium, Firefox, and WebKit browsers. Its core strengths are speed, a lightweight footprint, and structured output that does not rely on screenshots or vision models. Playwright MCP is particularly well suited to AI applications that need to interact with web pages, such as automated testing or data extraction. The official documentation is current as of March 2025, and the project is active and popular with developers.
Similarly named project: MCP Playwright, a separate MCP service that also provides browser automation.
Feature List
- Browser control: open web pages, navigate between pages, click elements, and more.
- Structured data generation: outputs data via accessibility snapshots, no screenshots required.
- Two modes: the default Snapshot Mode and Vision Mode.
- Screenshots and saving: capture a page screenshot or save the page as a PDF.
- Input and interaction: supports typing text, pressing keys, drag and drop, and more.
- Headless mode support: run the browser in the background without showing a window.
Usage Guide
Playwright MCP is simple to install and use. The following describes installation and operation in detail, including the specifics of both modes.
Installation process
- Prepare the environment
Install Node.js first (the latest LTS version, e.g. v22, is recommended). Check the installed version with:
node -v
If you don't have it, visit the official Node.js website to download and install it.
- Install Playwright MCP
Run the following command in the terminal:
npm install -g @playwright/mcp
Or run the latest version directly with npx:
npx @playwright/mcp@latest
- Start the server
Enter the command to start:
npx @playwright/mcp@latest
The default is headed mode (the browser window is shown). To run headless, add the flag:
npx @playwright/mcp@latest --headless
- Configure the AI client
If your AI tool supports MCP (e.g. some large-model clients), edit its MCP configuration file. For example:
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--headless"]
    }
  }
}
Once saved, the AI client can drive the browser through MCP.
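Under the hood, MCP clients talk to the server using JSON-RPC 2.0 messages, so a configured AI client ends up sending tool-call requests on your behalf. The sketch below is a minimal, illustrative Python construction of such a request for browser_navigate; the request id and the "url" argument name are assumptions for illustration, not the server's exact schema.

```python
import json

def make_tool_call(request_id, tool, arguments):
    """Build a JSON-RPC 2.0 'tools/call' request, the message shape MCP uses."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# Ask the server to open a page (argument name "url" is illustrative).
msg = make_tool_call(1, "browser_navigate", {"url": "https://example.com"})
print(msg)
```

In practice the AI client builds and sends these messages for you over the server's stdio; you never write them by hand.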
- Configuration for environments without a display
In a display-less Linux environment, you can use client-server mode. First run the server on a machine that has a display:
npx playwright run-server
The output will show a WebSocket address such as ws://localhost:port/. Then add it to the MCP configuration:
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"],
      "env": {
        "PLAYWRIGHT_WS_ENDPOINT": "ws://localhost:port/"
      }
    }
  }
}
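A common stumbling block with this setup is a malformed endpoint value. The Python sketch below shows a simple sanity check on the value passed through PLAYWRIGHT_WS_ENDPOINT; the actual port comes from whatever run-server printed, so the 4444 fallback here is purely an illustrative placeholder.

```python
import os
from urllib.parse import urlparse

def looks_like_ws_endpoint(value):
    """Check that a value has a WebSocket scheme (ws/wss) and a host."""
    parsed = urlparse(value)
    return parsed.scheme in ("ws", "wss") and bool(parsed.hostname)

# Fall back to an illustrative placeholder if the variable is not set;
# in real use, copy the exact address printed by `npx playwright run-server`.
endpoint = os.environ.get("PLAYWRIGHT_WS_ENDPOINT", "ws://localhost:4444/")
print(looks_like_ws_endpoint(endpoint))
```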
How to use the main features
Playwright MCP offers two modes: Snapshot Mode and Vision Mode. They are described separately below.
Snapshot Mode (default)
This mode operates on accessibility snapshots, which is fast and stable. Commonly used tools:
- Open a page
- Command:
browser_navigate "https://example.com"
- Action: the browser opens the specified URL.
- Output: returns the page load status.
- Click an element
- Command:
browser_click "Login button" "ref123"
- Action: clicks the element labeled ref123 in the snapshot (both an element description and a reference are required).
- Note: references come from the snapshot data.
- Type text
- Command:
browser_type "Username field" "ref456" "myuser" true
- Action: types "myuser" into the ref456 input box, then presses Enter (true indicates submit).
- Save as PDF
- Command:
browser_save_as_pdf
- Action: saves the current page as a PDF file.
- Wait
- Command:
browser_wait 5
- Action: waits 5 seconds (maximum 10 seconds).
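The refs used in the commands above come from the accessibility snapshot the server returns. The exact snapshot schema is defined by the tool; the Python sketch below is only a simplified illustration of how a client could look up an element by its ref in a tree-shaped snapshot.

```python
def find_by_ref(node, ref):
    """Depth-first search of a snapshot tree for the node with a given ref."""
    if node.get("ref") == ref:
        return node
    for child in node.get("children", []):
        found = find_by_ref(child, ref)
        if found is not None:
            return found
    return None

# Simplified, illustrative snapshot fragment; not the tool's exact schema.
snapshot = {
    "role": "WebArea",
    "children": [
        {"role": "button", "name": "Login button", "ref": "ref123"},
        {"role": "textbox", "name": "Username field", "ref": "ref456"},
    ],
}
print(find_by_ref(snapshot, "ref123")["name"])  # Login button
```

This is why Snapshot Mode needs no screenshots: the element description and ref together identify the target unambiguously.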
Vision Mode
Start the server with the --vision flag:
npx @playwright/mcp@latest --vision
This mode operates on screenshots and coordinates and is suited to vision-capable models. Commonly used tools:
- Take a screenshot
- Command:
browser_screenshot
- Action: generates a screenshot of the current page.
- Click at coordinates
- Command:
browser_click 100 200
- Action: clicks at coordinates (100, 200).
- Drag and drop
- Command:
browser_drag 50 50 150 150
- Action: drags from (50, 50) to (150, 150).
- Type text
- Command:
browser_type "hello" true
- Action: types "hello" and presses Enter.
Example workflow
Suppose you want to log in to a website:
- Start the server:
npx @playwright/mcp@latest --headless
- Open the login page:
- Command:
browser_navigate "https://example.com/login"
- Enter the username and password (Snapshot Mode):
- Command:
browser_type "Username" "ref1" "myuser" false
- Command:
browser_type "Password" "ref2" "mypassword" true
- Click Login (Vision Mode):
- Switch modes by restarting the server with --vision
- Command:
browser_click 300 400
- Check the result:
- Command:
browser_snapshot (Snapshot Mode) or browser_screenshot (Vision Mode)
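The login steps above can be condensed into a single ordered list of tool calls, in the sequence an MCP client would issue them. This is only a sketch: the dictionary keys ("element", "ref", "text", "submit") are illustrative assumptions, since the real argument names come from the server's tool schema, and the refs are the placeholder values used earlier.

```python
# The Snapshot Mode login flow as an ordered list of (tool, arguments) calls.
# Argument names and refs are illustrative, not the server's exact schema.
login_flow = [
    ("browser_navigate", {"url": "https://example.com/login"}),
    ("browser_type", {"element": "Username", "ref": "ref1",
                      "text": "myuser", "submit": False}),
    ("browser_type", {"element": "Password", "ref": "ref2",
                      "text": "mypassword", "submit": True}),
    ("browser_snapshot", {}),  # check the result
]

for tool, args in login_flow:
    print(tool)
```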
Notes
- Snapshot Mode is more reliable than Vision Mode, but requires element references.
- Vision Mode suits AI models that work with coordinates.
- Headless mode is good for batch tasks; headed mode is easier for debugging.
Application Scenarios
- Web navigation and form filling: the AI automatically opens web pages, fills out forms, and submits them; suitable for batch registration or login testing.
- Data extraction: grab structured data from dynamic web pages, such as prices or reviews.
- Automated testing: check that pages function properly, such as button clicks or page navigation.
- Intelligent agent interaction: let AI operate the browser to accomplish complex tasks, such as online shopping.
FAQ
- What is the difference between Snapshot Mode and Vision Mode? Snapshot Mode operates on structured data and is fast and stable; Vision Mode uses screenshots and coordinates and suits vision-capable AI.
- Which browsers are supported? Chromium, Firefox, and WebKit.
- Do I need to write code? No. Just send simple commands and the AI performs the operations.