General Introduction
Midscene.js is an AI-powered browser automation tool that controls web pages, executes assertions, and extracts data through natural language commands. It is available as a Chrome extension, a JavaScript SDK, and YAML scripts, which simplifies writing and maintaining UI tests. By leveraging multimodal large language models such as GPT-4o, Midscene.js lets users interact with web pages intuitively and fetch structured JSON data.
Open-sourced by ByteDance, Midscene.js generates E2E tests directly from natural language and interface screenshots, saving teams many hours of repetitive work. Current coding and multimodal model capabilities already handle many basic E2E problems well.
Feature List
- Natural language interaction: Describe the steps in natural language, and the AI plans and controls the user interface automatically.
- JSON data extraction: Automatically generates response data in JSON format according to user requirements.
- Intuitive assertions: Assertions are written in natural language, which the AI interprets and executes.
- Chrome extension experience: Try the tool through the extension without writing any code.
- Visual reports: Provides detailed execution reports to help users understand and debug the process.
- Multiple script formats: Supports JavaScript and YAML for flexible automation scripting.
Usage Guide
Installation and Configuration
Install the Chrome extension:
- Visit the Chrome Web Store and search for "Midscene".
- Click the "Add to Chrome" button.
- Confirm the installation and allow permissions.
Configure environment variables (for SDK use):
- To use the OpenAI API, create a .env file and add the following:
export OPENAI_API_KEY="Your API key"
export MIDSCENE_MODEL_NAME="gpt-4o"
- If you are using another model service, you need to adjust the above environment variables accordingly.
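Before invoking the SDK, it can help to confirm that the variables above are actually set. The following is a minimal sketch: `findMissingEnv` is a hypothetical helper, not part of Midscene.js, and it only checks the two variable names shown above.

```javascript
// Hypothetical helper (not part of Midscene.js): reports which of the
// variables from the configuration step above are unset or empty.
const REQUIRED_VARS = ['OPENAI_API_KEY', 'MIDSCENE_MODEL_NAME'];

function findMissingEnv(env) {
  return REQUIRED_VARS.filter((name) => {
    const value = env[name];
    return typeof value !== 'string' || value.trim() === '';
  });
}

// Usage: pass process.env in Node.js and warn early if anything is missing.
const missing = findMissingEnv(typeof process !== 'undefined' ? process.env : {});
if (missing.length > 0) {
  console.warn(`Missing environment variables: ${missing.join(', ')}`);
}
```

If you use another model service, the list in `REQUIRED_VARS` would need to be adjusted to match that service's variables.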
Usage Process
Using the Chrome Extension
- Launch the extension: After installation, the extension icon appears in the browser toolbar. Click the icon to open the Midscene control panel.
- Interact with the page: Enter natural language commands in the control panel, such as "Click on the login button" or "Extract all headings from the web page".
- View results: After the operation completes, the extension returns the execution results, usually presenting extracted data in JSON format.
Using the JavaScript SDK
- Import the SDK:
import { ai, aiQuery, aiAssert } from '@midscene/web';
- Perform operations:
- Basic operation: Use the ai function to perform simple web page operations. Example:
await ai('Type in the search box "React"');
- Data extraction: Use aiQuery to extract data:
const data = await aiQuery('{title: string, price: number}[]', 'Find the list of products and extract the title and price');
- Assertion checking: Use aiAssert to make assertions:
await aiAssert('There should be a log in button on the page');
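The three calls above can be combined into one flow. The sketch below takes the functions as parameters instead of importing them, so it can be exercised with stand-ins. `runSearchFlow` and `stubSdk` are hypothetical names, the stub data is illustrative, and the exact signatures of `ai`, `aiQuery`, and `aiAssert` should be checked against the Midscene.js version in use.

```javascript
// Hypothetical wrapper: chains an action, an extraction, and an assertion.
// The sdk argument is expected to expose ai, aiQuery, and aiAssert with the
// call shapes shown above.
async function runSearchFlow(sdk, keyword) {
  await sdk.ai(`Type in the search box "${keyword}"`);
  const products = await sdk.aiQuery(
    '{title: string, price: number}[]',
    'Find the list of products and extract the title and price'
  );
  await sdk.aiAssert('There should be a log in button on the page');
  // Keep only entries that match the requested shape.
  return products.filter(
    (p) => p && typeof p.title === 'string' && typeof p.price === 'number'
  );
}

// Stand-in SDK for a dry run without a browser or model (illustrative data).
const stubSdk = {
  calls: [],
  async ai(instruction) {
    this.calls.push(['ai', instruction]);
  },
  async aiQuery(shape, instruction) {
    this.calls.push(['aiQuery', instruction]);
    return [
      { title: 'Sample item', price: 9.5 },
      { title: 'Bad row', price: 'n/a' },
    ];
  },
  async aiAssert(statement) {
    this.calls.push(['aiAssert', statement]);
  },
};
```

Calling `runSearchFlow(stubSdk, 'React')` resolves to the rows that match the requested shape. Filtering the result is a defensive choice, since model output is not guaranteed to follow the shape string exactly.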
Using YAML Scripts
- Write the YAML script: Define your automation tasks in a .yaml file, for example:
- action: type
  selector: 'input[name="search"]'
  value: 'JavaScript'
- action: click
  selector: 'button[type="submit"]'
- Run the script: Run these scripts via command line tools or Midscene's CLI.
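To make the structure of the script above concrete, the sketch below mirrors its two steps as plain objects and checks that each carries the fields its action needs. The field rules are inferred from that single example and are assumptions, not Midscene's documented schema. `validateSteps` is a hypothetical helper.

```javascript
// Assumed field requirements, inferred from the example script only.
const REQUIRED_FIELDS = new Map([
  ['type', ['selector', 'value']],
  ['click', ['selector']],
]);

// Hypothetical helper: returns a list of readable problems (empty if none).
function validateSteps(steps) {
  const problems = [];
  steps.forEach((step, index) => {
    const required = REQUIRED_FIELDS.get(step.action);
    if (!required) {
      problems.push(`step ${index + 1}: unknown action "${step.action}"`);
      return;
    }
    for (const field of required) {
      if (typeof step[field] !== 'string' || step[field] === '') {
        problems.push(`step ${index + 1}: missing "${field}"`);
      }
    }
  });
  return problems;
}

// The two steps from the YAML example, as they might look once parsed.
const exampleSteps = [
  { action: 'type', selector: 'input[name="search"]', value: 'JavaScript' },
  { action: 'click', selector: 'button[type="submit"]' },
];
```

Running a check like this before handing a script to the CLI can catch a missing field earlier than a failed browser run would.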
Operational Details
- Natural language instructions: Instructions can be as simple as "click" or "enter", or as complex as "find all products labeled 'Sale' and record the price".
- Error handling: If an operation fails, Midscene provides a detailed report that indicates the reason for the failure and helps you adjust the command.
- Debug and Playback: The execution of each test or operation can be played back with visual reports to help you understand or debug your scripts.
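One way to surface failures in your own scripts is to wrap each instruction and record its outcome. This is a minimal sketch: `runInstruction` is a hypothetical helper, and it assumes that a failed operation rejects with an error, which should be confirmed for the Midscene.js version in use.

```javascript
// Hypothetical helper: runs one natural language instruction through the
// supplied function and records whether it succeeded.
async function runInstruction(ai, instruction) {
  try {
    await ai(instruction);
    return { instruction, ok: true, reason: null };
  } catch (error) {
    // Assumption: a failed operation rejects with an Error-like value.
    const reason = error instanceof Error ? error.message : String(error);
    return { instruction, ok: false, reason };
  }
}
```

Collecting these result objects across a run gives a simple log that can be read alongside the visual report when adjusting a command.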
With these steps, users can get up to speed quickly and take full advantage of Midscene.js features for efficient browser automation testing.