Nanobrowser: Multi-Intelligence Plugin for Task Automation in Browsers

Latest AI Resources5mos agoupdate AI Sharing Circle

2.9K 00

General Introduction

Nanobrowser is an open source Chrome extension designed to automate web tasks through an AI-powered multi-agent system. It is a free alternative to OpenAI Operator, which users can use by simply providing their LLM (Large Language Model) API key, with support for OpenAI and Anthropic models, with more options to be extended in the future. All operations are run in a local browser, with no cloud data sharing involved, ensuring privacy and security.Nanobrowser handles tasks ranging from simple searches to complex processes through the collaboration of three agents: Planner, Navigator, and Validator. The project code is hosted on GitHub with an active community where users can participate in discussions and contribute via Discord or X.

Function List

multi-agent system: Planner develops strategies, Navigator performs operations, and Validator verifies results, collaborating to accomplish complex tasks.
Flexible LLM support: Support for OpenAI and Anthropic allows users to choose different models for different agents.
local operation:: Data processing is done locally to protect user privacy.
Task automation:: Perform web searches, form filling, data extraction, and other operations.
Interactive Sidebar:: Provide a chat interface with real-time status updates.
Dialog History:: Keeping records of tasks to support subsequent viewing and management.
open source and transparent: The code is open for review and improvement.
Follow-up questions:: Support for contextualized questioning based on task results.

Using Help

Installation process

Nanobrowser is available as a Chrome extension that offers two installation options: downloading a pre-built version directly or building from source.

Method 1: Direct installation of pre-built version

Download Extensions
- interviews https://github.com/nanobrowser/nanobrowser/releasesThe
- Find the latest version (e.g. v1.0.0) on the Releases page.
- Download the file called "nanobrowser.zip".
Unzip the file
- Extract "nanobrowser.zip" to a local folder (e.g. "nanobrowser" folder).
Load to Chrome
- Open Chrome and typechrome://extensions/The
- Enable "Developer Mode" in the upper right corner.
- Click on "Load unpacked" in the upper left corner.
- Select the unzipped "nanobrowser" folder and click "Select Folder".
- After successful installation, the Nanobrowser icon appears in the Chrome toolbar.
Configuring the API Key
- Click the Nanobrowser icon in the toolbar to open the sidebar.
- Click on the Settings icon in the upper right corner.
- Enter your LLM API key (available on the OpenAI or Anthropic websites).
- Select models for Planner, Navigator, Validator (e.g., OpenAI's GPT-4o or Anthropic's Claude).
- Save the settings to complete the configuration.

Method 2: Build from Source

Preparing the environment
- mounting Node.js(v22.12.0 or later).
- mounting pnpm(v9.15.1 or later).

clone warehouse

Open a terminal and enter the following command:

git clone https://github.com/nanobrowser/nanobrowser.git
cd nanobrowser

Installation of dependencies
- Input:
```
pnpm install
```
Building extensions
- Input:
```
pnpm build
```
- When the build is complete, the "dist" folder will contain the extension files.
Load to Chrome
- Follow step 3 in "Method 1" to load the "dist" folder.
Development mode (optional)
- If real-time debugging is required, run:
```
pnpm dev
```

How to use the main features

1. Mandate automation

workflow:
- Click the Nanobrowser icon in the toolbar to open the sidebar.
- Enter a task command in the input box, such as "Go to TechCrunch and extract the top 10 headlines from the last 24 hours."
- Click "Execute" to start the multi-agent system:
  - Planner: Create a task plan, such as opening TechCrunch and locating the headline area.
  - Navigator:: Perform web navigation and data extraction.
  - Validator:: Compliance of inspection results with requirements.
- Results are displayed in a sidebar that supports copying or follow-up questions.
Usage Scenarios:
- News Summary: Extracts the latest information from a specific website.
- Shopping Research:: Search Amazon for "waterproof bluetooth speaker, under $50, with over 10 hours of battery life".
- Code Research: Find the most popular Python repositories on GitHub.

2. Configuration agent model

workflow:
- Open the sidebar and click on "Settings".
- Enter the API key and select the model, for example:
  - Planner: OpenAI GPT-4o
  - Navigator. Anthropic Claude 3.5 Sonnet
  - Validator: OpenAI GPT-3.5
- Click "Save" to test if the connection is successful.
draw attention to sth.:
- Different models are suitable for different tasks and it is recommended to try combinations to improve efficiency.
- Ensure that the API key is valid to avoid task interruption.

3. Viewing and managing dialog history

workflow:
- Select Conversation History in the sidebar.
- Displays a list of tasks with times, instructions, and results.
- Click on a record to view the details, or select "Retry" to run it again.
practical skill:
- Export history as a JSON file for easy backup.
- Examine logs of failed tasks and optimize instructions or models.

4. Follow-up questions

workflow:
- Once the task is complete, enter a follow-up question in the sidebar, such as "Which of these headlines are AI-related?" .
- The system answers based on previous results without having to re-execute the complete task.
dominance:
- Improved interaction efficiency and suitability for in-depth analysis.

Featured Function Operation

multi-agent system

How to experience:
- Enter complex commands such as "Find the 5 most popular AI models on HuggingFace and organize them into a list".
- Planner breaks down the task, Navigator extracts the data, and Validator verifies the accuracy.
- The results are returned in structured form.
dominance:
- Dynamic Error Correction: Planner adjusts its strategy as it encounters obstacles.
- Efficient Collaboration: Save time by processing three agents in parallel.

Local operation and privacy protection

How to verify:
- Open Chrome Developer Tools (F12) and switch to the "Network" tab.
- When executing a task, only LLM API calls are seen, with no other external requests.
mileage:
- User credentials and sensitive data are not uploaded to the cloud, making it safe and secure.

Interactive Sidebar

How to use:
- When the sidebar is opened, the progress of the task is displayed in real time (e.g. "Navigating", "Validating").
- Support for adjusting commands or stopping tasks midway.
specificities:
- The interface is intuitive and suitable for both novice and professional users.

caveat

network requirement: A stable network is required to call the LLM API.
Hardware Recommendations:: Runs better on high-performance equipment.
Community Support:: Join if you have problems Discord or focus on X Get help.

Latest AI Resources # AI Java Open Source Projecct # Desktop Automation Intelligence

文章版权归 AI Sharing Circle 所有，未经允许请勿转载。

DiffSynth-Engine: Open Source Engine for Low-Existing Deployments of FLUX, Wan 2.1

Latest AI Resources # AI Java Open Source Projecct

5mos ago

01.6K

Cori: AI free kids coloring drawing generation, AI coloring page generator to enhance children's creativity

Latest AI Resources # AI Image Style Control

6mos ago

01.8K

Midreal AI: Interactive AI Text Adventure Games and Fantasy Fiction Writing

Latest AI Resources # AI Writing # AI Role Play

1yrs ago

02.8K

Magic Cube Resume - AI resume optimization tool, professional advice to improve the quality of your resume

Latest AI Resources

2mos ago

01.2K

No comments

You must be logged in to leave a comment!

No comments...

Nanobrowser: Multi-Intelligence Plugin for Task Automation in Browsers

General Introduction

Function List