Browser-Use: An Intelligent Web Automation Tool That Lets AI Agents Easily Operate Browsers

General Introduction

Browser-Use is an open source web automation tool designed to let large language models (LLMs) interact with websites naturally. It provides a flexible framework that supports a wide range of mainstream language models, including GPT-4 and Claude. Its most notable feature is the integration of AI capabilities with browser automation: it supports visual recognition and HTML extraction, automatic multi-tab management, and intelligent element detection. Beyond simple web browsing, Browser-Use handles more complex interactions such as filling in forms automatically, submitting applications, and searching for information. The goal is for AI agents to use a browser as naturally as a human would, which simplifies the development of web automation. The tool is especially suitable for developers who need web automation, data collection, or batch operations.

Feature List

Visual recognition and intelligent extraction of HTML content
Automatic management of multiple tabs
Extraction of the XPath of clicked elements, so that the LLM's exact actions can be replayed
Support for custom actions (e.g., saving a file, writing to a database, sending a notification, asking for human input)
Self-correction
Compatibility with all language models supported by LangChain
Support for running multiple AI agents in parallel
Configurable browser security features
Persistent cookie storage
Flexible page-load wait time settings

Usage Guide

1. Installation and configuration

First install the Browser-Use package via pip:

pip install browser-use

(Optional) Install the Playwright browsers:

playwright install

Configure environment variables by creating a .env file and adding the necessary API keys:

OPENAI_API_KEY=your-openai-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key
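
Before starting an agent, it can help to confirm that the keys are actually visible to the Python process. The following is a minimal sketch using only the standard library. The helper name `missing_keys` is hypothetical and not part of Browser-Use, and the sketch assumes the variables from the .env file have already been loaded into the environment (for example by a dotenv loader or by exporting them in the shell).

```python
import os

# Hypothetical helper, not part of Browser-Use: report which of the
# expected API keys are absent or empty in an environment mapping.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the names in `required` that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    absent = missing_keys()
    if absent:
        print("Missing API keys:", ", ".join(absent))
    else:
        print("All expected API keys are set.")
```

In practice only the key for the model provider you actually use needs to be present, so the `required` tuple can be narrowed accordingly.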

2. Basic usage

2.1 Creating a simple AI agent

import asyncio

from langchain_openai import ChatOpenAI
from browser_use import Agent


async def main():
    # Describe the task in natural language and choose the LLM that drives the agent
    agent = Agent(
        task="Find information about a specific flight",
        llm=ChatOpenAI(model="gpt-4"),
    )
    result = await agent.run()
    print(result)


asyncio.run(main())

2.2 Registering a customized action

Custom actions can be registered with a decorator:

from browser_use.controller.service import Controller

controller = Controller()

# The description string is what the LLM sees for this action
@controller.action('Ask the user for information')
def ask_human(question: str, display_question: bool) -> str:
    return input(f'\n{question}\nInput: ')

2.3 Defining Parametric Models with Pydantic

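from typing import Optional

from pydantic import BaseModel
from browser_use import Browser  # import path assumed; it may differ between versions


class JobDetails(BaseModel):
    title: str
    company: str
    job_link: str
    salary: Optional[str] = None


# Reuses the `controller` object created in section 2.2
@controller.action('Save job details', param_model=JobDetails, requires_browser=True)
async def save_job(params: JobDetails, browser: Browser):
    print(params)
    # Use the browser as usual, e.g. open the job link
    # (calls kept as given in the source; Playwright-based versions may require awaiting them)
    page = browser.get_current_page()
    page.go_to(params.job_link)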

3. Advanced features

3.1 Running agents in parallel

It is recommended to share a single Browser instance and give each agent its own browser context:

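# Assumes Browser and Agent are imported and `model` is a configured LLM (see section 2.1)
async def run_agents():
    browser = Browser()
    for i in range(10):
        # Each agent gets its own context inside the shared browser
        async with browser.new_context() as context:
            agent = Agent(
                task=f"Task {i}",
                llm=model,
                browser_context=context,
            )
            # Process the task...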
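
As written, the loop above opens each context in turn, so any work awaited inside the `async with` body finishes before the next agent starts. To let several agents overlap, their coroutines need to be scheduled together. The following is a minimal sketch of that pattern using only `asyncio`. The function `run_one` is a hypothetical stand-in for "create a context, build an Agent, await its run", and `max_concurrent` is an illustrative parameter, not a Browser-Use setting.

```python
import asyncio


async def run_one(i: int) -> str:
    # Hypothetical stand-in for creating a browser context and awaiting an
    # agent run; the sleep only simulates I/O-bound work.
    await asyncio.sleep(0.01)
    return f"Task {i} done"


async def run_all(count: int, max_concurrent: int = 3) -> list:
    # Cap how many stand-in agents are active at the same time
    limit = asyncio.Semaphore(max_concurrent)

    async def guarded(i: int) -> str:
        async with limit:
            return await run_one(i)

    # gather schedules every coroutine and returns results in input order
    return await asyncio.gather(*(guarded(i) for i in range(count)))


if __name__ == "__main__":
    print(asyncio.run(run_all(10)))
```

The semaphore is a design choice for keeping memory and API usage bounded; with a real browser, a suitable limit depends on the machine and on the rate limits of the model provider.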

3.2 Browser Configuration

Browser behavior can be configured through the BrowserConfig and BrowserContextConfig classes:

browser_config = BrowserConfig(
    headless=False,  # whether to run in headless mode
    keep_open=True,  # keep the browser open after the script ends
    disable_security=True,  # disable browser security features
    cookies_file="cookies.json",  # file used to store cookies
    minimum_wait_page_load_time=1.0,  # minimum page-load wait time
    wait_for_network_idle_page_load_time=2.0,  # network-idle wait time
    maximum_wait_page_load_time=10.0,  # maximum page-load wait time
)
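
How the minimum and maximum wait settings bound each other is not spelled out above. One plausible reading, stated here as an assumption rather than as Browser-Use's documented behavior, is that the page is given at least the minimum wait, is released once the network has gone idle, and is never held longer than the maximum. The sketch below models only that clamping logic; `effective_wait` and `idle_after` are hypothetical names introduced for illustration.

```python
def effective_wait(idle_after: float, minimum: float = 1.0, maximum: float = 10.0) -> float:
    """Illustrative model (an assumption, not library code).

    idle_after: seconds after navigation at which the network was judged idle.
    Returns the total time waited under the assumed min/max clamping.
    """
    if minimum > maximum:
        raise ValueError("minimum wait must not exceed maximum wait")
    # Never return before `minimum`, never wait past `maximum`
    return min(max(idle_after, minimum), maximum)


if __name__ == "__main__":
    for idle in (0.2, 4.5, 30.0):
        print(idle, "->", effective_wait(idle))
```

Under this reading, raising the minimum slows down every page, while raising the maximum only affects pages whose network activity never settles.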

4. Performance optimization recommendations

Set page-load wait times no longer than necessary to avoid excessive waits
Use parallelization where it helps to improve throughput
Enable headless mode when appropriate to reduce resource usage
Use cookie persistence to reduce repeated authentication
Adjust the security feature configuration as required

5. Troubleshooting

If you encounter problems with cross-domain requests, consider enabling the disable_security option
If pages time out while loading, adjust the wait-time parameters
Ensure that the API key is configured correctly
Check the network connection status
Check the browser console log for detailed error messages