
Browser-Use: Building Intelligent Web Automation Tools That Let AI Agents Easily Operate Browsers

General Introduction

Browser-Use is an open source web automation tool designed to let large language models (LLMs) interact with websites naturally. It provides a flexible framework that works with mainstream language models, including GPT-4 and Claude. Its most notable feature is the integration of AI capabilities with browser automation: it supports visual recognition and HTML extraction, automatic multi-tab management, and intelligent element detection. Browser-Use handles simple browsing tasks as well as complex interactions such as filling in forms, submitting applications, and searching for information. The goal is for AI agents to use a browser as naturally as a person would, which simplifies the development of web automation. The tool is well suited to developers who need web automation, data collection, or batch operations.



 

Feature List

  • Visual recognition and intelligent extraction of HTML content
  • Automatic multi-tab management
  • Extracts the XPaths of clicked elements so that the exact LLM actions can be replayed
  • Custom actions (e.g., save a file, write to a database, send a notification, ask for human input)
  • Self-correction
  • Compatible with all language models supported by LangChain
  • Multiple AI agents can run in parallel
  • Configurable browser security features
  • Persistent cookie storage
  • Flexible page-load wait time settings

 

Usage Guide

1. Installation and configuration

  1. First install the Browser-Use package via pip:
     pip install browser-use
  2. (Optional) Install Playwright:
     playwright install
  3. Configure environment variables:
     Create a .env file and add the necessary API keys:
     OPENAI_API_KEY=your-openai-api-key
     ANTHROPIC_API_KEY=your-anthropic-api-key
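
Before starting an agent, it can help to confirm that the keys are actually visible to the Python process. The following is a minimal sketch, not part of Browser-Use: the helper name missing_api_keys is hypothetical, and it assumes the values from the .env file have already been loaded into the environment (for example with python-dotenv's load_dotenv, or by exporting them in the shell).

```python
import os

# Key names come from the .env example above; the helper itself is hypothetical.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_api_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of required keys that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All API keys are set.")
```

If you only use one provider, pass just that key, for example missing_api_keys(required=("OPENAI_API_KEY",)).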

2. Basic usage

2.1 Creating a simple AI agent

from langchain_openai import ChatOpenAI
from browser_use import Agent
import asyncio

async def main():
    # The agent receives a natural-language task and the LLM that drives it
    agent = Agent(
        task="Find information about a specific flight",
        llm=ChatOpenAI(model="gpt-4"),
    )
    result = await agent.run()
    print(result)

asyncio.run(main())

2.2 Registering a custom action

Custom actions are registered with a decorator:

from browser_use.controller.service import Controller

controller = Controller()

# The string passed to the decorator describes the action
@controller.action('Ask the user for information')
def ask_human(question: str, display_question: bool) -> str:
    return input(f'\n{question}\nInput: ')
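
To make the decorator mechanism less opaque, here is a toy registry written in plain Python. It is only an illustration of the general pattern and is not how Browser-Use implements its Controller; the names ToyController, run, and shout are all made up for this sketch.

```python
from typing import Callable, Dict


class ToyController:
    """Toy stand-in that illustrates the decorator pattern; NOT browser-use's Controller."""

    def __init__(self) -> None:
        # Maps a human-readable description to the function that implements it.
        self.actions: Dict[str, Callable] = {}

    def action(self, description: str) -> Callable[[Callable], Callable]:
        def decorator(func: Callable) -> Callable:
            self.actions[description] = func
            return func  # returned unchanged, so it stays directly callable
        return decorator

    def run(self, description: str, **kwargs):
        # Look the action up by its description and call it with the given arguments.
        return self.actions[description](**kwargs)


toy = ToyController()


@toy.action("Shout the text back")
def shout(text: str) -> str:
    return text.upper() + "!"
```

Calling toy.run("Shout the text back", text="hello") looks the function up by its description, which is roughly the role the description string plays when an LLM chooses among registered actions.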

2.3 Defining parameter models with Pydantic

from pydantic import BaseModel
from typing import Optional

class JobDetails(BaseModel):
    title: str
    company: str
    job_link: str
    salary: Optional[str] = None

# `controller` is the Controller instance from section 2.2.
# Browser must also be imported from browser_use (the import path may vary by version).
@controller.action('Save job details', param_model=JobDetails, requires_browser=True)
async def save_job(params: JobDetails, browser: Browser):
    print(params)
    # Use the browser as usual
    page = browser.get_current_page()
    page.go_to(params.job_link)

3. Advanced features

3.1 Running agents in parallel

It is recommended to use a single Browser instance and give each agent its own context:

browser = Browser()
# `model` is the LLM instance, e.g. ChatOpenAI(model="gpt-4").
# This loop must run inside an async function.
for i in range(10):
    async with browser.new_context() as context:
        agent = Agent(
            task=f"Task {i}",
            llm=model,
            browser_context=context
        )
        # Handle the task...
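
One common way to actually run several agents at the same time is asyncio.gather. The sketch below shows only the concurrency pattern: run_agent is a hypothetical stand-in for "create an Agent in its own context and await agent.run()", and the max_concurrent limit is an assumption added here to avoid opening too many pages at once.

```python
import asyncio


async def run_agent(task: str, delay: float = 0.01) -> str:
    """Hypothetical stand-in for creating an Agent in its own context and awaiting agent.run()."""
    await asyncio.sleep(delay)  # simulates time spent driving the browser
    return f"done: {task}"


async def run_all(tasks, max_concurrent: int = 3):
    # The semaphore caps how many stand-in agents run at the same time.
    limit = asyncio.Semaphore(max_concurrent)

    async def guarded(task: str) -> str:
        async with limit:
            return await run_agent(task)

    # gather() schedules all coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(guarded(t) for t in tasks))


results = asyncio.run(run_all([f"Task {i}" for i in range(5)]))
```

In a real script, the body of run_agent would open a context with browser.new_context(), build the Agent with that context, and await its run() method.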

3.2 Browser Configuration

Browser behavior can be configured through the BrowserConfig and BrowserContextConfig classes:

browser_config = BrowserConfig(
    headless=False,  # whether to run in headless mode
    keep_open=True,  # keep the browser open after the script ends
    disable_security=True,  # disable browser security features
    cookies_file="cookies.json",  # file used to store cookies
    minimum_wait_page_load_time=1.0,  # minimum page-load wait time
    wait_for_network_idle_page_load_time=2.0,  # network-idle wait time
    maximum_wait_page_load_time=10.0  # maximum page-load wait time
)
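
The three wait-time settings are easier to tune with a mental model of how they might combine. The function below is a sketch of one plausible interpretation, not Browser-Use's actual code: it assumes all values are in seconds, that the page counts as settled once the network has been quiet for the idle window, and that the result is clamped between the minimum and the maximum. The name page_load_wait and its parameters are hypothetical.

```python
def page_load_wait(last_network_activity: float,
                   minimum: float = 1.0,
                   idle: float = 2.0,
                   maximum: float = 10.0) -> float:
    """Seconds to wait after navigation, under the assumed semantics.

    last_network_activity: seconds after navigation when the last request finished.
    """
    # The network counts as idle once it has been quiet for `idle` seconds.
    settled_at = last_network_activity + idle
    # Never wait less than `minimum`, never more than `maximum`.
    return max(minimum, min(settled_at, maximum))


print(page_load_wait(0.0))   # quiet page: limited by the idle window
print(page_load_wait(30.0))  # chatty page: capped by the maximum
```

Under this model, lowering the idle window speeds up quiet pages, while the maximum protects against pages that never stop making requests.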

4. Performance optimization recommendations

  1. Set page-load wait times appropriately to avoid waiting longer than necessary
  2. Use parallelization judiciously to improve throughput
  3. Enable headless mode where appropriate to reduce resource usage
  4. Use cookie persistence to avoid repeated authentication
  5. Adjust the security settings as required

5. Troubleshooting

  1. If you encounter problems with cross-domain requests, consider enabling the disable_security option
  2. If pages time out while loading, adjust the wait-time parameters
  3. Ensure that the API key is configured correctly
  4. Check network connection status
  5. Check the browser console log for detailed error messages
May not be reproduced without permission: Chief AI Sharing Circle