General Introduction
Browser-Use is an open-source web automation tool that lets large language models (LLMs) interact with websites. It works with a range of mainstream models, including GPT-4 and Claude. It combines AI capabilities with browser automation: visual recognition and HTML extraction, automatic multi-tab management, and intelligent element detection. Beyond simple browsing, it handles more complex interactions such as filling in forms, submitting applications, and searching for information. The goal is to let AI agents use a browser much as a person would, which simplifies the development of web automation. It is aimed at developers who need web automation, data collection, or batch operations.
Function List
- Supports visual recognition and intelligent extraction of HTML content
- Automatic management of multiple tabs
- Extracts the XPath of clicked elements so that the exact LLM actions can be reproduced
- Support for customized actions (e.g., save file, push database, send notification, get manual input)
- Ability to self-correct
- Compatible with all language models supported by LangChain
- Support for running multiple AI agents in parallel
- Configurable browser security features
- Persistent cookie storage
- Flexible page load wait time settings
Usage Guide
1. Installation and configuration
- First install the Browser-Use package via pip:
pip install browser-use
- (Optional) Install playwright:
playwright install
- Configure environment variables: create a .env file and add the necessary API keys:
OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
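Before starting an agent, it can help to confirm that the keys are actually present. The following is a minimal sketch using only the standard library. The helper names parse_env_text and missing_keys are hypothetical and are not part of Browser-Use, and whether both keys are required depends on which model you plan to use.

```python
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def parse_env_text(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env, required=REQUIRED_KEYS) -> list:
    """Return the required keys that are absent or empty in env."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    sample = "OPENAI_API_KEY=sk-example\n# ANTHROPIC_API_KEY not set yet\n"
    print(missing_keys(parse_env_text(sample)))
```

To check the real environment instead of a sample string, pass os.environ to missing_keys after the .env file has been loaded.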
2. Basic usage
2.1 Creating a simple AI agent
from langchain_openai import ChatOpenAI
from browser_use import Agent
import asyncio

async def main():
    # Create an agent with a task description and the LLM that drives it
    agent = Agent(
        task="Find information about a specific flight",
        llm=ChatOpenAI(model="gpt-4"),
    )
    result = await agent.run()
    print(result)

asyncio.run(main())
2.2 Registering a customized action
Custom actions are registered with a decorator:
from browser_use.controller.service import Controller

controller = Controller()

@controller.action('Ask user for information')
def ask_human(question: str, display_question: bool) -> str:
    # Prompt on the console and return what the user typed
    return input(f'\n{question}\nInput: ')
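The decorator-based registration above can be illustrated with a small, self-contained registry. This is only a sketch of the general pattern: the ActionRegistry class and the shout function are hypothetical names written for this example, and they do not reflect how Browser-Use's Controller is implemented internally.

```python
from typing import Callable, Dict, Tuple


class ActionRegistry:
    """Hypothetical stand-in for a controller; not Browser-Use's Controller."""

    def __init__(self) -> None:
        # Maps function name -> (description, function)
        self.actions: Dict[str, Tuple[str, Callable]] = {}

    def action(self, description: str) -> Callable[[Callable], Callable]:
        """Return a decorator that records the function under its own name."""
        def decorator(func: Callable) -> Callable:
            self.actions[func.__name__] = (description, func)
            return func  # the function itself is left unchanged
        return decorator


registry = ActionRegistry()


@registry.action("Echo the question back in upper case")
def shout(question: str) -> str:
    return question.upper()


if __name__ == "__main__":
    description, func = registry.actions["shout"]
    print(description, "->", func("hello"))
```

The description string is what a language model would typically be shown when deciding which action to call, which is why it is stored alongside the function.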
2.3 Defining parameter models with Pydantic
from pydantic import BaseModel
from typing import Optional
from browser_use import Browser  # assumed import path; adjust to your installed version

class JobDetails(BaseModel):
    title: str
    company: str
    job_link: str
    salary: Optional[str] = None

# Reuses the controller created in 2.2
@controller.action('Save Job Details', param_model=JobDetails, requires_browser=True)
async def save_job(params: JobDetails, browser: Browser):
    print(params)
    page = browser.get_current_page()
    page.go_to(params.job_link)
3. Advanced features
3.1 Running agents in parallel
It is recommended to use a single Browser instance and give each agent its own context:
# Run inside an async function; model is an LLM instance such as ChatOpenAI(model="gpt-4")
browser = Browser()
for i in range(10):
    # Each context is closed automatically when the async with block exits
    async with browser.new_context() as context:
        agent = Agent(
            task=f"Task {i}",
            llm=model,
            browser_context=context,
        )
        # Processing tasks...
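Note that a plain for loop handles one context at a time. To actually overlap the work, the per-agent coroutines can be scheduled together. The sketch below shows the general asyncio pattern with a placeholder coroutine in place of real browser work; run_task, run_all, and max_concurrent are hypothetical names, and the sleep call stands in for creating a context, building an Agent, and awaiting agent.run().

```python
import asyncio


async def run_task(i: int, limit: asyncio.Semaphore) -> str:
    """Placeholder for one agent run inside its own browser context."""
    async with limit:  # cap how many tasks run at the same time
        await asyncio.sleep(0.01)  # stands in for the real browser work
        return f"Task {i} done"


async def run_all(count: int, max_concurrent: int = 3) -> list:
    """Schedule all tasks at once and collect results in submission order."""
    limit = asyncio.Semaphore(max_concurrent)
    return await asyncio.gather(*(run_task(i, limit) for i in range(count)))


if __name__ == "__main__":
    print(asyncio.run(run_all(5)))
```

The semaphore is a design choice for limiting resource usage: each browser context consumes memory, so capping concurrency is usually safer than launching every task at once.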
3.2 Browser Configuration
Browser behavior can be configured through the BrowserConfig and BrowserContextConfig classes:
browser_config = BrowserConfig(
    headless=False,                            # Whether to run in headless mode
    keep_open=True,                            # Keep the browser open after the script finishes
    disable_security=True,                     # Disable browser security features
    cookies_file="cookies.json",               # Cookie storage file
    minimum_wait_page_load_time=1.0,           # Minimum page load wait time
    wait_for_network_idle_page_load_time=2.0,  # Network idle wait time
    maximum_wait_page_load_time=10.0,          # Maximum page load wait time
)
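The three wait settings interact, and a small calculation can make that easier to reason about. The sketch below assumes one plausible interpretation that is not confirmed here: the page counts as ready once the minimum wait has passed and the network has been idle for the idle window, with the maximum wait as a hard cap. The effective_wait function is hypothetical and is not part of Browser-Use.

```python
def effective_wait(
    last_network_activity: float,
    minimum: float = 1.0,
    idle: float = 2.0,
    maximum: float = 10.0,
) -> float:
    """Seconds to wait after navigation, under the assumed interpretation.

    last_network_activity is the time (in seconds after navigation) of the
    last observed network request.
    """
    idle_reached = last_network_activity + idle  # when the idle window ends
    return min(maximum, max(minimum, idle_reached))


if __name__ == "__main__":
    for last in (0.0, 0.5, 20.0):
        print(last, "->", effective_wait(last))
```

Under this reading, lowering the idle window shortens waits on pages with little background traffic, while the maximum protects against pages that never go idle.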
4. Performance optimization recommendations
- Set page load wait times appropriately to avoid unnecessary waiting
- Use parallelization where it helps to improve throughput
- Enable headless mode when appropriate to reduce resource usage
- Use cookie persistence to reduce repeated authentication
- Adjust the security feature configuration as required
5. Troubleshooting
- If you encounter problems with cross-domain requests, consider enabling the disable_security option
- If page loads time out, adjust the wait time parameters
- Ensure that the API key is configured correctly
- Check network connection status
- Check the browser console log for detailed error messages