General Introduction
OpenDeepResearcher is an open source automated deep research tool designed to improve research efficiency through artificial intelligence techniques. The project was developed by mshumer and is hosted on GitHub. OpenDeepResearcher leverages a variety of services and technologies, including SERPAPI, Jina, and OpenRouter, to perform Google searches, web content extraction, and contextual analysis. Its core function is to continuously optimize search queries through an iterative research loop until the system is confident that it has gathered all the necessary information. The tool also supports asynchronous processing, iterative filtering, and LLM-driven decision making, ensuring that the research process is efficient and comprehensive.
Function List
- Iterative Research Cycle: The system optimizes search queries through multiple iterations to ensure comprehensive information collection.
- asynchronous processing: Search, web page extraction, evaluation and contextual extraction are performed simultaneously to increase speed.
- duplicate filter: Aggregate and de-duplicate links in each iteration to avoid duplicate processing of the same links.
- LLM Driving Decisions: Generating new search queries, determining page usefulness, extracting relevant context, and generating final reports using a large language model.
- Gradio Interface: Provides a functional user interface that is easy to use.
Using Help
Installation process
- Clone or open the laptop: Download the notebook file or directly at Google Colab Open in.
- Install nestasyncio: Run the first cell to set up nestasyncio.
- Configuring API Keys: Replaces placeholder values in the notebook with the actual API key, including the OPENROUTERAPIKEY, SERPAPIAPIKEY and JINAAPIKEY.
Procedure for use
- Running Notebook Cells: Execute all cells sequentially. The notebook prompts for a research query/topic and an optional maximum number of iterations (default is 10).
- Initial query and search generation: The notebook uses LLM to generate the initial search query.
- Asynchronous search and extraction: Perform SERPAPI searches in parallel, aggregating unique links and processing each link in parallel to determine page usefulness and extract relevant context.
- Iterative optimization: After each round, LLM analyzes the context of the aggregation and decides whether further search queries are needed.
- Generate final report: Once the LLM indicates that further research is no longer required (or the iteration limit is reached), a final report is generated based on all collected contexts.
- View Final Report: The final synthesis report will be printed in the output.
Detailed Operation Procedure
- Input and query generation: The user enters a research topic and LLM generates up to four different search queries.
- Concurrent search and processing: Each search query is sent to SERPAPI at the same time.
- de-emphasize: Aggregate and de-duplicate all retrieved links in the current iteration.
- contextual extraction: Process each link to determine page usefulness and extract relevant context.
- Iterative optimization: Analyze the context of the aggregation and decide if further search queries are needed.
- Final report generation: Generate a final synthesized report based on all collection contexts.