After OpenAI's Deep Research tool appeared seemingly out of nowhere, all the major vendors launched their own Deep Research tools. The name is a contrast with ordinary search: a simple RAG application generally performs only one round of retrieval before generating an answer. Deep Research works more like a human researcher: given a topic, it searches, analyzes, searches again, and analyzes again until the research goal is met. From this point of view it is essentially an upgraded RAG application: a vertical-domain agent built with patterns such as ReAct or Plan-and-Solve, able to decompose and plan an article, generate it, and acquire and analyze information.
The principle is simple, but building a private product that meets your own business needs is another matter: the engineering details and the work of tuning result quality are quite complex. Scaffolding projects and ready-made development platforms are therefore especially important. As happened with RAG, more and more such development frameworks will appear.
Today I will introduce several open-source Deep Research implementations that represent two approaches: one builds on an existing orchestration framework such as LangChain's LangGraph, and the other is developed specifically around the characteristics of deep research. With them you can quickly build deep research applications, and you can also study their implementation details and concrete technology choices, such as what they search with, what they use for storage, and what their prompts look like, which makes them a very useful reference for your own implementation.
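The search, analyze, search again loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular product works: `ResearchState`, `deep_research`, `fake_search`, and `fake_analyze` are hypothetical names, and the two `fake_*` functions stand in for a real search API and an LLM-based analysis step.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ResearchState:
    """Everything gathered so far for one research topic."""
    topic: str
    notes: List[str] = field(default_factory=list)
    queries_run: List[str] = field(default_factory=list)


def deep_research(
    topic: str,
    search: Callable[[str], List[str]],
    analyze: Callable[[ResearchState], Optional[str]],
    max_rounds: int = 5,
) -> ResearchState:
    """Alternate search and analysis until analyze() stops asking for more."""
    state = ResearchState(topic=topic)
    query: Optional[str] = topic  # the first query is the topic itself
    while query is not None and len(state.queries_run) < max_rounds:
        state.notes.extend(search(query))
        state.queries_run.append(query)
        # analyze() returns the next query, or None once the goal is met.
        query = analyze(state)
    return state


# Hypothetical stand-ins for a real search API and an LLM analysis step.
def fake_search(query: str) -> List[str]:
    return [f"finding about {query}"]


def fake_analyze(state: ResearchState) -> Optional[str]:
    if len(state.notes) >= 3:
        return None  # enough material collected, stop researching
    return f"{state.topic} follow-up {len(state.notes)}"


state = deep_research("vector databases", fake_search, fake_analyze)
print(state.queries_run)
```

A single-round RAG call corresponds to `max_rounds=1`; the difference in Deep Research is that the analysis step decides what to search for next, with `max_rounds` as a safety cap.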
1. LangChain Open Deep Research
This is LangChain's official demo implementation, with the entire processing flow built on LangGraph. Search and information gathering work by integrating multiple APIs such as Tavily and Perplexity. Users can set the search depth for each section, that is, the number of write, reflect, search, and rewrite iterations, and can give feedback on the planned report sections and iterate until satisfied.
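The two controls mentioned above, plan feedback and per-section depth, can be illustrated with plain Python. This sketch deliberately does not use the LangGraph API and is not taken from the project; `approve_plan`, `write_section`, and all the callables passed to them are hypothetical names chosen for illustration.

```python
from typing import Callable, List, Optional


def approve_plan(
    topic: str,
    draft_plan: Callable[[str, Optional[str]], List[str]],
    get_feedback: Callable[[List[str]], Optional[str]],
    max_revisions: int = 3,
) -> List[str]:
    """Revise the section plan until the reviewer returns no feedback."""
    plan = draft_plan(topic, None)
    for _ in range(max_revisions):
        feedback = get_feedback(plan)
        if feedback is None:  # reviewer is satisfied
            break
        plan = draft_plan(topic, feedback)
    return plan


def write_section(
    title: str,
    write: Callable[[str, List[str]], str],
    reflect: Callable[[str], Optional[str]],
    search: Callable[[str], List[str]],
    depth: int,
) -> str:
    """Write a section, then reflect, search, and rewrite up to `depth` times."""
    sources = search(title)
    draft = write(title, sources)
    for _ in range(depth):
        follow_up = reflect(draft)  # a follow-up query, or None if good enough
        if follow_up is None:
            break
        sources = sources + search(follow_up)
        draft = write(title, sources)
    return draft


# Hypothetical stand-ins for the LLM planner, the human reviewer, and search.
def demo_plan(topic: str, feedback: Optional[str]) -> List[str]:
    sections = ["Introduction", "Conclusion"]
    if feedback:
        sections.insert(1, feedback)
    return sections


feedback_queue = ["Benchmarks", None]
plan = approve_plan("RAG", demo_plan, lambda p: feedback_queue.pop(0))
draft = write_section(
    "Benchmarks",
    write=lambda title, sources: f"{title}: based on {len(sources)} source(s)",
    reflect=lambda d: "more detail" if "1 source" in d else None,
    search=lambda q: [f"source for {q}"],
    depth=2,
)
print(plan, draft)
```

Setting `depth=0` gives a single-pass section, while larger values trade extra search and model calls for a more thoroughly revised draft.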
Prompt used: https://github.com/langchain-ai/open_deep_research/blob/main/src/open_deep_research/prompts.py
Project address: https://github.com/langchain-ai/open_deep_research
Similar implementations include Deep Research applications orchestrated with frameworks such as Dify.
2. Open Deep Research
Open Deep Research is one of many pipeline-style implementations. It breaks the deep research process into discrete steps and supports both automatic and semi-automatic research workflows. It supports a variety of API interfaces, so it can retrieve information not only from the public internet but also from internal enterprise sources for summary and analysis. Users can choose among different AI platforms according to their needs, including Google, OpenAI, Anthropic, and DeepSeek, and can even connect local models for personalized research.
It contains the three standard steps of Deep Research:
- Search results retrieval: get comprehensive search results for the specified search terms via the Google Custom Search or Bing Search API (configurable).
- Content extraction: use JinaAI to extract and process content from the selected search results to ensure the information is accurate and relevant.
- Report generation: use the user-selected AI model (e.g. Gemini, GPT-4, Sonnet) to generate a detailed report from the collated search results and extracted content, providing in-depth analysis and insights in response to the user-defined prompt.
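The three steps above chain together naturally. The following is a minimal sketch of that flow, not the project's actual code: `Article`, `run_pipeline`, and the `fake_*` callables are hypothetical, with the callables standing in for the search API, the content extractor, and the chosen model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Article:
    title: str
    url: str
    content: str = ""


def run_pipeline(
    query: str,
    user_prompt: str,
    search: Callable[[str], List[Article]],
    extract: Callable[[str], str],
    generate: Callable[[str, List[Article]], str],
    top_k: int = 3,
) -> str:
    # Step 1: search results retrieval (keep only the top_k hits).
    hits = search(query)[:top_k]
    # Step 2: content extraction; skip pages that yield no text.
    articles = []
    for hit in hits:
        content = extract(hit.url)
        if content:
            articles.append(Article(hit.title, hit.url, content))
    # Step 3: report generation from the prompt and extracted content.
    return generate(user_prompt, articles)


# Hypothetical stand-ins so the sketch runs without network access.
def fake_search(query: str) -> List[Article]:
    return [Article(f"{query} result {i}", f"https://example.com/{i}") for i in range(5)]


def fake_extract(url: str) -> str:
    return "" if url.endswith("/1") else f"text of {url}"


def fake_generate(user_prompt: str, articles: List[Article]) -> str:
    return f"{user_prompt}: {len(articles)} sources"


report = run_pipeline("deep research", "Compare tools", fake_search, fake_extract, fake_generate)
print(report)
```

Passing the three steps in as functions keeps the search provider and the model swappable, which mirrors the configurable design described above.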
Below is the prompt used to generate the report:

    You are a research assistant tasked with creating a comprehensive report based on multiple sources.
    The report should specifically address this request: "${userPrompt}"

    1. Have a clear title that reflects the specific analysis requested
    2. Begin with a concise executive summary
    3. Be organized into relevant sections based on the analysis requested
    4. Use markdown formatting for emphasis, lists, and structure
    5. Integrate information from sources naturally without explicitly referencing them by number
    6. Maintain objectivity while addressing the specific aspects requested in the prompt
    7. Compare and contrast the information from each source, noting areas of consensus
    8. Showcase key insights, important data, or innovative ideas.

    Here are the source articles to analyze:

    ${articles
      .map(
        (article) => `
    Title: ${article.title}
    URL: ${article.url}
    Content: ${article.content}
    ---
    `
      )
      .join('\n')}

    Format the report as a JSON object with the following structure.
    {
      "summary": "Executive summary (can include markdown)",
      "sections": [
        {
          "content": "Section content with markdown formatting"
        }
      ]
    }

    Use markdown formatting in the content to improve readability.
    - Use **bold** for emphasis
    - Use bullet points and numbered lists where appropriate.
    - Use headings and subheadings with # syntax
    - Include code blocks if relevant
    - Use > for quotations
    - Use --- for horizontal rules where appropriate

    Important: Do not use phrases like "Source 1" or "According to Source 2". Instead, integrate the information naturally into the narrative or reference sources by their titles when necessary.
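Because the prompt asks for a JSON object, the calling code has to fill in the template and then check what comes back. The sketch below shows one way to do both in Python. It is an assumption-laden illustration rather than the project's code: `build_report_prompt` and `parse_report` are hypothetical names, the prompt text is abbreviated, and only the `summary` and `sections[].content` fields shown above are validated.

```python
import json
from typing import Any, Dict, List


def build_report_prompt(user_prompt: str, articles: List[Dict[str, str]]) -> str:
    """Fill an abbreviated version of the report prompt with the sources."""
    sources = "\n".join(
        f"Title: {a['title']}\nURL: {a['url']}\nContent: {a['content']}\n---"
        for a in articles
    )
    return (
        "You are a research assistant tasked with creating a comprehensive "
        "report based on multiple sources.\n"
        f'The report should specifically address this request: "{user_prompt}"\n\n'
        f"Here are the source articles to analyze:\n{sources}\n\n"
        'Format the report as a JSON object with a "summary" string and a '
        '"sections" list of objects that each have a "content" string.'
    )


def parse_report(raw: str) -> Dict[str, Any]:
    """Parse the model output and check the fields the prompt asked for."""
    report = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(report, dict):
        raise ValueError("report must be a JSON object")
    if not isinstance(report.get("summary"), str):
        raise ValueError("report is missing a string 'summary'")
    sections = report.get("sections")
    if not isinstance(sections, list) or not all(
        isinstance(s, dict) and isinstance(s.get("content"), str) for s in sections
    ):
        raise ValueError("'sections' must be a list of objects with 'content'")
    return report


prompt = build_report_prompt(
    "Compare the two tools",
    [{"title": "Example", "url": "https://example.com", "content": "Some text"}],
)
report = parse_report('{"summary": "Short summary", "sections": [{"content": "## Details"}]}')
print(len(report["sections"]))
```

Validating the structure before rendering matters because a model can return malformed or incomplete JSON even when the prompt specifies the format.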
The generated report can be downloaded or saved to a knowledge base. However, the project lacks high-quality search sources as well as a validation and iteration step in the research process, so report quality still has room for improvement. The overall flow is clear, though, which makes it a good base for continued improvement and refinement.
Project address: https://github.com/btahir/open-deep-research
Other projects of the same type:
https://github.com/nickscamara/open-deep-research (4.3k)
https://github.com/mshumer/OpenDeepResearcher (2.2k)
https://github.com/assafelovic/gpt-researcher (19k)
https://github.com/zaidmukaddam/scira (6.4k)
https://github.com/jina-ai/node-DeepResearch (2.6k)
Among them, node-DeepResearch is Jina's open-source deep research implementation. You can call its API directly; it is as simple to use as any other model interface, so you can quickly integrate it into your own application.
Wrap-up
As mentioned at the beginning of the article, Deep Research is the result of users' evolving demand for high-quality content. It breaks out of the information cocoon created by passive recommendation, and it automates away the inefficient traditional cycle of searching, summarizing, searching again, and summarizing again. If development continues in this direction, the way people acquire content will change, which will pose a major challenge to traditional search and recommendation.