wdoc: retrieve content and summarize knowledge from massive, multi-source documents

General Introduction

wdoc is a RAG (Retrieval-Augmented Generation) system designed for processing and analyzing large, diverse document collections. It can retrieve from many document types, including PDFs, web pages, YouTube videos, and audio files. It is particularly well suited to handling large numbers of information sources, which makes it a good fit for researchers, students, and professionals who work with large volumes of material. The system uses the LangChain library for document processing, supports a wide range of LLM (Large Language Model) providers, and offers high-precision retrieval and summarization. wdoc is under active development, and user feedback and feature requests are welcome.

Feature List

  • Multi-file type support: Supports more than 15 file types, including PDFs, web pages, YouTube videos, and audio files.
  • High-precision retrieval and summarization: Provides accurate document retrieval and summarization through embedding search and semantic batch processing.
  • Multi-LLM support: Supports multiple LLM providers, including local models and private models with additional security layers.
  • Advanced RAG functions: A weaker LLM filters out irrelevant documents, then a stronger LLM produces precise answers, which are merged through semantic clustering and sorting.
  • Easy to extend: wdoc is not only a tool but also a library, so it can be used from other Python projects.
  • Detailed documentation and help: Ships with extensive documentation and help text so users can get started quickly.
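
The "weaker model filters, stronger model answers" flow listed above can be sketched roughly as follows. This is only an illustration and not wdoc's actual code: every function name here is hypothetical, the two models are replaced by trivial stand-ins so the sketch runs offline, and the merge step is a plain join rather than the semantic clustering and sorting that wdoc is described as using.

```python
from typing import Callable, List


def filter_then_answer(
    query: str,
    chunks: List[str],
    is_relevant: Callable[[str, str], bool],
    answer: Callable[[str, str], str],
    combine: Callable[[List[str]], str],
) -> str:
    """Two-stage flow: cheap relevance filter first, expensive answering second."""
    # Stage 1: the cheaper ("weak") model discards chunks unrelated to the query.
    kept = [chunk for chunk in chunks if is_relevant(query, chunk)]
    # Stage 2: the stronger model answers once per surviving chunk.
    partial_answers = [answer(query, chunk) for chunk in kept]
    # Stage 3: the partial answers are merged into one response.
    return combine(partial_answers)


# Stand-ins so the sketch runs without any LLM (assumptions, not wdoc APIs).
def keyword_filter(query: str, chunk: str) -> bool:
    return any(word.lower() in chunk.lower() for word in query.split())


def echo_answer(query: str, chunk: str) -> str:
    return f"From the source: {chunk}"


def join_answers(answers: List[str]) -> str:
    return "\n".join(answers)


if __name__ == "__main__":
    docs = ["Python 3.11 is required.", "The weather is sunny.", "Install with pip."]
    print(filter_then_answer("python install", docs, keyword_filter, echo_answer, join_answers))
```

The point of the split is cost: the expensive model only sees chunks that survived the cheap filter.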

 

Usage Guide

Installation

wdoc currently requires Python version 3.11 to run. Make sure you have the correct Python version, then follow the steps below to install it:

  1. Use pip to install:
    pip install -U wdoc
  2. Or install a specific git branch:
    pip install git+https://github.com/thiswillbeyourgithub/wdoc@dev
    
  3. It is recommended to install pdftotext and fasttext support:
    pip install -U "wdoc[pdftotext]" "wdoc[fasttext]"
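
Before running the install commands above, it can help to confirm which interpreter is active. The following is a minimal sketch: the helper name `is_supported_python` is hypothetical, and the exact 3.11 match is an assumption based on the requirement stated in this article, so adjust it if the project's requirements change.

```python
import sys

# Version required by wdoc according to this article (assumed to mean any 3.11.x).
REQUIRED = (3, 11)


def is_supported_python(version_info=None):
    """Return True when the (major, minor) version equals REQUIRED."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == REQUIRED


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if is_supported_python() else "unsupported for wdoc"
    print(f"Python {current}: {status}")
```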
    

Usage

  1. Add the required API key as an environment variable:
    export OPENAI_API_KEY="Your API key"
    
  2. Start wdoc:
    wdoc --task=query --path="your document path"
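
When wdoc is launched from a script, the two steps above can be wrapped in a small helper. This is a hedged sketch, not part of wdoc: `build_wdoc_command` and `missing_api_key` are hypothetical names, the example path and query are placeholders, and only the command-line flags shown in this article (`--task`, `--path`, `--filetype`, `--query`) are used.

```python
import os
import shlex


def build_wdoc_command(task, path, filetype=None, query=None):
    """Build an argv list for the wdoc CLI using only flags shown in this article."""
    argv = ["wdoc", f"--task={task}", f"--path={path}"]
    if filetype is not None:
        argv.append(f"--filetype={filetype}")
    if query is not None:
        argv.append(f"--query={query}")
    return argv


def missing_api_key(env=None, name="OPENAI_API_KEY"):
    """Return True when the API key variable is unset or empty."""
    env = os.environ if env is None else env
    return not env.get(name)


if __name__ == "__main__":
    if missing_api_key():
        print("OPENAI_API_KEY is not set; export it before starting wdoc.")
    argv = build_wdoc_command(
        "query", "docs/my paper.pdf", filetype="pdf", query="main findings"
    )
    # shlex.join quotes arguments containing spaces so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv, check=True)` would execute the command. Keeping the arguments as a list avoids quoting problems when a path contains spaces.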
    

Feature Walkthrough

Document Query

Use wdoc to query the contents of a document:

wdoc --task=query --path="your document path" --filetype=pdf --query="query content"

The command loads the PDF file at the specified path, retrieves the content relevant to the query, and returns the results.
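
For readers unfamiliar with embedding-based retrieval, the general idea can be illustrated as below. This is a toy sketch under stated assumptions and does not show wdoc's internals: the function names are hypothetical, the three-dimensional vectors are made up for the example, and a real system would obtain vectors from an embedding model.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Rank documents by similarity to the query and keep the k best."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in doc_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Made-up vectors for illustration only.
    docs = {
        "intro.pdf": [0.9, 0.1, 0.0],
        "methods.pdf": [0.2, 0.8, 0.1],
        "appendix.pdf": [0.0, 0.1, 0.9],
    }
    for name, score in top_k([1.0, 0.2, 0.0], docs):
        print(f"{name}: {score:.3f}")
```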

Document Summary

Summarize a document using wdoc:

wdoc --task=summarize --path="your document path" --filetype=pdf

The command summarizes the PDF file at the specified path and returns a detailed summary of its content.

Combined Tasks

You can also combine the query and summarization tasks:

wdoc --task=summarize_then_query --path="your document path" --filetype=pdf

This command first summarizes the document and then lets you ask further questions about the summary.

Advanced Features

wdoc supports a variety of advanced features such as:

  • Multi-file type support: Load multiple file types via recursive paths, linked files, etc.
  • Advanced RAG Functions: Improve retrieval accuracy using techniques such as multi-query search and semantic batch processing.
  • Local and private LLM support: Keep data secure by not sending it to external providers.
  • Detailed documentation and help: Run wdoc --help for more information on how to use it.