General Introduction
TextDistiller is an AI-driven tool that summarizes books chapter by chapter or as a whole, producing a concise overview of the content. It lets readers grasp a book's core ideas and key takeaways quickly, saving time while retaining an understanding of the material. The tool uses natural language processing (a pre-trained T5-small model) to generate summaries that are intended to be both accurate and easy to read.
Function List
- Chapter-by-chapter summary: Provides a detailed summary of each chapter, so users can focus on the content of a specific chapter.
- Overview of the book: For books without chapter divisions, provides a condensed summary of the overall content.
- Natural language processing (NLP): Uses a pre-trained NLP model to keep summaries accurate and readable.
- User-friendly interface: A simple, intuitive interface makes the summarization process easy to follow.
Usage Guide
Installation process
- Clone the repository:
git clone https://github.com/johngai19/TextDistiller.git
- Install the required dependencies:
pip install -r requirements.txt
- Run the command line interface (CLI):
python3 bsCLI.py --path
- To use the Flask server, first update the mail configuration by setting sender_address and sender_pass in mail.py.
- Run the Flask server:
python3 views.py
Usage Process
Chapter-by-chapter summary
- Pass the path to the book PDF file as a parameter to the command line tool.
- The tool automatically chunks the book by chapter and generates a detailed summary of each chapter.
- Users can view the core content of each chapter and quickly grasp the main ideas of the book.
Overview of the book
- For books that are not divided into chapters, the tool treats the entire book as a whole.
- The generated summary will cover all the important elements of the book, providing a comprehensive overview.
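The two modes above can be illustrated with a small chunking sketch. It is not TextDistiller's actual implementation: the heading pattern, the function name split_into_chapters, and the "Whole book" label are assumptions made for illustration, and the text is assumed to have already been extracted from the PDF.

```python
import re

# Hypothetical heading pattern: a line such as "Chapter 1" or "CHAPTER 12: Title".
# Real books vary widely, so this is an assumption, not the tool's own rule.
CHAPTER_HEADING = re.compile(r"^chapter[ \t]+\d+\b.*$", re.IGNORECASE | re.MULTILINE)


def split_into_chapters(text):
    """Return a list of (title, body) pairs.

    If no chapter headings are found, the whole text is returned as a
    single chunk titled "Whole book". Any text before the first heading
    (front matter) is dropped.
    """
    matches = list(CHAPTER_HEADING.finditer(text))
    if not matches:
        return [("Whole book", text.strip())]

    chapters = []
    for i, match in enumerate(matches):
        body_start = match.end()
        # A chapter body runs until the next heading, or to the end of the text.
        body_end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chapters.append((match.group().strip(), text[body_start:body_end].strip()))
    return chapters


sample = "Chapter 1\nIt began.\nChapter 2: The End\nIt ended."
chapters = split_into_chapters(sample)
```

Each (title, body) pair can then be summarized separately, while a book without headings falls through to the single whole-book chunk.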
Main Functions
- Chapter-by-chapter summary: Run on the command line:
python3 bsCLI.py --path
The tool automatically processes the book and generates a summary of each chapter.
- Overview of the book: Run the same command; the tool automatically selects the appropriate treatment based on the structure of the book.
- View Summary: The generated summary will be saved as a text file in the specified directory, which can be opened and viewed directly by the user.
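As a rough illustration of the last step, the sketch below writes a list of summaries to a text file in a chosen directory. The function name save_summaries, the default file name summary.txt, and the file layout are hypothetical; the source does not specify how TextDistiller names or formats its output.

```python
from pathlib import Path


def save_summaries(summaries, output_dir, filename="summary.txt"):
    """Write (title, summary) pairs to a UTF-8 text file and return its path.

    The file name and layout are illustrative assumptions only.
    """
    out_dir = Path(output_dir)
    # Create the target directory if it does not exist yet.
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / filename
    with out_path.open("w", encoding="utf-8") as handle:
        for title, summary in summaries:
            # One block per chapter: title line, summary line, blank line.
            handle.write(f"{title}\n{summary}\n\n")
    return out_path
```

The returned path can be opened in any text editor to read the summaries.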
Featured Functions
- Natural language processing (NLP) technology: TextDistiller uses a pre-trained T5-small model and works through the steps of chunking, tokenization, summary generation, and decoding to produce summaries that are both accurate and easy to read.
- User-friendly interface: TextDistiller provides a simple, intuitive interface that makes it easy to get started with both the command line tool and the Flask server.
How TextDistiller works
TextDistiller uses the pre-trained T5-small model from HuggingFace Transformers to generate accurate and readable summaries. The process includes:
- Chunking: Divide the book into chunks, either by chapter or as a whole.
- Tokenization: Use T5Tokenizer to tokenize the chunks so that they are compatible with the T5 model.
- Summary generation: The tokenized text is passed through the T5ForConditionalGeneration model, which generates the token IDs of the summary.
- Decoding: Use the decode() function of T5Tokenizer to convert the summary token IDs into readable text.
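The four steps can be sketched with the HuggingFace Transformers API as follows. This is a minimal sketch, not the project's own code: the helper names chunk_words and summarize_chunks, the 350-word chunk size, and the generation settings (beam search with 4 beams, 150-token summaries) are assumptions, and it presumes the transformers, torch, and sentencepiece packages are installed.

```python
def chunk_words(text, max_words=350):
    """Split text into consecutive chunks of at most max_words words.

    The word limit is an illustrative choice meant to keep each chunk
    near T5-small's 512-token input length.
    """
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_chunks(chunks, model_name="t5-small", max_summary_tokens=150):
    """Summarize each chunk with a pre-trained T5 model and return the texts."""
    # Imported lazily so chunk_words can be used without the model libraries.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)

    summaries = []
    for chunk in chunks:
        # Tokenization: T5 uses a task prefix; longer inputs are truncated.
        inputs = tokenizer("summarize: " + chunk, return_tensors="pt",
                           max_length=512, truncation=True)
        # Summary generation: the model produces token IDs for the summary.
        summary_ids = model.generate(inputs["input_ids"],
                                     attention_mask=inputs["attention_mask"],
                                     max_length=max_summary_tokens,
                                     num_beams=4, early_stopping=True)
        # Decoding: convert the token IDs back into readable text.
        summaries.append(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
    return summaries
```

Calling summarize_chunks(chunk_words(chapter_text)) returns one summary string per chunk; the first call downloads the t5-small weights, so it needs network access.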