General Introduction
AI-reads-books-page-by-page is a Python-based tool that analyzes PDF books page by page, extracts key knowledge points, and generates progress summaries at a configurable page interval. It uses AI models for content understanding and summary generation, helping users grasp a book's core content quickly. Built-in filtering automatically skips the table of contents and index pages, and the tool can resume processing from where the previous analysis stopped. Summaries are written in Markdown, which is easy to read and share, and the knowledge base is stored persistently so that analysis results are not lost.
Function List
- Automated PDF book analysis and knowledge extraction
- AI-driven content understanding and summary generation
- Interval-based progress summaries
- Persistent knowledge base storage
- Summary output in Markdown format
- Colored terminal output for better visibility
- Resuming from an existing knowledge base after an interruption
- Configurable analysis intervals and test modes
- Intelligent content filtering (automatically skips table of contents, index pages, etc.)
- Standardized output directory structure management
- JSON format knowledge base storage
- Support for custom AI model selection
Usage Guide
1. Environment setup
- First, make sure Python is installed on the system.
- Clone the project locally:
git clone https://github.com/echohive42/AI-reads-books-page-by-page
cd AI-reads-books-page-by-page
- Install the dependencies:
pip install -r requirements.txt
2. Basic configuration
The following key parameters need to be configured before use:
- Place the PDF file to be analyzed in the project root directory.
- Open the read_books.py file and modify the following settings:
  - PDF_NAME: the name of your PDF file
  - ANALYSIS_INTERVAL: the analysis interval (number of pages)
  - TEST_PAGES: the number of test pages (optional)
  - MODEL: the AI model used to process pages
  - ANALYSIS_MODEL: the AI model used to generate analyses
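As a rough illustration, the settings near the top of read_books.py might look like the sketch below. Only the parameter names come from the list above. Every value shown (the file name, the interval, the page count, and both model names) is a placeholder assumption, not the project's actual default.

```python
# Hypothetical values for illustration; only the parameter names come from the text.
PDF_NAME = "my_book.pdf"                # placeholder: a PDF placed in the project root
ANALYSIS_INTERVAL = 20                  # pages between interval summaries
TEST_PAGES = 10                         # optional: limit processing to this many pages
MODEL = "your-page-model"               # placeholder name of the model that processes pages
ANALYSIS_MODEL = "your-analysis-model"  # placeholder name of the model that generates analyses
```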
3. Directory structure
The program automatically creates the following directory structure:
- book_analysis/knowledge_bases/: stores knowledge base files in JSON format
- book_analysis/summaries/: stores summary files in Markdown format
- book_analysis/pdfs/: stores copies of PDF files
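The directory layout above could be created with a few lines of standard-library Python. This is a minimal sketch and not necessarily how the project does it: the function name ensure_output_dirs is hypothetical, and only the folder names are taken from the text.

```python
from pathlib import Path

def ensure_output_dirs(base_dir="book_analysis"):
    """Create the three output folders if missing and return their paths by name."""
    base = Path(base_dir)
    dirs = {name: base / name for name in ("knowledge_bases", "summaries", "pdfs")}
    for path in dirs.values():
        # parents=True also creates base_dir; exist_ok=True makes reruns harmless
        path.mkdir(parents=True, exist_ok=True)
    return dirs
```

Because existing folders are left untouched, calling the function on every start is safe.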
4. Running the program
python read_books.py
5. Advanced features
- Interval analysis control
  - Set ANALYSIS_INTERVAL = None to disable interval summaries
  - Set a specific value (e.g. 20) to generate a summary after every 20 pages processed
- Test mode
  - Set TEST_PAGES = None to process the entire book
  - Set a specific number of pages to run a partial test
- Resuming after an interruption
  - The program automatically saves its processing progress
  - When restarted, it continues from the last processed position
- Output file management
  - Knowledge points are stored in JSON files
  - Summaries are written in Markdown format
  - File names include timestamps for versioning
- Custom analysis
  - AI model parameters can be adjusted
  - The depth and manner of analysis can be configured
  - The output format and storage location can be customized
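The interplay of the interval setting, test mode, and resuming can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: all function names, the JSON layout (pages_done, knowledge), and the timestamp format are hypothetical, and the AI call is replaced by a stand-in string.

```python
import json
from datetime import datetime
from pathlib import Path

def pages_to_process(total_pages, test_pages=None, already_done=0):
    """Page indices still to process; test_pages=None means the whole book."""
    limit = total_pages if test_pages is None else min(test_pages, total_pages)
    return range(already_done, limit)

def should_summarize(pages_processed, analysis_interval=None):
    """True when an interval summary is due; None disables interval summaries."""
    if analysis_interval is None:
        return False
    return pages_processed > 0 and pages_processed % analysis_interval == 0

def load_progress(kb_path):
    """Load saved state, or start fresh if no knowledge base file exists yet."""
    path = Path(kb_path)
    if not path.exists():
        return {"pages_done": 0, "knowledge": []}
    return json.loads(path.read_text(encoding="utf-8"))

def save_progress(kb_path, state):
    """Persist the state as JSON so a later run can resume from it."""
    Path(kb_path).write_text(json.dumps(state, indent=2), encoding="utf-8")

def timestamped_name(prefix, ext="md"):
    """Build a file name such as summary_20250101_120000.md (format is an assumption)."""
    return f"{prefix}_{datetime.now():%Y%m%d_%H%M%S}.{ext}"

def run(total_pages, kb_path, analysis_interval=None, test_pages=None):
    """Process remaining pages; return the page counts at which a summary was due."""
    state = load_progress(kb_path)
    summaries_due = []
    for page in pages_to_process(total_pages, test_pages, state["pages_done"]):
        # Stand-in for the AI call that extracts knowledge points from one page
        state["knowledge"].append(f"point from page {page + 1}")
        state["pages_done"] = page + 1
        save_progress(kb_path, state)  # save after every page so a restart can resume
        if should_summarize(state["pages_done"], analysis_interval):
            summaries_due.append(state["pages_done"])
    return summaries_due
```

Saving after every page trades some extra disk writes for the guarantee that an interrupted run loses at most the page in progress.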
6. Cautions
- Make sure PDF files are valid and neither encrypted nor corrupted
- Run a small-scale test first when working with large PDFs
- Back up knowledge base files regularly
- Adjust the analysis interval to your actual needs
- Monitor system resource usage