General Introduction
Auto-Audio-Book is an open source project hosted on GitHub. It automatically crawls novel content from websites and converts it into audiobooks with multiple character voices. Developed by zqq-nuli in Python 3.10+, it combines large models (such as Gemini and CosyVoice2-0.5B) to handle text processing and speech synthesis. Beyond basic text-to-audio conversion, the project distinguishes the characters in a novel and assigns them different voices, producing an effect similar to a radio drama. The code is public, and users are free to download and modify it. As of March 24, 2025, the project is still under development and the GUI is incomplete, but the whole process can be run from the command line, which makes it suitable for technology enthusiasts and audiobook producers.
Function List
- Novel crawling: Automatically downloads novel chapter contents from designated websites.
- Dialog message generation: Uses AI to analyze the text and separate characters from their dialogue.
- Multi-Character Voiceover: Assign different voices to fictional characters, supporting protagonist, narrator, and randomized dubbing.
- Audio Generation: Convert text to MP3 format for audiobooks with multi-thread acceleration support.
- Management tools: Provides a GUI to assist in the management of novel data and audio files.
- Open source and extensible: Users can modify the code to add new features or improve the results.
Usage Guide
Installing and using Auto-Audio-Book requires some technical background. Below is a detailed installation and operation guide to help you generate an audiobook from scratch.
Installation process
- Prepare the environment
- Install Python 3.10 or later, downloaded from https://www.python.org/downloads/.
- Install ffmpeg. On Windows, download it from https://ffmpeg.org/download.html. On Mac, use:
brew install ffmpeg
On Linux, use:
sudo apt install ffmpeg
- (Optional) Install MongoDB for GUI management of novel data, downloaded from https://www.mongodb.com/try/download/community.
- To check the environment, type at the command line:
python --version
ffmpeg -version
Make sure both versions are displayed correctly.
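The same check can be scripted. The following is a minimal sketch, not part of the project: the function name check_environment is hypothetical, and it only verifies the interpreter version and that ffmpeg is discoverable on PATH.

```python
import shutil
import sys


def check_environment(min_python=(3, 10), tools=("ffmpeg",)):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper for illustration; the project does not ship it.
    """
    problems = []
    if sys.version_info < min_python:
        found = ".".join(map(str, sys.version_info[:3]))
        needed = ".".join(map(str, min_python))
        problems.append(f"Python {needed}+ required, found {found}")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


for problem in check_environment():
    print("WARNING:", problem)
```

If nothing is printed, both prerequisites were found. MongoDB is optional, so it is deliberately left out of the default tool list.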
- Download Code
- Clone the project locally with Git:
git clone https://github.com/zqq-nuli/auto-audio-book.git
- Go to the project directory:
cd auto-audio-book
- Create a virtual environment
- Use uv to create a virtual environment (uv must be installed first, for example with pip install uv):
uv venv --python 3.10
- Activate the environment:
- Windows:
.\.venv\Scripts\activate
- Mac/Linux:
source .venv/bin/activate
- Install dependencies
- Install the required libraries in the virtual environment:
uv add -r requirements.txt
- If requirements.txt is missing, the core libraries can be installed manually:
pip install requests gTTS PyPDF2 pymongo
- Configure the API key
- Copy the .env.example file to .env:
copy .env.example .env # Windows
cp .env.example .env # Mac/Linux
- Edit the .env file and fill in the large-model API key, such as the key for Gemini, which can be requested from the corresponding platform.
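To illustrate what the .env step provides, here is a minimal sketch of reading KEY=VALUE pairs and checking that a key is filled in. The variable name GEMINI_API_KEY is an assumption made for this example; the real names are whatever .env.example defines, and the project may load the file differently.

```python
def load_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Hypothetical file contents; check .env.example for the real variable names.
sample = """
# large-model credentials
GEMINI_API_KEY="your-key-here"
"""

config = load_env(sample)
missing = [name for name in ("GEMINI_API_KEY",) if not config.get(name)]
if missing:
    print("Missing keys:", ", ".join(missing))
```

In practice the text would come from reading the .env file in the project directory rather than from an inline string.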
Usage procedure
- Crawl a novel
- Select a fiction site (e.g. https://m.ilwxs.com/). By default, the program supports sites without anti-crawling protection.
- Run the crawl script:
python app/getBookList.py
- Then get the list of chapters and save the content:
python app/getZjList.py
python app/saveBooks.py
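As a rough idea of what fetching a chapter list involves, the sketch below extracts chapter links from an index page using only the standard library. It is not the project's code: the class name ChapterLinkParser, the sample HTML, and the example.com URLs are all made up for illustration, and the real scripts may parse pages quite differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ChapterLinkParser(HTMLParser):
    """Collect (title, absolute_url) pairs from every <a href> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # href of the <a> currently being read, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            if title:
                # Resolve relative links against the index page URL.
                self.links.append((title, urljoin(self.base_url, self._href)))
            self._href = None


# Made-up index page standing in for a downloaded chapter list.
sample_html = (
    '<ul><li><a href="/book/1/1.html">Chapter 1</a></li>'
    '<li><a href="/book/1/2.html">Chapter 2</a></li></ul>'
)
parser = ChapterLinkParser("https://example.com/")
parser.feed(sample_html)
chapters = parser.links
print(chapters)
```

A real crawler would also filter out navigation links and download each chapter page, which is omitted here.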
- Generating dialog messages
- Process chapters with AI to differentiate between characters and dialog:
python app/saveBookJson.py
- The output is saved as a JSON file for subsequent dubbing.
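The article does not show the JSON layout, so the following sketch assumes a hypothetical one: each chapter is a list of messages with "speaker" and "text" fields. Under that assumption, counting lines per character (useful later when assigning voices) could look like this.

```python
import json
from collections import Counter

# Hypothetical chapter file contents; the real schema may differ.
chapter_json = """[
  {"speaker": "narrator", "text": "The rain had stopped by dawn."},
  {"speaker": "A", "text": "We leave at once."},
  {"speaker": "B", "text": "So soon?"},
  {"speaker": "A", "text": "There is no time."}
]"""


def count_lines(chapter_texts):
    """Count how many messages each speaker has across all chapters."""
    counts = Counter()
    for chapter_text in chapter_texts:
        for message in json.loads(chapter_text):
            counts[message["speaker"]] += 1
    return counts


line_counts = count_lines([chapter_json])
print(line_counts.most_common())
```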
- Configuring character voices
- Run the script to create the role table:
python app/createUser.py
- Manually assign voices for the main character and the narrator (models such as CosyVoice2-0.5B are supported). Other characters can be assigned voices randomly:
- Characters with more than 50 lines get an individual voice.
- Characters with fewer than 50 lines use the narrator's voice.
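The 50-line rule can be sketched as a small function. This is an illustration under assumptions, not the logic of createUser.py: the function name, the voice identifiers, and the treatment of a character with exactly 50 lines (here: narrator's voice) are choices made for the example.

```python
import random


def assign_voices(line_counts, manual, voice_pool,
                  narrator="narrator", threshold=50, seed=None):
    """Map each character to a voice.

    manual     -- voices chosen by hand (main character, narrator)
    voice_pool -- voices available for random assignment (hypothetical names)
    """
    rng = random.Random(seed)
    narrator_voice = manual[narrator]
    assignment = dict(manual)
    for character, count in line_counts.items():
        if character in assignment:
            continue  # manually configured characters keep their voice
        if count > threshold:
            # Frequent speakers get a randomly chosen voice of their own.
            assignment[character] = rng.choice(voice_pool)
        else:
            # Minor characters are read in the narrator's voice.
            assignment[character] = narrator_voice
    return assignment


voices = assign_voices(
    {"A": 900, "narrator": 2000, "B": 120, "C": 7},
    manual={"A": "male_zh", "narrator": "female_zh"},
    voice_pool=["voice_1", "voice_2", "voice_3"],
    seed=0,
)
print(voices)
```

Note that a plain random choice can hand the same voice to two characters; drawing without replacement would avoid that, at the cost of needing a pool at least as large as the cast.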
- Generate Audio
- Run the audio generation script:
python app/createAudio.py
- Supports multi-threaded acceleration, e.g. 20 threads:
python app/createAudio.py --threads 20
- The output is an MP3 file, stored in the project directory.
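Multi-threaded generation of this kind is commonly built on a thread pool, since each chapter mostly waits on a remote speech model. The sketch below shows that pattern with a placeholder in place of the real TTS call; synthesize and generate_all are hypothetical names, and createAudio.py may be organized differently.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def synthesize(chapter_id):
    """Placeholder for the real TTS request; returns the output file name."""
    return f"chapter{chapter_id}.mp3"


def generate_all(chapter_ids, threads=20):
    """Run synthesize() for every chapter, collecting results and failures."""
    done, failed = {}, {}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {pool.submit(synthesize, cid): cid for cid in chapter_ids}
        for future in as_completed(futures):
            cid = futures[future]
            try:
                done[cid] = future.result()
            except Exception as exc:
                # Keep going; failed chapters can be re-run afterwards.
                failed[cid] = exc
    return done, failed


done, failed = generate_all(range(1, 6), threads=4)
print(sorted(done.values()), failed)
```

Recording failures per chapter instead of aborting makes it straightforward to re-run only the chapters that did not complete.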
- Manage Audio (optional)
- Sort audio with the GUI tool:
python gui/gui.py
- Or batch-delete Ximalaya entries:
python gui/gui2.py
Operation Notes
- Efficiency optimization: One computer running a single thread can process 300 chapters a night. Tests show that 5 machines with 20 threads each can generate 2000 chapters in 5 hours.
- Error checking: If a chapter is missing, check the network or re-run the script for that chapter.
- Model constraints: The SiliconFlow-hosted model is IP-restricted, so running several computers in parallel requires routing requests through separate servers.
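The multi-machine figure quoted above can be broken down with simple arithmetic. The sketch below derives the implied per-thread rate and uses it for a rough time estimate; estimate_hours is a hypothetical helper, and it assumes throughput scales linearly with threads and machines, which is only an approximation.

```python
machines, threads_per_machine = 5, 20
chapters, hours = 2000, 5

total_rate = chapters / hours                              # chapters per hour, all machines
per_machine_rate = total_rate / machines                   # chapters per hour per machine
per_thread_rate = per_machine_rate / threads_per_machine   # chapters per hour per thread


def estimate_hours(n_chapters, n_machines, n_threads, rate=per_thread_rate):
    """Rough estimate assuming linear scaling (an assumption, not a measurement)."""
    return n_chapters / (n_machines * n_threads * rate)


print(total_rate, per_machine_rate, per_thread_rate)  # 400.0 80.0 4.0
print(estimate_hours(1000, 2, 20))                    # 6.25
```

The implied 4 chapters per thread per hour is well below what the single-thread figure of 300 chapters a night suggests, so per-thread throughput evidently drops as threads are added and the estimate should be treated as a loose guide.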
Example Process
Suppose you're converting a novel:
- Crawl the novel So-and-So from https://m.ilwxs.com/ and save its chapters.
- Generate dialog messages, recognizing protagonist A and the narrator.
- Assign A a male Chinese voice and the narrator a female voice, and randomize the others.
- Run multithreaded generation to get chapter1.mp3 and so on.
Once completed, the audiobook can be uploaded to platforms such as Ximalaya. An example of the finished product can be found at https://www.ximalaya.com/album/88023000.
Application Scenarios
- Audiobook production
Turn web novels into multi-character audiobooks and upload them to platforms to share or monetize.
- Learning experiment
Tech enthusiasts can use it to learn crawling, AI, and audio processing techniques.
- Personal entertainment
Turn your favorite novels into audio and listen to them anytime, anywhere.
QA
- What large models are supported?
Gemini and CosyVoice2-0.5B are currently supported. You need to apply for an API key yourself.
- Why are some chapters not generated?
The cause may be a network outage or a crawl failure. Check the logs and re-run the corresponding chapter.
- How can I improve audio quality?
The default model's quality is limited. It can be replaced with another TTS engine, which requires modifying the code.