Tool to automatically crawl novels and generate multi-character audiobooks

General Introduction

Auto-Audio-Book is an open-source project hosted on GitHub. It automatically crawls novel content from websites and converts it into audiobooks with multiple character voices. Written by developer zqq-nuli in Python 3.10+, it combines a large language model (such as Gemini) for text processing with a speech model (such as CosyVoice2-0.5B) for speech synthesis. Beyond basic text-to-audio, the project can tell the characters in a novel apart and give each a different voice, producing a radio-drama-like effect. The code is public, and users are free to download and modify it. As of March 24, 2025, the project is still under development and the GUI is not yet complete, but the whole process can be run from the command line, which suits technology enthusiasts and audiobook producers.

 

Feature List

  • Novel crawling: Automatically downloads chapter content from designated websites.
  • Dialogue information generation: Uses AI to analyze the text and separate the characters from their dialogue.
  • Multi-character voiceover: Assigns different voices to the characters, with support for dedicated protagonist and narrator voices and random voices for the rest.
  • Audio generation: Converts text to MP3 audiobook files, with multi-threaded acceleration.
  • Management tools: Provides a GUI for managing novel data and audio files.
  • Open source and extensible: Users can modify the code to add features or improve the results.
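Novel crawling mostly comes down to fetching a book's chapter index page and extracting the chapter links from it. Below is a minimal sketch of the extraction step using Python's standard html.parser, assuming a page where the chapter links sit inside <ul class="chapters">. That layout and the names ChapterLinkParser and parse_chapter_list are hypothetical; each real site needs its own selectors, and the project's own crawl scripts may work differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ChapterLinkParser(HTMLParser):
    """Collect (title, absolute URL) pairs from <a> tags inside <ul class="chapters">.

    The <ul class="chapters"> layout is an assumption for illustration.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.in_list = False  # currently inside the chapter <ul>?
        self.href = None      # URL of the <a> being read, if any
        self.text = []        # text fragments of the current link
        self.chapters = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "ul" and "chapters" in classes:
            self.in_list = True
        elif tag == "a" and self.in_list and attrs.get("href"):
            self.href = urljoin(self.base_url, attrs["href"])  # resolve relative links
            self.text = []

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            self.chapters.append(("".join(self.text).strip(), self.href))
            self.href = None
        elif tag == "ul":
            self.in_list = False  # assumes the chapter list is not nested


def parse_chapter_list(html, base_url):
    """Return [(title, url), ...] in page order."""
    parser = ChapterLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.chapters
```

Downloading the page itself (for example with requests.get, since requests is among the listed dependencies) is left out so the parser can be tried on a plain string.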

 

Usage Guide

Installing and using Auto-Audio-Book requires some technical background. Below is a detailed installation and usage guide to help you produce an audiobook from scratch.

Installation process

  1. Environment preparation
    • Install Python 3.10 or later, downloaded from https://www.python.org/downloads/.
    • Install ffmpeg: on Windows, download it from https://ffmpeg.org/download.html; on Mac, run brew install ffmpeg; on Linux, run sudo apt install ffmpeg.
    • (Optional) Install MongoDB to manage novel data through the GUI; download it from https://www.mongodb.com/try/download/community.
    • Check the environment: at the command line, run python --version and ffmpeg -version and make sure both print a version.
  2. Download the code
    • Clone the project locally with Git:
      git clone https://github.com/zqq-nuli/auto-audio-book.git
      
    • Go to the project directory:
      cd auto-audio-book
      
  3. Creating a Virtual Environment
    • Create a virtual environment with uv (install uv first with pip install uv):
      uv venv --python 3.10
      
    • Activate the environment:
      • Windows: .\.venv\Scripts\activate
      • Mac/Linux: source .venv/bin/activate
  4. Install dependencies
    • Install the required libraries in the virtual environment:
      uv add -r requirements.txt
      
    • If requirements.txt is missing, install the core libraries manually:
      pip install requests gTTS PyPDF2 pymongo
      
  5. Configuring the API Key
    • Copy .env.example to .env:
      • Windows: copy .env.example .env
      • Mac/Linux: cp .env.example .env
      
    • Edit the .env file and fill in the API key for the large model, such as a Gemini key, which you can request from the corresponding platform.
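Before moving on, it can help to verify the setup in one go. The script below is a minimal sketch, not part of the project: it checks the Python version, looks for ffmpeg on the PATH, and reports whether .env defines the keys you expect. The key name GEMINI_API_KEY is a placeholder assumption; use the variable names from the project's .env.example.

```python
import shutil
import sys
from pathlib import Path

# Placeholder key name (assumption): replace with the names listed in .env.example.
REQUIRED_KEYS = ("GEMINI_API_KEY",)


def python_ok(version=None):
    """True if the given (or running) Python version is 3.10 or newer."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= (3, 10)


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")  # drop surrounding quotes
    return values


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    print("Python >= 3.10:", python_ok())
    print("ffmpeg on PATH:", shutil.which("ffmpeg") is not None)
    env_path = Path(".env")
    if env_path.exists():
        missing = missing_keys(parse_env(env_path.read_text(encoding="utf-8")))
        print("Missing API keys:", ", ".join(missing) or "none")
    else:
        print(".env not found; copy .env.example first")
```

The parser only handles plain KEY=VALUE lines; for anything fancier, a dedicated library such as python-dotenv is the usual choice.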

Usage steps

  1. Crawl a novel
    • Choose a novel website (e.g. https://m.ilwxs.com/). By default, the program supports sites without anti-crawling protection.
    • Run the crawl script:
      python app/getBookList.py
      
    • Then get the list of chapters and save the content:
      python app/getZjList.py
      python app/saveBooks.py
      
  2. Generate dialogue information
    • Use AI to process each chapter and separate the characters from their dialogue:
      python app/saveBookJson.py
      
    • The output is saved as a JSON file for subsequent dubbing.
  3. Configure character voices
    • Run the script that creates the character table:
      python app/createUser.py
      
    • Manually assign voices to the protagonist and the narrator (models such as CosyVoice2-0.5B are supported). Voices for the other characters can be assigned randomly:
      • Characters with more than 50 lines get their own voice.
      • Characters with fewer than 50 lines use the narrator's voice.
  4. Generate Audio
    • Run the audio generation script:
      python app/createAudio.py
      
    • Multi-threading is supported for faster generation, e.g. with 20 threads:
      python app/createAudio.py --threads 20
      
    • The output MP3 files are stored in the project directory.
  5. Manage Audio (optional)
    • Organize audio with the GUI tool:
      python gui/gui.py
      
    • Or batch-delete entries on Ximalaya (Himalaya):
      python gui/gui2.py
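The steps above do not say how individual dialogue lines become one chapter MP3. One common approach, given that ffmpeg is already a prerequisite, is to synthesize each line to its own clip and join the clips with ffmpeg's concat demuxer. The sketch below assumes exactly that, with clips sharing the same codec, sample rate, and channel layout so they can be joined without re-encoding; merge_chapter and its helpers are hypothetical names, not the project's createAudio.py code.

```python
import subprocess
from pathlib import Path


def concat_list(clips):
    """Text of an ffmpeg concat list: one file '...' line per clip, in order."""
    lines = []
    for clip in clips:
        # Inside single quotes, ffmpeg writes a literal ' as '\''
        escaped = str(clip).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def ffmpeg_concat_cmd(list_file, output):
    """argv that joins the listed clips into output without re-encoding.

    -safe 0 lets the list contain absolute paths; -y overwrites output.
    """
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", str(list_file), "-c", "copy", str(output)]


def merge_chapter(clips, output, list_file="concat.txt"):
    """Write the concat list, then run ffmpeg; raises CalledProcessError on failure.

    ffmpeg resolves relative clip paths relative to the list file's folder.
    """
    Path(list_file).write_text(concat_list(clips), encoding="utf-8")
    subprocess.run(ffmpeg_concat_cmd(list_file, output), check=True)
```

For example, merge_chapter(["clips/0001.mp3", "clips/0002.mp3"], "chapter1.mp3") would join two hypothetical clips. If the clips differ in format, drop "-c", "copy" so ffmpeg re-encodes instead.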
      

Operational Notes

  • Efficiency: A single computer running one thread can process 300 chapters overnight. In testing, 5 machines running 20 threads each generated 2000 chapters in 5 hours.
  • Error checking: If a chapter is missing, check the network connection or re-run the script for that chapter.
  • Model constraints: The model served through SiliconFlow is IP-restricted, so running on multiple computers in parallel requires relaying requests through a server.
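The notes on multi-threaded throughput and on re-running missed chapters combine naturally: list the chapters that have no MP3 yet, then generate only those in a thread pool and report what still fails. A minimal sketch, assuming chapters are numbered from 1 and saved as chapterN.mp3 as in the example workflow; synthesize_chapter is a hypothetical stand-in for the project's per-chapter generation code.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def missing_chapters(out_dir, total):
    """Chapter numbers in 1..total that have no chapterN.mp3 in out_dir yet."""
    out = Path(out_dir)
    return [n for n in range(1, total + 1) if not (out / f"chapter{n}.mp3").exists()]


def generate_chapters(chapters, out_dir, synthesize_chapter, threads=20, retries=2):
    """Generate the given chapters in a thread pool; return those that still failed.

    synthesize_chapter(n, target_path) is a hypothetical callback that writes one MP3.
    """

    def run(n):
        target = Path(out_dir) / f"chapter{n}.mp3"
        for _ in range(retries + 1):
            try:
                synthesize_chapter(n, target)
                return None  # success
            except Exception:  # e.g. network errors or API timeouts: try again
                pass
        return n  # every attempt failed

    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(run, chapters))  # map keeps chapter order
    return [n for n in results if n is not None]
```

A run might look like failed = generate_chapters(missing_chapters("output", 300), "output", my_synthesize), where "output" and my_synthesize are placeholders. Threads suit this workload when the model is called over a network API, because each worker spends most of its time waiting on I/O; the IP restriction noted above still limits how far a single machine can scale.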

Example Process

Suppose you're converting a novel:

  1. Crawl a novel (call it "So-and-So") from https://m.ilwxs.com/ and save its chapters.
  2. Generate dialogue information, identifying protagonist A and the narrator.
  3. Give A a male Chinese voice and the narrator a female voice; the other characters get random voices.
  4. Run multithreaded generation to produce chapter1.mp3 and so on.
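Step 3 of this example, combined with the rule from the usage steps (characters with more than 50 lines get their own voice, the rest fall back to the narrator's), can be sketched as a small casting function. This is a hedged illustration, not the project's createUser.py: the JSON schema (a list of entries with "role" and "text"), the voice names, and giving a character with exactly 50 lines the narrator's voice are all assumptions.

```python
import random
from collections import Counter

LINE_THRESHOLD = 50  # more lines than this -> the character gets its own voice


def assign_voices(dialogue, manual, voice_pool, seed=0):
    """Map every role that appears in dialogue to a voice name.

    dialogue:   list of {"role": ..., "text": ...} entries (assumed schema)
    manual:     hand-picked voices; must contain the protagonist and "narrator"
    voice_pool: voices available for random assignment (hypothetical names)
    """
    rng = random.Random(seed)  # fixed seed -> same casting on every rerun
    counts = Counter(entry["role"] for entry in dialogue)
    voices = dict(manual)
    for role, n_lines in counts.items():
        if role in voices:
            continue  # protagonist and narrator were assigned by hand
        if n_lines > LINE_THRESHOLD:
            voices[role] = rng.choice(voice_pool)
        else:
            voices[role] = manual["narrator"]  # minor roles reuse the narrator voice
    return voices


if __name__ == "__main__":
    dialogue = ([{"role": "A", "text": "..."}] * 120
                + [{"role": "B", "text": "..."}] * 60
                + [{"role": "C", "text": "..."}] * 5)
    casting = assign_voices(dialogue, {"A": "male_zh", "narrator": "female_zh"},
                            ["voice_1", "voice_2", "voice_3"])
    print(casting)
```

Running it over the whole book rather than chapter by chapter keeps each character on one voice throughout; the fixed seed makes reruns reproducible. Random choice can give two major characters the same voice, so a real casting step might draw without replacement.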

Once finished, the audiobook can be uploaded to platforms such as Ximalaya (Himalaya); an example of a finished album is at https://www.ximalaya.com/album/88023000.


 

Application Scenarios

  1. Audiobook production
    Turn web novels into multi-character audiobooks and upload them to platforms to share or monetize.
  2. Learning and experimentation
    Tech enthusiasts can use it to learn crawling, AI, and audio processing techniques.
  3. Personal entertainment
    Turn your favorite novels into audio and listen to them anytime, anywhere.

 

QA

  1. Which models are supported?
    Currently Gemini and CosyVoice2-0.5B are supported; you need to apply for the API keys yourself.
  2. Why were some chapters not generated?
    Usually because of a network outage or a crawl failure. Check the logs and re-run the affected chapters.
  3. How can I improve audio quality?
    The default model's quality is limited. You can replace it with another TTS engine, but this requires modifying the code.
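Swapping the TTS engine is easier if the rest of the pipeline only talks to a small interface. A minimal sketch using typing.Protocol; TTSEngine, SilentEngine, ENGINES, get_engine, and render_lines are hypothetical names, and a real engine class would wrap CosyVoice2-0.5B, another hosted API, or a local model.

```python
from typing import Callable, Dict, List, Protocol, Tuple


class TTSEngine(Protocol):
    """Anything with this method can serve as a TTS backend."""

    def synthesize(self, text: str, voice: str) -> bytes:
        ...


class SilentEngine:
    """Stand-in engine for testing the pipeline without calling any model."""

    def synthesize(self, text: str, voice: str) -> bytes:
        return f"{voice}:{text}".encode("utf-8")  # fake "audio" payload


# Registry of engine factories; a real setup would register its own wrappers here.
ENGINES: Dict[str, Callable[[], TTSEngine]] = {"silent": SilentEngine}


def get_engine(name: str) -> TTSEngine:
    """Instantiate the engine selected in configuration."""
    try:
        factory = ENGINES[name]
    except KeyError:
        raise ValueError(f"unknown TTS engine: {name!r}") from None
    return factory()


def render_lines(engine: TTSEngine, lines: List[Tuple[str, str]],
                 voices: Dict[str, str]) -> List[bytes]:
    """Synthesize (role, text) pairs, each with its role's assigned voice."""
    return [engine.synthesize(text, voices[role]) for role, text in lines]
```

Adding an engine then means writing one class with a synthesize method and registering it in ENGINES, leaving the crawling and casting code untouched.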
May not be reproduced without permission: Chief AI Sharing Circle, "Tool to automatically crawl novels and generate multi-character audiobooks"