General Introduction
PDF2Audio is an open-source project that converts PDF files into audio content such as podcasts, lectures, and summaries. It uses OpenAI's GPT models to generate the text and OpenAI's text-to-speech models to turn that text into audio. Users can upload multiple PDF files, choose an instruction template (e.g., podcast, lecture, or summary), and select the text generation and audio models. PDF2Audio offers several voices and lets users refine the audio content iteratively by editing drafts and providing feedback.
Features
- Upload multiple PDF files
- Select different instruction templates (podcasts, lectures, summaries, etc.)
- Choice of text generation and audio models
- Choice of multiple voices
- Iteratively improve audio content by editing drafts and providing feedback
- Support for local installation and use
PDF2Audio Interface
PDF2Audio's interface is simple. To use it:
1. Upload one or more PDF files
2. Select the desired instruction template
3. Customize instruction templates if needed
4. Click the "Generate Audio" button to create the audio content.
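The steps above combine the uploaded PDFs with a chosen (and optionally customized) template before any text is generated. The sketch below illustrates that combination under stated assumptions: the PDF text is already extracted, and `TEMPLATES` and `build_prompt` are hypothetical names with placeholder instruction text, not PDF2Audio's actual templates or code.

```python
# Hypothetical instruction templates; PDF2Audio's real template text differs.
TEMPLATES = {
    "podcast": "Turn the material into a lively two-host podcast dialogue.",
    "lecture": "Turn the material into a clear, structured lecture given by one speaker.",
    "summary": "Write a concise spoken summary of the key points.",
}


def build_prompt(pdf_texts: list[str], template: str, custom: str = "") -> str:
    """Combine extracted PDF text with the selected, optionally customized, template."""
    if template not in TEMPLATES:
        raise ValueError(f"unknown template: {template!r}")
    # Custom instructions (step 3) replace the stock template text when provided.
    instructions = custom.strip() or TEMPLATES[template]
    # Multiple PDFs are concatenated, each labeled so the model can tell them apart.
    documents = "\n\n".join(
        f"--- Document {i} ---\n{text.strip()}"
        for i, text in enumerate(pdf_texts, start=1)
    )
    return f"{instructions}\n\n{documents}"


prompt = build_prompt(["First paper text.", "Second paper text."], "lecture")
print(prompt.splitlines()[0])  # Turn the material into a clear, structured lecture given by one speaker.
```

Letting custom text replace the template, rather than append to it, is one design choice; appending would keep the template's framing while adding the user's requests.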
How to Use
Try It Online
Hugging Face Space: https://huggingface.co/spaces/lamm-mit/PDF2Audio
Google Colab notebook: https://colab.research.google.com/github/lamm-mit/PDF2Audio/blob/main/PDF2Audio.ipynb
Local Installation Process
- Cloning the repository: Run the following commands in a terminal to clone the PDF2Audio repository and change into its directory:
git clone https://github.com/lamm-mit/PDF2Audio.git
cd PDF2Audio
- Installing Miniconda: If Miniconda is not already installed, download the installer from the Miniconda website and follow the installation instructions for your operating system. Verify that the installation was successful:
conda --version
- Creating a Conda Environment: Create and activate a new Conda environment by running the following commands in a terminal:
conda create -n pdf2audio python=3.9
conda activate pdf2audio
- Installing dependencies: Run the following command in a terminal to install the required dependencies:
pip install -r requirements.txt
- Setting the OpenAI API Key: Create a .env file in the project directory and add your OpenAI API key:
OPENAI_API_KEY=your_api_key_here
Usage Process
- Running the application: Make sure you are in the project directory, activate the Conda environment, and start the app:
conda activate pdf2audio
python app.py
- Opening the browser: The terminal prints a URL, usually
http://localhost:7860
Open this URL in your browser.
- Uploading PDF files: Upload one or more PDF files using the Gradio interface.
- Selecting an instruction template: Select the instruction template you want (e.g., podcast, lecture, or summary).
- Customizing instructions: Edit the template's instructions as needed.
- Generate Audio: Click the "Generate Audio" button to create your audio content.
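Generating audio means sending the generated script to a text-to-speech model with the selected voice. OpenAI's speech endpoint accepts at most 4096 characters of input per request, so a long script has to be split first. The sketch below shows one way to do that and to build request payloads using the endpoint's `model`, `voice`, and `input` fields. Assumptions: `split_for_tts` and `speech_requests` are hypothetical, the voice set lists the six voices OpenAI's `tts-1` models originally offered (the app may expose a different list), and no API call is made.

```python
MAX_CHARS = 4096  # OpenAI's per-request input limit for text-to-speech
# Assumed voice set; check which voices the app actually offers.
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}


def split_for_tts(script: str, limit: int = MAX_CHARS) -> list[str]:
    """Split a script into chunks of at most `limit` characters, breaking between paragraphs."""
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in script.split("\n\n")):
        if not para:
            continue
        # Hard-split any single paragraph that is longer than the limit.
        while len(para) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:limit])
            para = para[limit:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= limit:
            current = candidate  # paragraph fits in the current chunk
        else:
            chunks.append(current)  # close the full chunk, start a new one
            current = para
    if current:
        chunks.append(current)
    return chunks


def speech_requests(script: str, voice: str, model: str = "tts-1") -> list[dict]:
    """Build one speech-request payload per chunk (hypothetical helper, no network I/O)."""
    if voice not in VOICES:
        raise ValueError(f"unsupported voice: {voice!r}")
    return [{"model": model, "voice": voice, "input": chunk} for chunk in split_for_tts(script)]


print(len(speech_requests("A" * 5000 + "\n\nshort", "nova")))  # 2
```

Breaking at paragraph boundaries keeps sentences intact within each request; the audio returned for each chunk would then be concatenated in order.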
Notes
- The app requires an OpenAI API key to run.
- You can iteratively improve audio content by editing drafts and providing specific or general feedback.
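Iterative improvement pairs the current draft with the user's edits and feedback and asks the model for a revised draft. The sketch below models that loop under assumptions: `DraftSession` and `revise` are hypothetical names, and accumulating all earlier feedback in each revision prompt is an illustrative design choice that the app may or may not follow.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DraftSession:
    """Hypothetical record of the current draft and the feedback given so far."""
    transcript: str
    feedback_log: list[str] = field(default_factory=list)

    def revise(self, feedback: str, edited: Optional[str] = None) -> str:
        """Record feedback (and any manual edits) and build the prompt for the next draft."""
        if edited is not None:
            # A manually edited draft becomes the new baseline.
            self.transcript = edited
        self.feedback_log.append(feedback)
        # Earlier feedback is repeated so later revisions do not undo it.
        notes = "\n".join(f"- {item}" for item in self.feedback_log)
        return (
            "Revise the transcript below according to this feedback:\n"
            f"{notes}\n\nTranscript:\n{self.transcript}"
        )


session = DraftSession("Host A: Welcome to the show.")
print(session.revise("Make the introduction shorter."))
```

Specific feedback ("shorten the introduction") and general feedback ("make it more conversational") go through the same path; only the text of the note differs.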