General Introduction
Easy-Voice-Toolkit is a multifunctional toolkit built on open-source speech projects. It provides automated audio tools for speech recognition, speech transcription, voice conversion, dataset creation, and model training. The tools can be used individually as needed, or run in sequence to turn raw audio files into a trained speech model step by step. The toolkit supports local deployment: users can download either a lightweight installer or a ready-to-use portable package.
Function List
- Audio processing
- Speech recognition
- Voice transcription
- Dataset creation (SRT conversion & WAV splitting)
- Model training
- Speech synthesis
Usage Guide
Installation Options:
- Lightweight installer: a small package that contains installation instructions but not the required environment dependencies or models.
- Ready-to-use portable package: a large package that bundles all environment dependencies and several preset models; download and unzip it to use.
Local Deployment - User Installation:
- Download the lightweight installer or ready-to-use portable package.
- Unzip the downloaded file.
- Run the .exe file or its shortcut.
Local Deployment - Developer Environment Setup:
- Make sure Python 3.8 or later is installed.
- Clone the project repository:
git clone https://github.com/Spr-Aachen/Easy-Voice-Toolkit.git
- Switch to the project directory:
cd Easy-Voice-Toolkit
- Install dependencies:
pip install -r requirements.txt
- Install the GUI dependencies:
pip install pyside6 QEasyWidgets pywin32==300 psutil pynvml darkdetect PyGithub
- Run the program:
python Run.py
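The prerequisites in the steps above can be checked before launching the program. The following is a minimal sketch, not part of the toolkit: the helper names `python_is_supported` and `missing_modules` are hypothetical, and the import names listed (`PySide6`, `psutil`, `pynvml`, `darkdetect`, `github`) are assumed to correspond to the pip packages named in the GUI dependency step.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the setup steps


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3.8 or later is required")
    # Assumed import names; pip package names and import names can differ
    # (for example, PyGithub is imported as "github").
    missing = missing_modules(
        ["PySide6", "psutil", "pynvml", "darkdetect", "github"]
    )
    if missing:
        print("Missing modules:", ", ".join(missing))
```

Running this script from the project directory before `python Run.py` reports an unsupported interpreter or any GUI dependency that did not install, without importing the packages themselves.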
Functional operation flow:
- Audio processing: Import the audio file, select the desired processing tools (e.g. noise reduction, editing), apply the processing, and save the result.
- Speech recognition: Import the audio file, select the speech recognition model, run the recognition, and export the text results.
- Voice transcription: Import the audio file, select the transcription tool, run the transcription, and export the subtitle file (e.g. SRT).
- Dataset creation: Import the audio files, select the dataset creation tool, and perform SRT conversion or WAV splitting to generate a training dataset.
- Model training: Import the training dataset, select the model training tool, configure the training parameters, run the training, and save the model.
- Voice conversion: Import the audio files, select the voice conversion tool, configure the conversion parameters, run the conversion, and save the result.
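To illustrate what the dataset creation step involves, here is a minimal sketch of SRT-driven WAV splitting using only the Python standard library. It is not the toolkit's own implementation: the function names `parse_srt` and `split_wav` and the output naming scheme are hypothetical, and the sketch assumes uncompressed PCM WAV input.

```python
import re
import wave

# Matches an SRT timing line such as "00:00:01,500 --> 00:00:04,000"
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, caption) tuples."""
    cues = []
    # Cues are separated by blank lines
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
                start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
                end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
                # Everything after the timing line is the caption text
                caption = " ".join(part.strip() for part in lines[i + 1:])
                cues.append((start, end, caption))
                break
    return cues


def split_wav(wav_path, cues, out_prefix):
    """Write one WAV file per cue and return the list of output paths."""
    paths = []
    with wave.open(wav_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        for index, (start, end, _caption) in enumerate(cues):
            first = min(round(start * rate), total)
            count = max(0, round(end * rate) - first)
            src.setpos(first)
            frames = src.readframes(count)
            out_path = f"{out_prefix}_{index:04d}.wav"
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # frame count is corrected on close
                dst.writeframes(frames)
            paths.append(out_path)
    return paths
```

For example, `split_wav("speech.wav", parse_srt(open("speech.srt", encoding="utf-8").read()), "clips/speech")` would write one clip per subtitle cue (the file names here are placeholders, and the output directory must already exist). The captions returned by `parse_srt` can then be paired with the clip paths to form the text labels of a training dataset.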
Notes
- Currently, the graphical interface only supports Windows.
- Please ensure a stable internet connection during download and use.
- If you encounter problems, please refer to the instructions and FAQs in the project repository.