General Introduction
RapBank is a dataset and toolset designed for rap lyrics generation. Created by NZqian, the project gives researchers and developers a high-quality rap lyrics dataset built by collecting and processing rap songs from YouTube. RapBank contains over 90,000 rap songs in 84 languages, along with detailed processing pipelines and usage instructions that help users process the data and train models efficiently. The project's data and code are released on GitHub under the CC BY-NC-SA 4.0 license.
Function List
- Dataset download: A multilingual dataset of over 90,000 rap songs is available.
- Data processing pipeline: Includes steps such as source separation, segmentation, and lyrics recognition to help users process data efficiently.
- Detailed documentation: Provides complete instructions and sample code to help users get started quickly.
- Open source code: All code and data are published on GitHub, which makes secondary development convenient.
- License: The data and code are released under the CC BY-NC-SA 4.0 license, which permits non-commercial use with attribution and requires adaptations to be shared under the same terms.
How to Use
Installation process
- Clone the project repository:
git clone https://github.com/NZqian/RapBank.git
cd RapBank
- Install the dependencies:
pip install -r requirements.txt
- Download the dataset and place it in the data folder, for example:
/path/to/your/data/wav
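Before running the pipeline, it can help to confirm that the audio files are where the scripts expect them. The sketch below assumes the layout implied by the example path (WAV files directly under `<data>/wav`); the helper name `find_wav_files` is hypothetical and not part of RapBank.

```python
from pathlib import Path


def find_wav_files(data_root: str) -> list[Path]:
    """Hypothetical helper (not part of RapBank): list .wav files in <data_root>/wav.

    Assumes the layout /path/to/your/data/wav/*.wav from the example path;
    adjust the folder name if your copy of the dataset differs.
    """
    wav_dir = Path(data_root) / "wav"
    if not wav_dir.is_dir():
        raise FileNotFoundError(f"expected a 'wav' folder at {wav_dir}")
    # Match the extension case-insensitively so files such as SONG.WAV are included.
    return sorted(
        p for p in wav_dir.iterdir() if p.is_file() and p.suffix.lower() == ".wav"
    )
```

For example, `len(find_wav_files("/path/to/your/data"))` reports how many songs were found before any processing starts.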
Data processing
- Use the provided scripts to process the data:
bash pipeline.sh /path/to/your/data /path/to/save/features start_stage stop_stage
- The start_stage and stop_stage parameters set the first and last processing stages to run; valid values range from 0 to 5.
- Multiple GPUs are recommended for faster processing.
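Because a full run can take a long time, it may be worth validating the stage arguments before launching it. The sketch below builds the `pipeline.sh` command shown above and checks that both stages lie in the 0 to 5 range. The function names `build_pipeline_command` and `run_pipeline`, the rule that start_stage must not exceed stop_stage, and the use of the `CUDA_VISIBLE_DEVICES` environment variable to select GPUs are all assumptions for illustration; the source does not say how the scripts assign GPUs.

```python
import os
import subprocess

MIN_STAGE, MAX_STAGE = 0, 5  # stage range stated in the instructions above


def build_pipeline_command(data_dir: str, feature_dir: str,
                           start_stage: int, stop_stage: int) -> list[str]:
    """Hypothetical wrapper (not part of RapBank): argv for pipeline.sh."""
    for name, stage in (("start_stage", start_stage), ("stop_stage", stop_stage)):
        if not MIN_STAGE <= stage <= MAX_STAGE:
            raise ValueError(
                f"{name} must be between {MIN_STAGE} and {MAX_STAGE}, got {stage}"
            )
    # Assumption: stages run in ascending order, so start must not exceed stop.
    if start_stage > stop_stage:
        raise ValueError("start_stage must not be greater than stop_stage")
    return ["bash", "pipeline.sh", data_dir, feature_dir,
            str(start_stage), str(stop_stage)]


def run_pipeline(cmd: list[str], gpus: str = "0") -> None:
    """Run the command from the RapBank checkout.

    Assumption: GPUs are exposed through CUDA_VISIBLE_DEVICES (e.g. "0,1,2,3");
    check pipeline.sh to see how it actually picks devices.
    """
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpus)
    subprocess.run(cmd, env=env, check=True)  # raises if the script fails
```

A full run over four GPUs would then look like `run_pipeline(build_pipeline_command("/path/to/your/data", "/path/to/save/features", 0, 5), gpus="0,1,2,3")`. Keeping command construction separate from execution makes the argument checks easy to test without starting the pipeline.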
Workflow
- Dataset download: Visit the GitHub page to download the required dataset files.
- Data processing: Install the dependencies and run the processing scripts as described above to generate the required feature files.
- Model training: Train a model on the processed data; refer to the sample code in the project documentation for the specific steps.
- Result analysis: Use the trained model to generate rap lyrics, then analyze and refine the results.
Feature Details
- Dataset download: More than 90,000 rap songs are available to download for research and development.
- Data processing pipeline: Includes multiple steps, such as source separation, segmentation, and lyrics recognition, to help users process and analyze data efficiently.
- Detailed documentation: The project provides complete instructions and sample code to help users get started quickly and build on the code.
- Open source: All code and data are published on GitHub and can be downloaded and used freely within the terms of the license.
- License: Because the data and code are released under CC BY-NC-SA 4.0, any use must be non-commercial, credit the authors, and share adaptations under the same license.