SongGeneration - Music generation model launched by Tencent AI Lab

Published 9 months ago by AI Sharing Circle

What is SongGeneration?

SongGeneration is a high-quality song generation project launched by Tencent AI Lab. Built on the LeLM (Large Language Model for Music) framework, it generates vocals and accompaniment in parallel so that the two stay in harmony. Users guide generation by supplying lyrics, descriptive text (e.g., style or emotion), or reference audio. SongGeneration supports a wide range of musical styles and emotional expressions, and its output is both high in quality and diverse. The architecture combines mixed tokens with dual-track tokens, and a music codec reconstructs the generated tokens into audio. It is suited to music creation, film and TV soundtracks, game music and similar uses, giving creators an efficient creative tool.

SongGeneration's main features

Co-generation of vocals and backing tracks: SongGeneration generates vocals and backing tracks simultaneously, keeping the two closely matched in rhythm, melody and emotion. With mixed tokens and dual-track tokens, vocals and accompaniment blend naturally, avoiding the disconnect between vocals and accompaniment that occurs in traditional generation methods.
Multi-style and multi-emotion support: Users can specify the style (e.g., pop, rock, jazz) and emotion (e.g., upbeat, sad, tender) of the song through descriptive text. SongGeneration generates songs that match these descriptions, serving different scenarios and user needs.
Multi-track generation: SongGeneration automatically generates separate vocal and backing tracks while ensuring a high degree of melodic, structural, rhythmic and orchestral matching.
Flexible input methods: Users can enter lyrics (which must be labeled with structure tags such as [Verse], [Chorus], etc.), descriptive text, or reference audio to guide generation. This range of input methods makes the tool convenient even for non-professional users.
High-quality music output: According to the project, SongGeneration's audio quality outperforms open-source music generation models and rivals leading industry systems. The generated songs can be used directly in music composition, film and TV soundtracks, game music and other scenarios.
Efficient generation: SongGeneration is built on the efficient LeLM framework and can generate complete songs quickly, which speeds up creation and lowers the barrier to making music.
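
The structure labels mentioned under "Flexible input methods" can be illustrated with a small pre-check that splits lyrics into tagged sections before they are submitted. This is only a sketch: the function name `split_sections` and the tag pattern are assumptions made for illustration, based on the [Verse] and [Chorus] examples above, and are not taken from the SongGeneration code. The exact tag vocabulary the model accepts should be checked in the project documentation.

```python
import re

# Assumption: a structure tag is a line consisting only of a bracketed label,
# such as [Verse] or [Chorus]. The real tag vocabulary may differ.
TAG_RE = re.compile(r"^\[([A-Za-z][A-Za-z0-9 _-]*)\]\s*$")


def split_sections(lyrics: str) -> list[tuple[str, list[str]]]:
    """Split lyrics into (tag, lines) pairs, in order of appearance.

    Raises ValueError if a lyric line appears before any structure tag.
    """
    sections: list[tuple[str, list[str]]] = []
    for raw in lyrics.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate sections visually
        match = TAG_RE.match(line)
        if match:
            sections.append((match.group(1), []))
        elif sections:
            sections[-1][1].append(line)
        else:
            raise ValueError(f"lyric line before any structure tag: {line!r}")
    return sections


if __name__ == "__main__":
    example = "[Verse]\nCity lights are fading\n[Chorus]\nSing it back to me\n"
    print(split_sections(example))
```

Rejecting untagged lines up front is a design choice of this sketch; it simply surfaces missing labels before a generation run is started.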

SongGeneration project links

GitHub repository: https://github.com/tencent-ailab/SongGeneration
Hugging Face model repository: https://huggingface.co/tencent/SongGeneration
arXiv technical paper: https://arxiv.org/pdf/2506.07520
Online demo: https://huggingface.co/spaces/tencent/SongGeneration

How to use

Online demo: The SongGeneration model is available on Hugging Face, where users can try it through the online demo.
Available functions
- Text control: Users can create complete, high-quality songs from keyword text (e.g., "happy pop", "fierce rock").
- Style following: Users can upload their own reference audio of 10 seconds or more, and SongGeneration will automatically generate a new full-length song in the same style. Supported genres include pop, rock, Chinese-style music and more.
- Multi-track generation: SongGeneration automatically generates separate vocal and backing tracks while ensuring a high degree of melodic, structural, rhythmic and orchestral matching.
- Timbre following: SongGeneration supports timbre following based on reference audio, generating songs with "voice clone" level vocal performance that sounds natural and emotional.
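
The 10-second minimum for reference audio can be checked locally before uploading. The following is a minimal sketch using Python's standard `wave` module, so it only handles uncompressed PCM WAV files; the helper names and the constant `MIN_REFERENCE_SECONDS` are hypothetical and not part of SongGeneration.

```python
import wave

# From the text: reference audio should be 10 seconds or more.
MIN_REFERENCE_SECONDS = 10.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path: str, minimum: float = MIN_REFERENCE_SECONDS) -> bool:
    """True if the WAV file at `path` lasts at least `minimum` seconds."""
    return wav_duration_seconds(path) >= minimum
```

Other formats such as MP3 or FLAC would need a third-party audio library, which this sketch deliberately avoids.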
Local use: To run SongGeneration locally, download the code from the GitHub repository and the model weights from the Hugging Face model repository, install and configure them according to the project documentation, and then run the model to generate music.

SongGeneration's technological advantages

Low-bitrate music encoding and decoding: SongGeneration's music codec reconstructs high-quality audio at a very low frame rate (25 Hz) and an ultra-low bitrate (0.35 kbps), efficiently compressing and restoring 48 kHz two-channel music.
Multi-preference alignment: Using Direct Preference Optimization (DPO) across several preference dimensions, SongGeneration is aligned on musicality, lyrics alignment, prompt consistency and other criteria. The generated songs are not only strong in sound quality, but also closer to users' needs in melody, structure and emotional expression.
Parallel prediction of multiple token types: SongGeneration adopts a "mixed first, dual-track second" strategy that avoids interference between the different token types.
Three-stage training paradigm: SongGeneration is trained in three stages: pre-training, modular extension training, and multi-preference alignment training.
High performance and competitiveness: In comparisons with commercial and open-source models, SongGeneration performed well on several key dimensions, including content enjoyment, content usefulness, and production quality. The generated songs are competitive in sound quality, melody, structure and emotional expression.
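
To put the 25 Hz and 0.35 kbps codec figures in perspective, the short calculation below works out how many bits each codec frame carries and how that compares with uncompressed audio. The 48 kHz and two-channel values come from the text; the 16-bit PCM depth is an assumption introduced here purely as a baseline, and the function names are illustrative.

```python
FRAME_RATE_HZ = 25        # from the text
BITRATE_BPS = 350         # 0.35 kbps, from the text
SAMPLE_RATE_HZ = 48_000   # from the text
CHANNELS = 2              # from the text
PCM_BITS = 16             # assumption: 16-bit PCM as the uncompressed baseline


def bits_per_frame(bitrate_bps: float, frame_rate_hz: float) -> float:
    """Bits available to describe one codec frame."""
    return bitrate_bps / frame_rate_hz


def pcm_bitrate_bps(sample_rate_hz: int, channels: int, bits_per_sample: int) -> int:
    """Bitrate of uncompressed PCM audio."""
    return sample_rate_hz * channels * bits_per_sample


def compression_ratio(pcm_bps: float, codec_bps: float) -> float:
    """How many times smaller the codec stream is than raw PCM."""
    return pcm_bps / codec_bps


if __name__ == "__main__":
    per_frame = bits_per_frame(BITRATE_BPS, FRAME_RATE_HZ)          # 14.0 bits
    raw = pcm_bitrate_bps(SAMPLE_RATE_HZ, CHANNELS, PCM_BITS)       # 1,536,000 bps
    print(per_frame, raw, round(compression_ratio(raw, BITRATE_BPS)))
```

Under these assumptions each 40 ms frame is described by 14 bits, which is the information content of one index into a table of 2^14 = 16,384 entries, and the stream is roughly 4,400 times smaller than 16-bit stereo PCM. How the codec actually allocates those bits is described in the technical paper, not here.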

Who SongGeneration is for

Music creators: For professional musicians and amateurs alike, SongGeneration provides strong creative assistance. It helps creators generate high-quality songs quickly, sparks ideas, and saves time on melody writing, arrangement and fitting lyrics to music. Creators can input lyrics or descriptions of their own and generate complete songs that match the desired style and emotion.
Film and TV producers: SongGeneration can quickly generate music that matches the emotional atmosphere and style of a film or TV production. For example, it can generate background music for movies, TV dramas, advertisements or short videos to enhance the overall effect of the work.
Game developers: Game music needs to fit closely with the game's scenes and atmosphere. SongGeneration can generate music that matches the style of the game (e.g., fantasy, sci-fi, adventure) and its emotional needs (e.g., tense, joyful, mysterious) to deepen the player's sense of immersion.
Content creators: SongGeneration can quickly generate music that fits the style and mood of the content, which helps avoid copyright issues with existing music while making the content more appealing.
Music educators and students: SongGeneration can serve as a music education tool that helps students understand different musical styles, emotional expressions and compositional techniques. Educators can use it to generate sample music that demonstrates the effects of different styles and emotions and stimulates students' interest in learning.