SongBloom - Tencent's open-source song generation model, developed with CUHK and Nanjing University


What is SongBloom?

SongBloom is an open-source song generation model developed by Tencent AI Lab together with The Chinese University of Hong Kong, Shenzhen, and Nanjing University. It tackles the "plastic", artificial-sounding quality common in AI-generated music and produces high-quality songs with a complete structure. Given only a 10-second reference audio clip and the corresponding lyrics, it generates a complete 2-minute-30-second song in stereo (dual-channel) 48 kHz high-fidelity audio, with intro, verse, chorus, outro, and other sections.

Its design sharply reduces "hallucinations" in which lyrics and melody do not match, significantly lowers the phoneme error rate, and sets a new industry benchmark for lyric accuracy. Its vocal quality is reported to exceed that of the leading commercial model Suno-V4.5, and its musicality is comparable to that of professional composers. It is the first to apply an autoregressive diffusion model to long-form song generation, combining discrete sketch tokens with VAE latents to balance structural consistency against fine sound-quality detail.
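The autoregressive diffusion approach described above, which pairs discrete sketch tokens with continuous VAE latents, can be illustrated with a toy Python sketch. This is not SongBloom's code: `sample_sketch_tokens`, `refine_latents`, `generate`, the constants, and the update rules are all hypothetical stand-ins. It only shows the assumed control flow: for each short patch, an autoregressive step extends the discrete sketch using everything generated so far, then an iterative denoising loop (standing in for the diffusion model) turns that patch into continuous latents, and both streams become context for the next patch.

```python
import random

VOCAB_SIZE = 16      # hypothetical size of the discrete sketch-token vocabulary
PATCH_LEN = 4        # hypothetical number of frames produced per interleaved step
DENOISE_STEPS = 10   # hypothetical number of refinement steps per patch


def sample_sketch_tokens(prev_tokens, prev_latents, rng):
    """Toy stand-in for the autoregressive sketch model.

    Samples PATCH_LEN discrete tokens one at a time, each conditioned on all
    tokens so far and (crudely) on the last refined latent frame.
    """
    tokens = list(prev_tokens)
    # Toy use of acoustic context: drift upward if the last latent frame is low.
    drift = 1 if prev_latents and prev_latents[-1] < 0.5 else 0
    patch = []
    for _ in range(PATCH_LEN):
        prev = tokens[-1] if tokens else 0
        nxt = (prev + drift + rng.choice([-1, 0, 1])) % VOCAB_SIZE
        tokens.append(nxt)
        patch.append(nxt)
    return patch


def refine_latents(sketch_patch, prev_latents, rng):
    """Toy stand-in for the diffusion refiner.

    Starts from Gaussian noise and moves each frame step by step toward a
    target derived from its sketch token and the previous latent frame.
    """
    last = prev_latents[-1] if prev_latents else 0.0
    targets = [0.5 * last + tok / VOCAB_SIZE for tok in sketch_patch]
    x = [rng.gauss(0.0, 1.0) for _ in sketch_patch]
    for step in range(DENOISE_STEPS):
        alpha = 1.0 / (DENOISE_STEPS - step)  # the last step lands on the target
        x = [xi + alpha * (ti - xi) for xi, ti in zip(x, targets)]
    return x


def generate(num_patches, seed=0):
    """Interleave sketching and refinement patch by patch."""
    rng = random.Random(seed)
    sketch, latents = [], []
    for _ in range(num_patches):
        # 1) Autoregressive step: extend the coarse sketch.
        patch_tokens = sample_sketch_tokens(sketch, latents, rng)
        # 2) Refinement step: turn that patch into continuous latents.
        patch_latents = refine_latents(patch_tokens, latents, rng)
        # 3) Interleave: both streams become context for the next patch.
        sketch.extend(patch_tokens)
        latents.extend(patch_latents)
    return sketch, latents


sketch, latents = generate(num_patches=3)
print(len(sketch), len(latents))
```

In this toy loop, generating patch by patch (rather than all sketch tokens first and all latents afterwards) lets each new sketch decision see already-refined acoustic context.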


Features of SongBloom

  • Efficient generation: Quickly generates a full song up to 2 minutes 30 seconds long from just a 10-second audio sample and the corresponding lyrics.
  • High-quality audio output: Supports stereo (dual-channel) 48 kHz audio generation for clear, professional sound quality.
  • Innovative generation paradigm: Uses an interleaved generation technique that combines autoregressive sketching with diffusion-model refinement to optimize both song structure and sound quality.
  • Multimodal input support: Accepts both lyrics and an audio sample as input, fusing the two modalities to generate songs that better match the user's needs.
  • Open source and easy to use: The project is open source and provides detailed guides and several model versions, making it easy to deploy and run on different devices.
  • Near-SOTA performance: Approaches the state of the art in audio quality and lyric accuracy, outperforming existing open-source models.
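The audio specifications above imply a fair amount of raw data per song. As a quick worked example, this Python snippet computes the uncompressed size of one 2:30 stereo 48 kHz song. The 16-bit PCM sample format is an assumption, since the article does not state SongBloom's output bit depth, and `pcm_size` is a hypothetical helper.

```python
SAMPLE_RATE_HZ = 48_000    # 48 kHz, as stated in the article
CHANNELS = 2               # dual-channel (stereo), as stated in the article
DURATION_S = 2 * 60 + 30   # 2 min 30 s = 150 s, as stated in the article
BYTES_PER_SAMPLE = 2       # ASSUMPTION: 16-bit PCM; the article gives no bit depth


def pcm_size(sample_rate_hz, channels, duration_s, bytes_per_sample):
    """Return (frames per channel, total samples, size in bytes) of raw PCM audio."""
    frames = sample_rate_hz * duration_s
    total_samples = frames * channels
    return frames, total_samples, total_samples * bytes_per_sample


frames, total, size_bytes = pcm_size(SAMPLE_RATE_HZ, CHANNELS, DURATION_S, BYTES_PER_SAMPLE)
print(frames, total, size_bytes, round(size_bytes / 2**20, 1))
# → 7200000 14400000 28800000 27.5
```

By the same arithmetic, the 10-second reference clip is 480,000 frames per channel, one fifteenth of the output length.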

SongBloom's core strengths

  • Efficient generation of complete songs: Enter just 10 seconds of reference audio and the corresponding lyrics to generate a 2-minute-30-second stereo (dual-channel) 48 kHz high-fidelity song with a complete structure of intro, verse, chorus, and outro.
  • Precise lyric matching: Innovative techniques greatly reduce "hallucinations" in which lyrics and melody do not match and significantly lower the phoneme error rate, setting a new industry benchmark for lyric accuracy.
  • Excellent sound quality and musicality: Vocal quality exceeds that of the top commercial model Suno-V4.5, and musicality rivals professional compositions, approaching the best in the field.
  • High-quality output: Supports two-channel 48 kHz audio generation with clear, professional sound quality close to the state of the art (SOTA).
  • Innovative technology: An interleaved generation paradigm that combines autoregressive sketching with diffusion-model refinement optimizes the overall structure and sound quality of the song.
  • Multimodal fusion: Accepts both lyrics and an audio sample as input, fusing the two modalities to generate songs that better match the user's needs.
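The phoneme error rate mentioned above is conventionally computed as the edit distance (substitutions + deletions + insertions) between a reference and a hypothesis phoneme sequence, divided by the number of reference phonemes. The article does not describe SongBloom's exact evaluation pipeline, so this minimal Python sketch assumes the phoneme sequences are already available (for instance, from transcribing the generated vocals); `edit_distance`, `phoneme_error_rate`, and the example phonemes are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion: reference phoneme missing
                cur[j - 1] + 1,          # insertion: extra phoneme in hypothesis
                prev[j - 1] + (r != h),  # substitution, or a match at zero cost
            )
        prev = cur
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """PER = edit distance / number of reference phonemes."""
    if not ref:
        raise ValueError("reference phoneme sequence must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# "hello world" in ARPAbet (stress marks omitted) vs. a sung rendition with
# one substituted vowel (AH -> EH) and one dropped consonant (L).
ref = "HH AH L OW W ER L D".split()
hyp = "HH EH L OW W ER D".split()
print(phoneme_error_rate(ref, hyp))  # → 0.25
```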

What is SongBloom's official website?

  • GitHub repository: https://github.com/tencent-ailab/SongBloom
  • Hugging Face model: https://huggingface.co/CypressYang/SongBloom
  • arXiv technical paper: https://arxiv.org/pdf/2506.07634
  • Online demo: https://cypress-yang.github.io/SongBloom_demo/

Who SongBloom is for

  • Music creators: Offers creative inspiration and a framework for quickly generating songs, helping both professional musicians and amateurs explore new musical styles and creative directions.
  • Audio producers: In audio production for film, television, games, advertising, and other industries, it can quickly generate background music or theme songs to improve production efficiency.
  • Music educators and students: As a teaching tool, it helps students understand musical structure and the creative process, stimulates interest in learning, and supports teachers in the classroom.
  • Content creators: Provides personalized music content for social media, short-video, and other platforms, making posts more interactive and fun.
  • Companies and brands: Generates customized music for product promotion, event publicity, and similar uses to strengthen brand influence and user engagement.