Kaleido - A multi-subject reference video generation model open-sourced by Zhipu AI in collaboration with Tsinghua University and others


What's Kaleido?

Kaleido is an open-source multi-subject reference video generation model jointly developed by Hefei University of Technology, Tsinghua University, and Zhipu AI. It generates subject-consistent videos from multiple reference images, addressing two weaknesses of existing models: poor consistency across multiple subjects and poor decoupling of subjects from their backgrounds. Kaleido obtains high-quality training data through a dedicated data construction pipeline that filters out low-quality samples and synthesizes diverse data. Its Reference Rotary Position Encoding (R-RoPE) mechanism integrates multiple reference images stably and accurately, keeping subjects consistent in multi-subject scenes. Kaleido performs well on multiple benchmarks and significantly outperforms previous approaches in consistency, fidelity, and generalization.
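The exact R-RoPE formulation is given in the arXiv paper and is not reproduced here. As a rough illustration of the underlying idea only, the Python sketch below applies standard 1D rotary position embedding (RoPE) and uses a hypothetical `assign_positions` layout in which video frames take positions 0 to T-1 and each reference image is placed at its own position further along, separated by an assumed `gap`. Because RoPE makes the attention score depend only on the difference between positions, giving each reference a distinct position is one way to keep references from colliding with each other or with video frames. The real model works with 3D (time, height, width) positions over latent tokens; the 1D layout, `gap`, and all function names here are simplifications for illustration.

```python
import math
from typing import Dict, List


def rope_rotate(x: List[float], pos: int, base: float = 10000.0) -> List[float]:
    """Standard RoPE: rotate each pair (x[2i], x[2i+1]) by pos * base**(-2i/d)."""
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even feature dimension")
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def dot(u: List[float], v: List[float]) -> float:
    """Plain dot product, standing in for a query-key attention score."""
    return sum(a * b for a, b in zip(u, v))


def assign_positions(num_frames: int, num_refs: int, gap: int = 8) -> Dict[str, List[int]]:
    """Hypothetical layout (not the paper's): frames use 0..num_frames-1 and
    reference k sits at num_frames + gap * (k + 1), so every reference has a
    distinct position that never overlaps a video frame."""
    video = list(range(num_frames))
    refs = [num_frames + gap * (k + 1) for k in range(num_refs)]
    return {"video": video, "refs": refs}


# Usage: with RoPE, the score depends only on the position difference.
layout = assign_positions(num_frames=4, num_refs=3)
q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.4]
score_a = dot(rope_rotate(q, 3), rope_rotate(k, layout["refs"][0]))  # offset 12 - 3 = 9
score_b = dot(rope_rotate(q, 10), rope_rotate(k, 19))                 # offset 19 - 10 = 9
print(layout)
print(abs(score_a - score_b) < 1e-9)
```

Keeping references at separate offsets instead of reusing frame positions is the design choice being illustrated; the offsets Kaleido actually uses may differ.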


Features of Kaleido

  • Data construction pipeline innovation: Kaleido uses a multi-stage, scalable subject-to-video (S2V) data construction pipeline covering video slicing and captioning, subject localization, quality filtering, background decoupling, and pose/motion augmentation. This improves the diversity and quality of the data and supplies high-quality samples for model training.
  • R-RoPE mechanism: Reference Rotary Position Encoding (R-RoPE) assigns each reference image its own rotary position encoding, so multiple reference images can be integrated stably. This significantly improves consistency in multi-subject scenes and keeps subjects from being confused with one another.
  • Superior performance: In several benchmarks, Kaleido significantly outperforms existing methods in subject consistency, background decoupling, and video quality, and comes close to closed-source models in aesthetic quality and video smoothness.
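The pipeline stages listed above can be pictured as a chain of filters and transforms over clip records. The sketch below is a hypothetical illustration, not code from the Kaleido repository: the `Sample` fields, the quality threshold, and the `synthesize_variants` stage (standing in for steps such as background decoupling or pose/motion augmentation that turn one clip into several training variants) are all assumptions.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Sample:
    clip_id: str
    caption: str
    num_subjects: int  # hypothetical: subjects found by subject localization
    quality: float     # hypothetical: quality score in [0, 1]
    tags: Tuple[str, ...] = ()


Stage = Callable[[List[Sample]], List[Sample]]


def filter_low_quality(min_quality: float) -> Stage:
    """Drop clips below a (hypothetical) quality threshold or with no subject."""
    def stage(samples: List[Sample]) -> List[Sample]:
        return [s for s in samples if s.quality >= min_quality and s.num_subjects > 0]
    return stage


def synthesize_variants(n: int) -> Stage:
    """Placeholder for diversity-increasing synthesis: emit n tagged copies per clip.
    A real stage would change the content (e.g. background or pose); this one only tags it."""
    def stage(samples: List[Sample]) -> List[Sample]:
        return [replace(s, tags=s.tags + (f"variant_{i}",)) for s in samples for i in range(n)]
    return stage


def run_pipeline(samples: List[Sample], stages: List[Stage]) -> List[Sample]:
    """Apply each stage in order to the whole batch of samples."""
    for stage in stages:
        samples = stage(samples)
    return samples


clips = [
    Sample("a", "a cat on a sofa", num_subjects=1, quality=0.9),
    Sample("b", "blurry street", num_subjects=1, quality=0.2),
    Sample("c", "empty room", num_subjects=0, quality=0.95),
]
result = run_pipeline(clips, [filter_low_quality(0.5), synthesize_variants(2)])
print([(s.clip_id, s.tags) for s in result])
```

Structuring each step as a function from a list of samples to a list of samples keeps filtering (which shrinks the set) and synthesis (which grows it) interchangeable and easy to reorder.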

Kaleido's core strengths

  • Data diversity and quality: A multi-stage data construction pipeline filters out low-quality samples and synthesizes diverse data, keeping the training data rich and high-fidelity and laying the foundation for the model's performance.
  • Multi-subject coherence: The R-RoPE mechanism integrates multiple reference images effectively, significantly improving consistency in multi-subject scenes, avoiding subject confusion, and producing high-quality multi-subject videos.
  • Background decoupling: Kaleido cleanly separates subjects from their backgrounds, avoiding background contamination and making generated videos more natural and realistic.
  • Excellent performance: In several benchmark tests, Kaleido significantly outperforms existing methods in terms of subject consistency, background decoupling, video quality, aesthetic quality, and video smoothness, approaching or even surpassing the level of closed-source models.
  • Open source drives the ecosystem: As an open-source project, Kaleido supports research and applications in video generation and promotes technical progress and ecosystem building across the field, with broad application prospects.
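The article does not say how background decoupling is implemented. One common way to break the link between a subject and its original background in training data is to cut the subject out with a segmentation mask and paste it onto a different background. The sketch below assumes that approach and shows only the compositing step on tiny grayscale images stored as nested lists; the `composite` function and this data layout are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values
Mask = List[List[int]]     # 1 = subject pixel, 0 = background pixel


def composite(subject: Image, mask: Mask, background: Image) -> Image:
    """Keep subject pixels where mask == 1 and take background pixels elsewhere."""
    if not (len(subject) == len(mask) == len(background)):
        raise ValueError("images must have the same height")
    out: Image = []
    for s_row, m_row, b_row in zip(subject, mask, background):
        if not (len(s_row) == len(m_row) == len(b_row)):
            raise ValueError("images must have the same width")
        out.append([s if m else b for s, m, b in zip(s_row, m_row, b_row)])
    return out


subject = [[9.0, 9.0], [9.0, 9.0]]
mask = [[1, 0], [0, 1]]
new_background = [[0.0, 1.0], [2.0, 3.0]]
print(composite(subject, mask, new_background))  # [[9.0, 1.0], [2.0, 9.0]]
```

Pairing the same subject with several backgrounds discourages a model from copying the reference image's background into the output; whether Kaleido's pipeline does exactly this is not stated here.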

What is the official website of Kaleido?

  • Project website: https://criliasmiller.github.io/Kaleido_Project/
  • GitHub repository: https://github.com/zai-org/Kaleido
  • HuggingFace Model Library: https://huggingface.co/zai-org/Kaleido-14B-S2V
  • arXiv Technical Paper: https://arxiv.org/pdf/2510.18573

Who Kaleido is for

  • Video content creators: Can use Kaleido to generate high-quality videos quickly and cut shooting and post-production costs, which suits content creation for advertising, e-commerce, and film and television.
  • Artificial intelligence researchers: As an open-source model, Kaleido provides researchers with rich experimental data and an advanced technical framework to facilitate research work related to video generation.
  • Developers & Engineers: Can integrate Kaleido into their own projects to build new applications or improve existing software and platforms that need video generation capabilities.
  • Creative designers: Kaleido's multi-subject video generation lets designers realize creative ideas quickly and provides fresh ideas and material for their design work.
  • Educators and students: Can be used for teaching and learning to help students understand the principles and applications of video generation technology and develop relevant skills and creative abilities.