InfinityStar - ByteDance's Open-Source Unified Spatio-Temporal Autoregressive Video Generation Framework

Released 5 months ago by AI Sharing Circle


What is InfinityStar

InfinityStar is a unified spatio-temporal autoregressive framework open-sourced by ByteDance for high-resolution image and video generation. Using a discrete autoregressive approach, a single model handles text-to-image, text-to-video, and image-to-video tasks. The framework scores 83.74 on the VBench benchmark, outperforming existing autoregressive models, and generates video about 10 times faster than diffusion models. Its core technologies include spatio-temporal pyramid modeling (decomposing a video into a first-frame image and subsequent dynamic segments), an efficient discrete visual tokenizer (whose training is greatly accelerated by knowledge inheritance and stochastic quantizers), and an optimized Transformer architecture (e.g., semantic scale repetition and spatio-temporal sparse attention). Users can try it through the Discord community; on a single GPU, the model generates a 5-second 720p video in about a minute.
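
To make the first-frame/dynamic-segment decomposition more concrete, here is a minimal Python sketch. It is not InfinityStar's code: the helper `split_spatiotemporal`, the `PyramidSplit` container, and the fixed `clip_len` parameter are hypothetical, and in practice the frames would be latent tensors rather than integers. The sketch also suggests why one model can cover both images and videos: an image is simply a video whose dynamic part is empty.

```python
from dataclasses import dataclass, field
from typing import Any, List, Sequence


@dataclass
class PyramidSplit:
    """Hypothetical container: appearance (first frame) plus dynamics (clips)."""
    first_frame: Any                                       # modeled like a single image
    clips: List[List[Any]] = field(default_factory=list)   # later frames in fixed-length segments


def split_spatiotemporal(frames: Sequence[Any], clip_len: int) -> PyramidSplit:
    """Hypothetical helper: split frames into the first frame and consecutive clips.

    Assumption: every clip holds `clip_len` frames except possibly the last one.
    A single-frame input (an image) yields no clips at all.
    """
    if len(frames) == 0:
        raise ValueError("a video needs at least one frame")
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    rest = list(frames[1:])
    clips = [rest[i:i + clip_len] for i in range(0, len(rest), clip_len)]
    return PyramidSplit(first_frame=frames[0], clips=clips)


if __name__ == "__main__":
    video = list(range(81))  # stand-in for 81 frame latents
    split = split_spatiotemporal(video, clip_len=20)
    print(split.first_frame, [len(c) for c in split.clips])  # 0 [20, 20, 20, 20]

    image = [0]  # text-to-image: a one-frame "video"
    print(split_spatiotemporal(image, clip_len=20).clips)  # []
```

Keeping the first frame separate lets appearance be modeled once, image-style, while the clips carry only motion; the clip length here is an arbitrary illustrative choice.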

InfinityStar Features

High-resolution video generation: Generates high-quality 720p video and can quickly synthesize complex dynamic scenes, meeting the demands of high-resolution content.
Multitask support: Covers a wide range of generation tasks, including text-to-image, text-to-video, and image-to-video, to meet diverse content creation needs.
Efficient generation: Generating a 5-second 720p video takes only 58 seconds, much faster than traditional diffusion models.
Unified spatio-temporal modeling (UTM): A spatio-temporal pyramid structure effectively decouples appearance and dynamic information, so spatial and temporal dependencies can be captured efficiently.
Knowledge inheritance strategy: The discrete tokenizer is built on a pre-trained Variational Autoencoder (VAE) through knowledge inheritance, shortening training time and reducing computational cost.
Open source and ease of use: All code and models are open source, so researchers and developers can get started quickly and build further research and applications on top of them.
High-quality results: Performs strongly on the VBench benchmark, generating detailed, high-quality video and images for a wide range of application scenarios.
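
The pyramid structure and the discrete tokenizer listed above both rely on representing continuous visual features as tokens at several resolutions, from coarse to fine. This article does not describe InfinityStar's quantizer in detail, so the sketch below instead shows a generic multi-scale residual quantizer of the kind used in next-scale autoregressive models, applied to a toy 1D signal. The helper names, the power-of-two scale schedule, and the uniform scalar codebook (multiples of `step`) are all assumptions for illustration; a real video tokenizer operates on multi-dimensional latents with its own codebook.

```python
from typing import List, Sequence, Tuple


def downsample(x: Sequence[float], size: int) -> List[float]:
    """Average-pool a 1D signal down to `size` values."""
    if len(x) % size:
        raise ValueError("signal length must be divisible by the scale size")
    block = len(x) // size
    return [sum(x[i * block:(i + 1) * block]) / block for i in range(size)]


def upsample(x: Sequence[float], size: int) -> List[float]:
    """Nearest-neighbour upsampling back to `size` values."""
    block = size // len(x)
    return [v for v in x for _ in range(block)]


def multiscale_tokenize(
    latent: Sequence[float], scales: Sequence[int], step: float
) -> Tuple[List[List[int]], List[float]]:
    """Toy sketch (not InfinityStar's tokenizer): quantize `latent` coarse-to-fine.

    Each scale encodes only what the earlier, coarser scales missed.
    Assumption: a uniform scalar codebook whose entries are multiples of `step`.
    Returns the integer tokens for every scale and the final reconstruction.
    """
    n = len(latent)
    residual = list(latent)
    recon = [0.0] * n
    tokens: List[List[int]] = []
    for size in scales:
        ids = [round(v / step) for v in downsample(residual, size)]  # discrete tokens
        tokens.append(ids)
        up = upsample([i * step for i in ids], n)                   # decode, back to full size
        recon = [r + u for r, u in zip(recon, up)]
        residual = [r - u for r, u in zip(residual, up)]            # left for finer scales
    return tokens, recon


if __name__ == "__main__":
    latent = [0.9, 0.1, -0.4, 0.6, 1.2, -0.8, 0.3, 0.0]
    tokens, recon = multiscale_tokenize(latent, scales=[1, 2, 4, 8], step=0.05)
    print([len(t) for t in tokens])  # [1, 2, 4, 8]
    print(max(abs(a - b) for a, b in zip(latent, recon)))
```

Because the last scale matches the full resolution, the final reconstruction error per element is at most `step / 2`; an autoregressive model would then predict the token maps scale by scale, starting from the single coarsest token.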

InfinityStar's Core Benefits

Efficient generation speed: Generating a 5-second 720p video takes only 58 seconds, about 10 times faster than traditional diffusion models, which significantly improves video generation efficiency.
High-quality output: Scores 83.74 on the VBench benchmark, ahead of existing autoregressive models, and produces detail-rich videos and images with strong visual quality.
Multitask support: Natively supports text-to-image, text-to-video, image-to-video, and other generation tasks within a single model, meeting diverse content creation needs.
Unified spatio-temporal modeling (UTM): The spatio-temporal pyramid structure decouples appearance from dynamic information, capturing spatial and temporal dependencies efficiently and improving model performance.
Long video generation capability: The architecture paves the way for generating longer high-quality videos, broadening the range of video generation applications.
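
The speed figures in this list can be restated in more practical units with a little arithmetic. The sketch below uses only the article's numbers (58 seconds for a 5-second 720p clip, roughly 10 times faster than diffusion models). The `SpeedClaim` class is a hypothetical helper, and the "implied diffusion time" is simply those two claims multiplied together, not a measured benchmark; beyond "a single GPU", the hardware behind the 58-second figure is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeedClaim:
    """Hypothetical helper that restates the article's speed claims."""
    gen_seconds: float    # wall-clock time to generate one clip (article: 58 s)
    video_seconds: float  # length of the generated clip (article: 5 s)
    speedup: float        # claimed speedup over diffusion models (article: ~10x)

    def seconds_per_video_second(self) -> float:
        """Compute time needed per second of output video."""
        return self.gen_seconds / self.video_seconds

    def clips_per_hour(self) -> float:
        """Clips one GPU could produce per hour at this rate."""
        return 3600.0 / self.gen_seconds

    def implied_diffusion_seconds(self) -> float:
        """Diffusion-model time implied by the speedup claim (derived, not measured)."""
        return self.gen_seconds * self.speedup


if __name__ == "__main__":
    claim = SpeedClaim(gen_seconds=58.0, video_seconds=5.0, speedup=10.0)
    print(f"{claim.seconds_per_video_second():.1f} s of compute per video second")  # 11.6
    print(f"{claim.clips_per_hour():.1f} clips per hour")  # 62.1
    print(f"{claim.implied_diffusion_seconds():.0f} s implied for a diffusion model")  # 580
```

At about 11.6 seconds of compute per second of video, generation is still slower than real time; the 10x figure is relative to diffusion models, not to playback speed.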

Where to find InfinityStar

GitHub repository: https://github.com/FoundationVision/InfinityStar
Hugging Face model repository: https://huggingface.co/FoundationVision/InfinityStar
arXiv technical paper: https://arxiv.org/pdf/2511.04675

Who is InfinityStar for?

Content creators: Video producers, animators, advertising creatives, and others can quickly generate high-quality video content and work more efficiently.
Game developers: Can build interactive games and virtual reality (VR)/augmented reality (AR) applications that use interactive video generation to enhance the user experience.
Educators: Can create instructional videos, generating animations or clips related to course content to improve teaching effectiveness and student engagement.
Social media operators: Can supply rich and varied video content for social media platforms, quickly producing engaging videos that spread well.
Researchers: Can use it for research in computer vision and artificial intelligence, exploring new applications and pushing the boundaries of video generation techniques.
Corporate marketing teams: Can create advertisements and promotional videos, quickly generating on-brand content to improve marketing effectiveness and brand impact.