SceneGen - Shanghai Jiao Tong University's open-source framework for generating 3D scenes from a single image

Latest AI Resources | Published 5 months ago | AI Sharing Circle

What is SceneGen?

SceneGen is an open-source method from Shanghai Jiao Tong University for generating 3D scenes from a single image. It takes a single scene image and the corresponding target asset masks as input and generates multiple 3D assets simultaneously in a single feed-forward pass. Each asset comes with complete geometry, detailed texture, and a relative spatial position, so a 2D image is converted directly into a 3D scene.
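
The target mask input described above can be pictured as follows. This is a minimal sketch, assuming the mask arrives as an integer label map in which 0 is background and each positive integer marks one target asset. That encoding and the helper name `split_label_map` are illustrative assumptions, not SceneGen's actual input format.

```python
from typing import Dict, List

LabelMap = List[List[int]]


def split_label_map(label_map: LabelMap) -> Dict[int, dict]:
    """Split a label map into one binary mask and bounding box per asset.

    Assumption: 0 is background and every positive integer is one asset.
    Each bounding box is [top, left, bottom, right], inclusive.
    """
    assets: Dict[int, dict] = {}
    for r, row in enumerate(label_map):
        for c, label in enumerate(row):
            if label == 0:
                continue
            if label not in assets:
                # First pixel of this asset: start an empty mask and a 1x1 box.
                assets[label] = {
                    "mask": [[0] * len(src_row) for src_row in label_map],
                    "bbox": [r, c, r, c],
                }
            entry = assets[label]
            entry["mask"][r][c] = 1
            top, left, bottom, right = entry["bbox"]
            entry["bbox"] = [min(top, r), min(left, c), max(bottom, r), max(right, c)]
    return assets


if __name__ == "__main__":
    demo = [
        [0, 1, 1],
        [2, 0, 1],
        [2, 2, 0],
    ]
    for asset_id, info in sorted(split_label_map(demo).items()):
        print(asset_id, info["bbox"])
```

Each per-asset mask identifies one object to reconstruct, while the full image remains available as shared scene context.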

Features of SceneGen

Single-image input, joint generation: From a single 2D scene image and its corresponding target masks, SceneGen generates the geometry, texture, and relative spatial positions of multiple 3D assets in one forward pass, which greatly simplifies the traditional 3D content creation workflow.
Local and global information aggregation: In the feature extraction stage, a dedicated aggregation module combines local detail with the global context of the scene, so that the generated 3D assets are detailed individually and also stay consistent with the overall scene layout.
Efficient end-to-end generation: Unlike traditional approaches that rely on time-consuming optimization or on multi-step asset retrieval and assembly, SceneGen's end-to-end approach avoids these intermediate steps and significantly shortens the path from an input image to a usable 3D scene.
Accurate prediction of spatial relationships: By integrating a position head, the model accurately predicts and arranges the spatial layout of different 3D assets in the scene, ensuring the rationality of spatial relationships between objects, which is crucial for building believable virtual environments.
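
To make the role of predicted relative positions concrete, the sketch below places independently generated assets into one shared scene frame. It assumes each asset's position is expressed as a uniform scale, a rotation about the vertical axis, and a translation. This parameterization and the names `AssetPose`, `place_vertex`, and `assemble_scene` are hypothetical and are not taken from the SceneGen codebase.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class AssetPose:
    """Hypothetical relative pose of one asset in the scene frame."""
    scale: float        # uniform scale factor
    yaw_deg: float      # rotation about the vertical (y) axis, in degrees
    translation: Vec3   # offset applied after scaling and rotating


def place_vertex(v: Vec3, pose: AssetPose) -> Vec3:
    """Apply scale, then yaw rotation, then translation to one vertex."""
    x, y, z = (coord * pose.scale for coord in v)
    a = math.radians(pose.yaw_deg)
    # Standard rotation about the y axis.
    xr = x * math.cos(a) + z * math.sin(a)
    zr = -x * math.sin(a) + z * math.cos(a)
    tx, ty, tz = pose.translation
    return (xr + tx, y + ty, zr + tz)


def assemble_scene(
    assets: Dict[str, List[Vec3]], poses: Dict[str, AssetPose]
) -> Dict[str, List[Vec3]]:
    """Move every asset's vertices from its own frame into the scene frame."""
    return {
        name: [place_vertex(v, poses[name]) for v in verts]
        for name, verts in assets.items()
    }


if __name__ == "__main__":
    assets = {"chair": [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]}
    poses = {"chair": AssetPose(scale=2.0, yaw_deg=90.0, translation=(1.0, 0.0, 0.0))}
    print(assemble_scene(assets, poses))
```

For example, a pose with scale 2, a 90 degree yaw, and translation (1, 0, 0) sends the vertex (1, 0, 0) to approximately (1, 0, -2). A real pipeline would typically use full 3D rotations and mesh objects rather than bare vertex lists.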

SceneGen's core strengths

Generation quality: The generated 3D scenes are structurally complete and finely textured, with accurate spatial relationships, and are reported to significantly outperform existing methods (e.g., PartCrafter and MIDI) in geometric accuracy and visual quality on both synthetic and real-world datasets.
Efficiency gains: Multiple assets are generated in a single feed-forward pass without iterative optimization. Generating a textured scene with 4 assets takes about 2 minutes, balancing quality and speed.
Generalization capability: Although trained only with single-image input, SceneGen can also accept multiple input images, which further improves generation quality and helps it handle complex scenes.

Where to find SceneGen's official resources

Project website: https://mengmouxu.github.io/SceneGen/
GitHub repository: https://github.com/mengmouxu/scenegen
Hugging Face model repository: https://huggingface.co/haoningwu/scenegen
arXiv technical paper: https://arxiv.org/pdf/2508.15769

Who SceneGen is for

Game Developers & Indie Producers: For independent game developers and small to medium-sized studios with limited resources, SceneGen can substantially reduce the time and cost of creating 3D scene art assets. Developers provide concept drawings or reference photos and quickly obtain 3D scenes for use in a game engine, which improves development efficiency.
Virtual Reality (VR) and Augmented Reality (AR) Content Creators: VR/AR applications require large numbers of realistic, interactive virtual environments to be built efficiently. SceneGen's end-to-end generation is well suited to rapid prototyping and production of immersive 3D scenes for these applications.
Real Estate and Architectural Visualization Professionals: Real estate agents, architects, and interior designers can use SceneGen to quickly convert clients' floor plans or on-site interior photos into interactive 3D displays that help clients understand spatial layouts and design effects more intuitively.
Film & Animation Pre-Production Teams: In the early concept, design, and production phases of film, television, and animation, teams can use SceneGen to speed up pre-production by quickly converting 2D storyboards or set reference drawings into basic 3D layouts for previewing shots and testing composition and lighting.
Embodied AI Researchers: Simulated environments are key to training robots, autonomous driving systems, and other intelligent agents. Researchers need large numbers of diverse 3D scenes as training environments, and SceneGen's efficient generation lets them build these virtual training worlds quickly.