InstanceAssemble - Layout-Controlled Image Generation Technology Open-Sourced by Xiaohongshu and Fudan University


What is InstanceAssemble

InstanceAssemble is a layout-controlled image generation technology jointly open-sourced by Xiaohongshu and Fudan University. Through its "Instance Assembling Attention" mechanism, it generates images accurately across layouts that range from simple to complex and from sparse to dense. It adopts a two-stage cascade architecture: it first generates the image background, then integrates the instances in the layout one by one. Because each instance is handled by an independent attention computation, different instances do not interfere with one another, so the method copes well with complex layouts such as overlapping or small objects. InstanceAssemble is adapted to the base model through lightweight LoRA modules, which add only a small number of parameters and avoid retraining the entire model, greatly reducing computational cost while improving inference speed. Multimodal inputs are supported: each instance can be described by text or by image information.
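To make the notion of a "layout" concrete, here is a minimal sketch of how such an input could be represented: a list of instances, each with a bounding box and either a text description or a reference image. The `Instance` class, its field names, and the `overlapping_pairs` helper are hypothetical names chosen for illustration; they are not taken from the InstanceAssemble code base.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized to [0, 1]


@dataclass
class Instance:
    """One layout entry (hypothetical structure, for illustration only)."""
    box: Box
    text: Optional[str] = None        # textual description of the instance
    image_path: Optional[str] = None  # optional reference image

    def __post_init__(self) -> None:
        x0, y0, x1, y1 = self.box
        if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
            raise ValueError(f"box must be normalized with x0 < x1, y0 < y1: {self.box}")
        if self.text is None and self.image_path is None:
            raise ValueError("an instance needs a text description or a reference image")


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes (0.0 if they do not overlap)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def overlapping_pairs(layout: List[Instance]) -> List[Tuple[int, int]]:
    """Index pairs of instances whose boxes overlap."""
    return [
        (i, j)
        for i in range(len(layout))
        for j in range(i + 1, len(layout))
        if overlap_area(layout[i].box, layout[j].box) > 0.0
    ]


layout = [
    Instance(box=(0.1, 0.1, 0.5, 0.5), text="a red kite"),
    Instance(box=(0.4, 0.4, 0.9, 0.9), text="a dog"),
    Instance(box=(0.0, 0.6, 0.2, 0.8), image_path="ref.png"),
]
print(overlapping_pairs(layout))
```

In this sample layout only the first two boxes overlap, which is the kind of case the article describes as a complex layout.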


Features of InstanceAssemble

  • Precise layout control: The Instance Assembling Attention mechanism controls the position, shape and semantic attributes of each target object in the image, so the generated image stays closely aligned with the given layout instructions (e.g., bounding boxes, text descriptions). It is especially effective in complex scenes such as high-density, multi-instance layouts.
  • Cascade architecture design: The base model first generates the global image background and overall context, and the instance assembly module then integrates the local instance information one by one. This balances global image quality against local alignment and avoids mutual interference between instances.
  • Lightweight adaptation: Model adaptation is based on LoRA (Low-Rank Adaptation), which adds layout control to existing diffusion models (e.g., Stable Diffusion, Flux) with only a small number of extra parameters (about 3% of the base model) and no large-scale retraining, balancing efficiency and compatibility.
  • Multimodal support: It accepts inputs in several modalities, such as text, reference images, depth maps and edge maps, which can be combined flexibly to guide generation and enrich the content.
  • Open source and application potential: The code and pre-trained models are openly available, offering an industrial-grade solution for design, advertising, content creation and other fields. It could be extended in the future to scenarios such as intelligent typesetting and virtual content generation.
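The "small number of parameters" behind LoRA can be illustrated with some simple arithmetic. For a weight matrix of shape d_out × d_in, a rank-r adapter adds two small matrices, A (r × d_in) and B (d_out × r), and the adapted layer computes W·x + scale·B·(A·x). The sketch below is plain Python under assumed values: the 1024 × 1024 layer size and rank 16 are illustrative choices that happen to give a ratio near 3%, not InstanceAssemble's actual configuration.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def lora_extra_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


def lora_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Added parameters as a fraction of the frozen weight matrix."""
    return lora_extra_params(d_in, d_out, rank) / (d_in * d_out)


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, scale: float = 1.0) -> Vector:
    """Frozen base output plus the scaled low-rank update B(Ax)."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]


# Illustrative sizes only (assumed, not taken from the project).
print(lora_extra_params(1024, 1024, 16))   # 32768
print(lora_fraction(1024, 1024, 16))       # 0.03125

# Tiny rank-1 example: W is the identity, so the output is x plus the update.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]        # 1 x 2
B = [[1.0], [2.0]]      # 2 x 1
print(lora_forward(W, A, B, [1.0, 2.0]))   # [4.0, 8.0]
```

Only A and B are trained while W stays frozen, which is why the adaptation avoids retraining the full model.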

Core Benefits of InstanceAssemble

  • Precise layout control: It generates images according to user-specified positions and contents, maintaining accurate layout alignment and semantic consistency in both simple and complex scenes.
  • Low computational cost: Lightweight adaptation via LoRA adds only a small number of parameters, which cuts overhead by about 97% compared with the traditional approach and markedly improves inference speed.
  • Ability to handle complex layouts: With the independent attention mechanism, the attention computation for each target instance is carried out only in its corresponding image region. This avoids interference between different instances and allows the model to deal with complex layouts such as overlapping or small objects.
  • Multimodal input support: Each instance can be specified by a textual description or by additional image information (e.g., reference images, depth maps, edge maps), which increases the diversity and accuracy of the generated images.
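The idea of restricting each instance's attention to its own image region can be sketched with a per-instance token mask. This is a generic illustration of region-restricted attention in plain Python, not the project's Instance Assembling Attention implementation; the grid size, the helper names `box_token_mask` and `masked_softmax`, and the rule that a token belongs to a box when its center falls inside it are all assumptions made for the example.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized to [0, 1]


def box_token_mask(box: Box, grid_h: int, grid_w: int) -> List[bool]:
    """Flat row-major mask: True where an image token's center lies inside the box."""
    x0, y0, x1, y1 = box
    mask = []
    for row in range(grid_h):
        cy = (row + 0.5) / grid_h
        for col in range(grid_w):
            cx = (col + 0.5) / grid_w
            mask.append(x0 <= cx <= x1 and y0 <= cy <= y1)
    return mask


def masked_softmax(scores: List[float], mask: List[bool]) -> List[float]:
    """Softmax over the scores, restricted to positions where the mask is True."""
    kept = [s for s, m in zip(scores, mask) if m]
    if not kept:
        return [0.0] * len(scores)
    top = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Two overlapping instances on a 4 x 4 token grid; each gets its own mask,
# so the attention weights of one never depend on the other.
mask_a = box_token_mask((0.0, 0.0, 0.5, 0.5), 4, 4)
mask_b = box_token_mask((0.25, 0.25, 1.0, 1.0), 4, 4)

weights_a = masked_softmax([0.0] * 16, mask_a)
print([i for i, m in enumerate(mask_a) if m])  # [0, 1, 4, 5]
print(weights_a[0], weights_a[2])              # 0.25 0.0
```

Because each mask is computed independently, tokens shared by overlapping boxes (index 5 here) simply appear in both masks, and neither instance's weights are changed by the presence of the other.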

Official project links for InstanceAssemble

  • GitHub repository: https://github.com/FireRedTeam/InstanceAssemble
  • arXiv technical paper: https://arxiv.org/pdf/2509.16691

Who InstanceAssemble is for

  • Creative designers: People who need to quickly generate images that meet specific layout and creative requirements for advertising design, poster production, UI/UX design and similar work.
  • E-commerce practitioners: People who generate high-quality product display images to make product pages more attractive and improve the user experience.
  • Game developers: People who need fast image generation for complex layouts in game scene design and character generation, improving development efficiency.
  • Content creators: Bloggers, independent media creators and others who generate personalized visual content to make their work more attractive and professional.
  • Researchers: People working in artificial intelligence and computer vision who want to explore further possibilities of layout-controlled generation.
  • Corporate marketing teams: Teams that create marketing materials such as social media images and promotional posters to meet diverse marketing needs.