
Sana: Fast High-Resolution Image Generation with a Compact 0.6B Model That Runs on Low-End Laptop GPUs

General Introduction

Sana is an efficient high-resolution image generation framework developed by NVIDIA Labs that can generate images at resolutions up to 4096 × 4096 in seconds. It combines a linear diffusion transformer with a deep compression autoencoder to speed up image generation and improve quality while reducing compute requirements. The framework can run on ordinary laptop GPUs, which keeps the cost of content creation low.


Online demo: https://nv-sana.mit.edu/

 

Features

  • High-resolution image generation: Generates images at resolutions up to 4096 × 4096.
  • Linear diffusion transformer: Uses a linear attention mechanism to make high-resolution image generation more efficient.
  • Deep compression autoencoder: Compresses images by up to 32×, reducing the number of latent tokens and improving training and generation efficiency.
  • Text-to-image generation: Uses a decoder-only text encoder to improve text-image alignment.
  • Efficient training and sampling: Uses Flow-DPM-Solver to reduce the number of sampling steps and speed up convergence.
  • Low-cost deployment: Runs on a 16GB laptop GPU and generates a 1024 × 1024 image in less than 1 second.
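
To make the linear attention idea in the list above more concrete, here is a minimal pure-Python sketch. It assumes a ReLU feature map as the kernel, which is one common choice for linear attention; the function names are illustrative and this is not Sana's actual implementation. The point it shows is that the key/value summary can be computed once and reused for every query, so the cost grows linearly with the number of tokens rather than quadratically.

```python
# Sketch of softmax-free "linear" attention with a ReLU feature map.
# Assumption: ReLU is the kernel; all names here are illustrative only.

def relu(vec):
    return [max(0.0, x) for x in vec]


def quadratic_attention(Q, K, V, eps=1e-6):
    """Reference version: score every query against every key, O(N^2)."""
    out = []
    for q in Q:
        fq = relu(q)
        weights = [sum(a * b for a, b in zip(fq, relu(k))) for k in K]
        denom = sum(weights) + eps
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / denom
                    for j in range(len(V[0]))])
    return out


def linear_attention(Q, K, V, eps=1e-6):
    """Same result, but keys and values are summarized once, O(N)."""
    d, dv = len(K[0]), len(V[0])
    kv = [[0.0] * dv for _ in range(d)]  # sum over j of relu(k_j)^T v_j
    k_sum = [0.0] * d                    # sum over j of relu(k_j)
    for k, v in zip(K, V):
        fk = relu(k)
        for a in range(d):
            k_sum[a] += fk[a]
            for b in range(dv):
                kv[a][b] += fk[a] * v[b]
    out = []
    for q in Q:
        fq = relu(q)
        denom = sum(fq[a] * k_sum[a] for a in range(d)) + eps
        out.append([sum(fq[a] * kv[a][b] for a in range(d)) / denom
                    for b in range(dv)])
    return out
```

Both functions compute the same normalized weighted sum of values; only the order of the multiplications differs. With fewer latent tokens from the autoencoder and a per-token cost that no longer depends on the sequence length, high-resolution generation becomes much cheaper.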

 

Usage Guide

Installation

  1. Make sure your Python version is >= 3.10.0. Anaconda or Miniconda is recommended.
  2. Install PyTorch >= 2.0.1+cu12.1.
  3. Clone the Sana repository:
    git clone https://github.com/NVlabs/Sana.git
    cd Sana
    
  4. Run the environment setup script:
    ./environment_setup.sh sana
    

    Alternatively, install each component step by step as described in environment_setup.sh.
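
Before running the setup script, it can help to confirm that the interpreter and PyTorch meet the minimum versions listed in steps 1 and 2. The following is a small sketch of such a check; the helper names (parse_version, check_environment) are hypothetical and not part of the Sana repository.

```python
import re
import sys

MIN_PYTHON = (3, 10, 0)  # from step 1 above
MIN_TORCH = (2, 0, 1)    # from step 2 above


def parse_version(text):
    """Turn '2.0.1+cu121' into (2, 0, 1), ignoring any local build suffix."""
    core = text.split("+", 1)[0]
    numbers = re.findall(r"\d+", core)[:3]
    numbers += ["0"] * (3 - len(numbers))
    return tuple(int(n) for n in numbers)


def check_environment():
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info[:3] < MIN_PYTHON:
        problems.append("Python is older than 3.10.0")
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
    else:
        if parse_version(torch.__version__) < MIN_TORCH:
            problems.append("PyTorch is older than 2.0.1")
        if not torch.cuda.is_available():
            problems.append("CUDA is not available to PyTorch")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["Environment looks OK"]:
        print(line)
```

The check only compares version numbers and CUDA availability; it does not verify the other components that environment_setup.sh installs.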

Usage

Hardware requirements

  • The 0.6B model requires 9GB of VRAM and the 1.6B model requires 12GB of VRAM. The quantized version will require less than 8GB of VRAM for inference.
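
As a rough illustration of how these figures might guide model selection, the sketch below maps available VRAM to a variant. The function name and the returned labels are hypothetical, and the thresholds simply restate the numbers above; actual memory use also depends on resolution and batch size.

```python
def suggest_variant(vram_gb):
    """Suggest a Sana variant from available VRAM, using the figures above.

    Hypothetical helper: thresholds come from the stated requirements
    (12GB for 1.6B, 9GB for 0.6B, under 8GB for the quantized version).
    """
    if vram_gb >= 12:
        return "1.6B"
    if vram_gb >= 9:
        return "0.6B"
    # Below 9GB only the quantized version is described as fitting,
    # and the text does not state its exact minimum.
    return "quantized"


if __name__ == "__main__":
    for gb in (16, 10, 8):
        print(f"{gb}GB -> {suggest_variant(gb)}")
```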

Quick Start

  1. Launch the official online demo using Gradio:
    DEMO_PORT=15432 \
    python app/app_sana.py \
    --config=configs/sana_config/1024ms/Sana_1600M_img1024.yaml \
    --model_path=hf://Efficient-Large-Model/Sana_1600M_1024px/checkpoints/Sana_1600M_1024px.pth
    
  2. Run the inference code to generate an image:
    import os
    import torch
    from app.sana_pipeline import SanaPipeline
    from torchvision.utils import save_image

    # Use the first CUDA GPU if available, otherwise fall back to the CPU
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    generator = torch.Generator(device=device).manual_seed(42)  # fixed seed for reproducibility
    # Build the pipeline from the config, then load the pretrained weights
    sana = SanaPipeline("configs/sana_config/1024ms/Sana_1600M_img1024.yaml")
    sana.from_pretrained("hf://Efficient-Large-Model/Sana_1600M_1024px/checkpoints/Sana_1600M_1024px.pth")
    prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
    image = sana(prompt=prompt, height=1024, width=1024, guidance_scale=5.0, pag_guidance_scale=2.0, num_inference_steps=18, generator=generator)
    # Make sure the output directory exists before saving
    os.makedirs("output", exist_ok=True)
    save_image(image, 'output/sana.png', nrow=1, normalize=True, value_range=(-1, 1))
    

Training a model

  1. Prepare the dataset in the following format:
    asset/example_data
    ├── AAA.txt
    ├── AAA.png
    ├── BCC.txt
    ├── BCC.png
    ├── CCC.txt
    └── CCC.png
    
  2. Start training:
    bash train_scripts/train.sh \
    configs/sana_config/512ms/sana_600M_img512.yaml \
    --data.data_dir="asset/example_data" \
    --data.type=SanaImgDataset \
    --model.multi_scale=false \
    --train.train_batch_size=32
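
The dataset layout in step 1 appears to pair each image with a caption file of the same name. Under that assumption, a short script can catch unpaired files before training starts. The helper name find_unpaired and the list of accepted image extensions are assumptions for illustration, not part of the Sana codebase.

```python
from pathlib import Path

# Assumption: which extensions count as images; the example only shows .png.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def find_unpaired(data_dir):
    """Return (images without captions, captions without images) by stem."""
    root = Path(data_dir)
    files = [p for p in root.iterdir() if p.is_file()]
    images = {p.stem for p in files if p.suffix.lower() in IMAGE_SUFFIXES}
    captions = {p.stem for p in files if p.suffix.lower() == ".txt"}
    return sorted(images - captions), sorted(captions - images)


if __name__ == "__main__":
    missing_captions, missing_images = find_unpaired("asset/example_data")
    print("Images without a caption:", missing_captions)
    print("Captions without an image:", missing_images)
```

Running it against the directory passed to --data.data_dir gives a quick sanity check that every training sample has both halves.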
    
May not be reproduced without permission: Chief AI Sharing Circle » Sana: fast generation of high-resolution images, 0.6B ultra-small size model, low-profile laptop GPU operation
