AI Personal Learning
and practical guidance

CogView3: Wisdom Spectrum Light Word open source cascade diffusion text to generate image models

General Introduction

CogView3 is an advanced text generation image system developed by Tsinghua University and Think Tank Team (Chi Spectrum Qingyan). It is based on the cascading diffusion model and generates high-resolution images through multiple stages.The main features of CogView3 include multi-stage generation, innovative architecture and efficient performance, which are applicable to many fields such as art creation, advertisement design, game development, and so on.

The capabilities of this series of models are now online at "Smart Spectrum Clear Words" (chatglm.cn) and can be experienced on Clear Words.

CogView3: An Efficient Text Generation Image System Based on Cascading Diffusion of text2img Model-1

 


CogView3: Cascading Diffusion Text Generation Image Model for Smart Spectrum Light Talk Open Source-1

Top: A pink colored car. Bottom: A stack of 3 cubes. A red cube is on the top, sitting on a red cube. The red cube is in the middle, sitting on a green cube. The green cube is on the bottom.

 

Function List

  • Multi-stage generation: First generate low-resolution images, then gradually increase the image resolution through a relay diffusion process, and finally generate high-resolution images up to 2048x2048.
  • Efficient Performance: CogView3 significantly reduces training and inference costs while generating high quality images. Compared to SDXL, the current state-of-the-art open source model, CogView3's inference time is only 1/10th of that of SDXL.
  • Innovative Architecture: CogView3 introduces the latest DiT (Diffusion Transformer) architecture, which utilizes Zero-SNR diffusion noise scheduling and combines the text-image joint attention mechanism to further improve the overall performance.
  • Open source code: The code and model of CogView3 have been open sourced on GitHub and can be freely downloaded and used by users.

 

Using Help

Installation and Registration

  1. Visit the website: Open the official CogView3 website. GitHubThe
  2. Download Code : Click the "Code" button on the page and select "Download ZIP" to download the project file, or use the git command to download it:git<span> </span>clone<span> </span>https://github.com/THUDM/CogView3.gitThe
  3. Install dependencies: Ensure that the diffusers library is installed from source:
pip install git+https://github.com/huggingface/diffusers.git

Usage Process

  1. Cue Optimization :
    • Although the CogView3 family of models is trained with long image descriptions, we strongly recommend rewriting the cues using Large Language Models (LLMs) before generating text to images, which will significantly improve the quality of the generation.
    • Run the following script to optimize the prompt:
      python prompt_optimize.py --api_key "Zhipu AI API Key"--prompt {your prompt} --base_url "https://open.bigmodel.cn/api/paas/v4"--model "glm-4-plus"
      
  2. Reasoning Models (Diffusers) :
    • First, make sure you install the diffusers library from source:
      pip install git+https://github.com/huggingface/diffusers.git
      
    • Then, run the following code:
      fromdiffusers importCogView3PlusPipeline
      importtorch
      
      pipe = CogView3PlusPipeline.from_pretrained("THUDM/CogView3-Plus-3B", torch_dtype=torch.float16).to("cuda")
      pipe.enable_model_cpu_offload()
      pipe.vae.enable_slicing()
      pipe.vae.enable_tiling()
      
      prompt = "A vibrant cherry red sports car sits proudly under the gleaming sun, its polished exterior smooth and flawless, casting a mirror-like reflection. The car features a low, aerodynamic body, angular headlights that gaze forward like predatory eyes, and a set of black, high-gloss racing rims that contrast. car features a low, aerodynamic body, angular headlights that gaze forward like predatory eyes, and a set of black, high-gloss racing rims that contrast starkly with the red. A subtle hint of chrome embellishes the grille and exhaust, while the tinted windows suggest a luxurious and private interior. scene conveys a sense of speed and elegance, the car appearing as if it's about to burst into a sprint along a coastal road, with the ocean's azure waves crashing in the background. crashing in the background."
      
      image = pipe(
          prompt=prompt,
          guidance_scale=7.0,
          num_images_per_prompt=1,
          num_inference_steps=50, the
          width=1024,
          height=1024,
      ).images[0]
      
      image.save("cogview3.png")
      
  3. Reasoning Model (SAT) :
    • Refer to the SAT tutorial for step-by-step model reasoning instructions.

common problems

  • Installation Failure: Make sure the Python version meets the requirements, and pay attention to version compatibility when installing PyTorch.
  • Image quality : The specificity of the text description and the richness of the training dataset will affect the results of the generated images, it is recommended to use detailed text description and diverse datasets for training.
AI Easy Learning

The layman's guide to getting started with AI

Help you learn how to utilize AI tools at a low cost and from a zero base.AI, like office software, is an essential skill for everyone. Mastering AI will give you an edge in your job search and half the effort in your future work and studies.

View Details>
May not be reproduced without permission:Chief AI Sharing Circle " CogView3: Wisdom Spectrum Light Word open source cascade diffusion text to generate image models

Chief AI Sharing Circle

Chief AI Sharing Circle specializes in AI learning, providing comprehensive AI learning content, AI tools and hands-on guidance. Our goal is to help users master AI technology and explore the unlimited potential of AI together through high-quality content and practical experience sharing. Whether you are an AI beginner or a senior expert, this is the ideal place for you to gain knowledge, improve your skills and realize innovation.

Contact Us
en_USEnglish