
PhotoDoodle: AI tool for adding artistic doodles to photos with text commands

General Introduction

PhotoDoodle is an open-source image editing tool developed by ShowLab for artistic photo editing with AI. From a short text prompt, users can add cartoon-style elements, 3D effects, halos, wings, and other decorations to real photos, producing artwork that blends real and virtual content. It is built on a deep learning model and supports few-shot learning, so it can adapt to a user's personal style from a small number of examples, which makes it suitable for artists, designers, and casual users alike. The project is hosted on GitHub and provides code, datasets, and pre-trained models so that developers can reproduce the results or build on them. Its "photo doodling" capability covers a gap in traditional editing software: it keeps the photo's background intact while blending the added artistic elements into it, and it has attracted considerable attention.


Function List

  • Text-driven art editing: Automatically generates doodle elements from text descriptions (e.g. "add cartoon monster" or "add halo effect").
  • Few-shot learning support: Learns and reproduces a personalized editing style from only a small amount of user-supplied paired data.
  • High-quality blending of real and virtual content: Keeps newly added elements consistent with the photo's background in perspective, lighting, and shadow.
  • Open datasets and models: Provides pre-trained models and datasets in a variety of styles that users can download directly.
  • Open Source Support: Allows developers to modify the code or integrate it into other projects with high flexibility.
  • Batch processing capability: Supports editing multiple images at once for efficiency.

 

Usage Guide

PhotoDoodle is an open-source project hosted on GitHub, and installing and using it requires some technical background. The following installation and usage guide will help you get started.

Installation process

  1. Prepare the environment
    • Make sure you have Git, Python 3.11.10 and Conda installed on your computer.
    • Open a terminal and enter the following command to clone the project locally:
      git clone git@github.com:showlab/PhotoDoodle.git
      cd PhotoDoodle
      
    • Create and activate a virtual environment:
      conda create -n doodle python=3.11.10
      conda activate doodle
      
  2. Install dependencies
    • Install PyTorch (CUDA-accelerated version recommended, if you have a GPU):
      pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
      
    • Install other dependencies:
      pip install --upgrade -r requirements.txt
      
    • Wait for the installation to complete, and make sure your network connection is stable while it runs.
  3. Download the pre-trained models
    • The project provides several pre-trained models that need to be downloaded manually. Visit PhotoDoodle's GitHub Releases or Hugging Face pages to download the model files (e.g. OmniEditor and EditLoRA).
    • Place the downloaded model files into the specified folder in the project directory (refer to the README for the path description, which is usually checkpoints/).
  4. Verify the installation
    • Run a test command (such as the sample script provided in the README) in a terminal and check for errors. If there are no errors, the installation was successful.
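
Before running the project's own sample script, a quick interpreter-level check can rule out the most common setup problems. The following is a minimal sketch, not part of PhotoDoodle: check_environment is a hypothetical helper that reports the Python version, whether the packages installed above can be found, and whether PyTorch sees a CUDA device.

```python
import importlib.util
import sys

EXPECTED_PYTHON = (3, 11)  # this guide installs Python 3.11.10


def check_environment(packages=("torch", "torchvision", "torchaudio")):
    """Return a dict describing the interpreter and which packages can be found."""
    report = {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "python_matches_guide": sys.version_info[:2] == EXPECTED_PYTHON,
        "packages": {},
        "cuda_available": None,  # stays None when torch is not checked or not installed
    }
    for name in packages:
        # find_spec locates a package without importing it
        report["packages"][name] = importlib.util.find_spec(name) is not None
    if report["packages"].get("torch"):
        import torch  # imported lazily so the check still runs without it

        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Run it inside the activated doodle environment. If a package shows False, or cuda_available is False on a machine with an NVIDIA GPU, revisit the dependency installation step before debugging anything else.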

Usage

PhotoDoodle's core function is editing photos via text prompts. There are two workflows: using the pre-trained models directly, and training a custom style.

Editing photos using pre-trained models

  1. Prepare the picture
    • Place the photo to be edited (e.g. source.jpg) in the input/ folder under the project directory (create the folder if it does not exist).
  2. Run the edit command
    • Enter the following command in the terminal (assuming the doodle environment is activated):
      python inference.py --source input/source.jpg --prompt "Add cartoon-style wings to the photo" --output output/result.jpg
      
    • Parameter Description:
      • --source: Source photo path.
      • --prompt: A text directive that describes the element you want to add.
      • --output: Path where the result is saved.
    • After running, the generated result is saved to output/result.jpg.
  3. View Results
    • Open the output/ folder to check the generated images. Changing the prompt (e.g., "Add Light and Shadow Effect") produces different styles.
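
Since changing the prompt changes the style, it can help to try several prompts on the same photo and compare the outputs. The sketch below only builds the command lines; it assumes the --source, --prompt, and --output flags exactly as quoted in this guide (confirm them against the README), and slugify and variant_commands are hypothetical helper names, not part of the project.

```python
import re
from pathlib import Path


def slugify(prompt, max_len=40):
    """Turn a prompt into a short, filesystem-safe file stem."""
    slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "prompt"


def variant_commands(source, prompts, out_dir="output"):
    """Build one inference.py command per prompt for a single source photo."""
    commands = []
    for prompt in prompts:
        # Each prompt gets its own output file so results are not overwritten.
        target = Path(out_dir) / f"{Path(source).stem}_{slugify(prompt)}.jpg"
        commands.append([
            "python", "inference.py",
            "--source", str(source),
            "--prompt", prompt,       # flag names as quoted in this guide
            "--output", target.as_posix(),
        ])
    return commands


if __name__ == "__main__":
    for command in variant_commands(
        "input/source.jpg",
        ["Add cartoon-style wings", "Add a halo effect"],
    ):
        print(command)
```

Each returned list can be passed to subprocess.run from the project directory; keeping the prompt as a single list element avoids shell quoting problems.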

Training a custom style

  1. Prepare a paired dataset
    • Create a .jsonl file (e.g. dataset.jsonl) that records one image pair and its description per line:
      {"source": "path/to/source.jpg", "target": "path/to/modified.jpg", "caption": "add a blue halo"}
      {"source": "path/to/source2.jpg", "target": "path/to/modified2.jpg", "caption": "add a cartoon monster"}
      
    • Prepare at least 5-10 pairs of images that reflect your style needs.
  2. Run the training script
    • Place the .jsonl file in the project directory and run:
      python train.py --data dataset.jsonl --model OmniEditor --output_dir trained_model/
      
    • Training time depends on the amount of data and on hardware performance (a GPU is recommended). When training finishes, the model is saved in trained_model/.
  3. Edit with the custom model
    • Run inference with the trained model:
      python inference.py --source input/source.jpg --prompt "Add elements in my style" --model trained_model/checkpoint.pth --output output/custom_result.jpg
      
    • Check the output to verify that it meets expectations.
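
A malformed dataset file is an easy way to lose a training run, so it is worth checking dataset.jsonl before starting. The sketch below writes and validates a file in the format shown in step 1 (one JSON object per line with source, target, and caption keys). write_pairs and validate_pairs are hypothetical helpers for illustration; the project may apply its own checks.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("source", "target", "caption")  # field names used in this guide


def write_pairs(pairs, jsonl_path):
    """Write (source, target, caption) tuples as one JSON object per line."""
    with open(jsonl_path, "w", encoding="utf-8") as handle:
        for source, target, caption in pairs:
            record = {"source": str(source), "target": str(target), "caption": caption}
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")


def validate_pairs(jsonl_path, check_files=True):
    """Return a list of problems; an empty list means the file looks usable."""
    problems = []
    with open(jsonl_path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as error:
                problems.append(f"line {number}: invalid JSON ({error.msg})")
                continue
            if not isinstance(record, dict):
                problems.append(f"line {number}: expected a JSON object")
                continue
            missing = [key for key in REQUIRED_KEYS if not record.get(key)]
            if missing:
                problems.append(f"line {number}: missing {', '.join(missing)}")
                continue
            if check_files:
                # Both images of a pair must exist relative to the working directory.
                for key in ("source", "target"):
                    if not Path(record[key]).is_file():
                        problems.append(f"line {number}: {key} not found: {record[key]}")
    return problems


if __name__ == "__main__":
    for problem in validate_pairs("dataset.jsonl"):
        print(problem)
```

Run the validator from the project directory so that relative image paths resolve the same way they will during training.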

Operating details

  • Batch editing: Place multiple images in the input/ folder and modify the script to process them in a loop (e.g. by adding a --batch parameter; refer to the code comments for implementation details).
  • Adjusting the effect: If the blending does not look natural, add more detail to the prompt (e.g., "consistent with the background lighting") or adjust the model parameters (see the files under config/).
  • Debugging issues: If something goes wrong, check the Python version, confirm that all dependencies are installed, or search GitHub Issues for help from the community.
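
An alternative to modifying the script for batch editing is to loop over the input/ folder from outside and call inference.py once per image. The sketch below does not assume that a --batch flag exists; it reuses only the --source, --prompt, and --output flags quoted earlier in this guide. batch_commands and run_batch are hypothetical helper names, and the dry-run default prints commands without executing them.

```python
import shlex
import subprocess
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def batch_commands(input_dir, prompt, output_dir="output"):
    """Build one inference.py command per image in input_dir, sorted by name."""
    commands = []
    for image in sorted(Path(input_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        result = Path(output_dir) / f"{image.stem}_edited.jpg"
        commands.append([
            "python", "inference.py",
            "--source", image.as_posix(),
            "--prompt", prompt,
            "--output", result.as_posix(),
        ])
    return commands


def run_batch(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False.

    Returns the list of commands that exited with a non-zero status.
    """
    failed = []
    for command in commands:
        print(shlex.join(command))
        if dry_run:
            continue
        # Make sure the output folder exists before the script writes to it.
        Path(command[-1]).parent.mkdir(parents=True, exist_ok=True)
        if subprocess.run(command).returncode != 0:
            failed.append(command)
    return failed


if __name__ == "__main__":
    run_batch(batch_commands("input", "Add a halo effect"), dry_run=True)
```

Starting a new process per image reloads the model each time, which is slow for large batches; the in-script loop described above avoids that cost at the price of editing project code.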

Notes

  • Hardware requirements: A GPU (e.g. an NVIDIA card with CUDA support) is recommended for speed; the tool can run on a CPU, but more slowly.
  • Data quality: Higher-resolution input images generally give better results, and the images in a custom dataset should be consistent in style.
  • Online experience: Some features can be tested online via Hugging Face Spaces without local installation.

With these steps, you can easily add artistic appeal to your photos with PhotoDoodle, whether it's a quick trial or deep customization.

May not be reproduced without permission: Chief AI Sharing Circle » PhotoDoodle: AI tool for adding artistic doodles to photos with text commands