
YOLOv12: An open-source tool for real-time object detection in images and video

General Introduction

YOLOv12 is an open-source project developed by GitHub user sunsmarterjie that focuses on real-time object detection. Built on the YOLO (You Only Look Once) family of frameworks, it introduces an attention mechanism to optimize the traditional convolutional neural network (CNN) design, improving detection accuracy while maintaining fast inference. YOLOv12 suits a variety of scenarios, such as surveillance systems, autonomous driving, and image analysis, and ships in Nano, Small, Medium, Large, and Extra-Large model sizes to match different compute budgets and application requirements. The project is released under the GNU AGPL-3.0 license, which allows users to download the code for free and customize it to their needs. The developer team includes researchers from the University at Buffalo and the Chinese Academy of Sciences, and the detailed documentation and installation guide help users get started quickly.


Feature List

  • Efficient real-time object detection: On a T4 GPU, YOLOv12-N achieves 40.6% mAP with an inference latency of only 1.64 ms.
  • Multiple model sizes: Five models (Nano through Extra-Large) accommodate hardware ranging from low-power devices to high-performance servers.
  • Attention mechanism optimization: Introduces Area Attention and R-ELAN modules to improve detection accuracy while reducing computational complexity.
  • Model export: Supports exporting trained models to ONNX or TensorRT format for easy deployment to production environments.
  • Custom dataset training: Users can train models on their own datasets for task-specific object detection.
  • Visualization support: Integrates the supervision library to display detection results and evaluate performance (see the sketch below).
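
As a quick illustration, here is a minimal sketch of pairing a YOLOv12 checkpoint with the supervision library. The file names image.jpg and annotated.jpg are placeholders, and yolov12n.pt is assumed to be the Nano checkpoint provided by the repository:

    import cv2
    import supervision as sv
    from ultralytics import YOLO

    model = YOLO('yolov12n.pt')      # assumed Nano checkpoint
    image = cv2.imread('image.jpg')  # placeholder input image
    results = model(image)[0]        # first (and only) Results object

    # Convert to supervision's Detections and draw the labeled boxes
    detections = sv.Detections.from_ultralytics(results)
    annotated = sv.BoxAnnotator().annotate(scene=image.copy(), detections=detections)
    cv2.imwrite('annotated.jpg', annotated)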

 

Usage Guide

Installation Process

YOLOv12 does not currently have a standalone PyPI package, so it must be installed from the GitHub source. The detailed installation steps below are for Linux; Windows and macOS users will need to adjust the environment configuration accordingly:

  1. Prepare the environment
    • Make sure Python 3.11 or later is installed on your system.
    • Install Git: sudo apt install git (Ubuntu example).
    • Optional: Install the NVIDIA GPU driver and CUDA (version 11.8 or higher recommended) to accelerate training and inference.
  2. Clone the repository
    Download the YOLOv12 repository by running the following commands in a terminal:

    git clone https://github.com/sunsmarterjie/yolov12.git
    cd yolov12
  3. Create a virtual environment
    Use Conda or venv to create a separate Python environment and avoid dependency conflicts:

    conda create -n yolov12 python=3.11
    conda activate yolov12
    
  4. Install dependencies
    Install the project's required dependencies, including PyTorch, flash-attn, and supervision:

    wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu11torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
    pip install flash_attn-2.7.3+cu11torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
    pip install -r requirements.txt
    pip install -e .
    
  5. Verify the installation
    Run the following command to check that the environment is configured correctly:

    python -c "from ultralytics import YOLO; print('YOLOv12 installed successfully')"
    

Usage

Training Custom Models

YOLOv12 lets users train on their own datasets, which is useful for scene-specific object detection tasks. The steps are as follows:

  1. Prepare the dataset
    • The data should be in YOLO format: an images folder and a labels folder, where each label is a .txt file listing object classes and bounding-box coordinates (an example label line is shown after this list).
    • Create a data.yaml file specifying the training-set path, validation-set path, and class names. Example:
      train: ./dataset/train/images
      val: ./dataset/val/images
      nc: 2 # number of classes
      names: ['cat', 'dog'] # class names
      
  2. Load the model and train it
    Use a Python script to load a pre-trained model and start training:

    from ultralytics import YOLO
    model = YOLO('yolov12s.pt') # choose from the n/s/m/l/x variants
    results = model.train(data='path/to/data.yaml', epochs=250, imgsz=640)
    
    • epochs: Number of training epochs; 250 or more is recommended for better results.
    • imgsz: Input image size; the default is 640x640.
  3. View training results
    After training completes, the results are saved in the runs/detect/train folder, including the model weights (best.pt) and a confusion matrix. Run the following code to view the confusion matrix:

    from IPython.display import Image
    Image(filename='runs/detect/train/confusion_matrix.png', width=600)
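
For reference, each line of a YOLO-format label file (mentioned in step 1) contains a class index followed by normalized bounding-box values. A hypothetical labels/cat_001.txt for the two-class example above might contain:

    0 0.48 0.63 0.69 0.71

Here 0 is the class index ('cat'), followed by the box's x-center, y-center, width, and height, all normalized to the 0-1 range.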
    

Inference and Testing

The trained model can be used for object detection on images or videos:

  1. Single-image detection
    model = YOLO('path/to/best.pt')
    results = model.predict('test.jpg', save=True) # save annotated results to runs/detect/predict
    results[0].show() # display the detection result
    
  2. Video detection
    Process a video file from the command line (a pure-Python alternative is sketched after this list):

    python app.py --source 'video.mp4' --model 'path/to/best.pt'
    
  3. Performance evaluation
    Evaluate on the validation set to obtain metrics such as mAP:

    results = model.val(data='path/to/data.yaml')
    print(results.box.map) # output mAP@0.5:0.95
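
As the pure-Python alternative to app.py mentioned above, the following minimal sketch uses the ultralytics streaming API to process a video frame by frame; the file paths and the 30 fps output setting are placeholder assumptions:

    import cv2
    from ultralytics import YOLO

    model = YOLO('path/to/best.pt')
    writer = None
    # stream=True yields one Results object per frame instead of loading the whole video
    for result in model('video.mp4', stream=True):
        frame = result.plot()  # frame with boxes and labels drawn
        if writer is None:
            h, w = frame.shape[:2]
            writer = cv2.VideoWriter('out.mp4', cv2.VideoWriter_fourcc(*'mp4v'), 30, (w, h))
        writer.write(frame)
    if writer is not None:
        writer.release()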
    

Model Export

Export the model to a format usable by the production environment:

model.export(format='onnx', half=True) # export to ONNX with FP16 half-precision

The exported models can be deployed to edge devices or servers.
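
As a sanity check after export, ultralytics can load and run the exported ONNX file directly. A short sketch (paths are placeholders; half=True is omitted here because FP16 export generally requires a CUDA device):

    from ultralytics import YOLO

    model = YOLO('path/to/best.pt')
    onnx_path = model.export(format='onnx')  # returns the path of the exported .onnx file
    onnx_model = YOLO(onnx_path)             # run the ONNX model through the same API
    results = onnx_model('test.jpg')
    print(results[0].boxes)                  # detections from the exported model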

Feature Highlights

  • Attention mechanism optimization
    YOLOv12's Area Attention module requires no manual configuration; it automatically optimizes feature extraction during training and inference, improving small-object detection. Users simply choose an appropriate model size (e.g., Nano for low-power devices) to benefit from the accuracy gains.
  • Real-time detection
    Inference is extremely fast on CUDA-enabled GPUs. For example, the YOLOv12-N model on a T4 GPU detects a single image in just 1.64 ms, and detection boxes and confidence scores can be visualized in real time, e.g. with the supervision tool shown earlier (a webcam sketch follows this list):

    results = model('image.jpg')
    results[0].show() # display the annotated image
    
  • Multi-scene adaptation
    By adjusting the model size and training data, YOLOv12 adapts easily to different tasks, such as detecting pedestrians in a surveillance system or recognizing vehicles and traffic signs in autonomous driving.
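
To see real-time behavior end to end, here is a hedged sketch of a webcam loop with a per-frame latency readout. The checkpoint name yolov12n.pt and camera index 0 are assumptions:

    import time
    import cv2
    from ultralytics import YOLO

    model = YOLO('yolov12n.pt')  # assumed Nano checkpoint
    cap = cv2.VideoCapture(0)    # default webcam (assumed index)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        t0 = time.perf_counter()
        result = model(frame, verbose=False)[0]
        latency_ms = (time.perf_counter() - t0) * 1000
        annotated = result.plot()  # draw boxes and confidence scores
        cv2.putText(annotated, f'{latency_ms:.1f} ms', (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow('YOLOv12', annotated)
        if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
            break
    cap.release()
    cv2.destroyAllWindows()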