YOLOv12: Open source tool for real-time image and video object detection

Latest AI Resources | Updated 6 months ago | AI Sharing Circle

General Introduction

YOLOv12 is an open source project developed by GitHub user sunsmarterjie that focuses on real-time object detection. It builds on the YOLO (You Only Look Once) family of frameworks and introduces an attention mechanism to improve on traditional convolutional neural networks (CNNs), raising detection accuracy while keeping inference fast. YOLOv12 suits scenarios such as surveillance systems, autonomous driving, and image analysis, and it ships in Nano, Small, Medium, Large, and Extra-Large model sizes to match different compute budgets and application requirements. The project is released under the GNU AGPL-3.0 license, so users can download the code for free and customize it to their needs. The development team includes researchers from the University at Buffalo and the Chinese Academy of Sciences, and the project provides detailed technical documentation and an installation guide so that users can get started quickly.

Function List

Efficient real-time object detection: On a T4 GPU, YOLOv12-N achieves 40.6% mAP with an inference latency of only 1.64 ms.
Multi-model selection: Five models (Nano to Extra-Large) cover a wide range of hardware, from low-power devices to high-performance servers.
Attention mechanism optimization: The Area Attention and R-ELAN modules improve detection accuracy while reducing computational complexity.
Model export: Trained models can be exported to ONNX or TensorRT format for deployment to production environments.
Custom dataset training: Users can train models on their own datasets for specific object detection tasks.
Visualization support: Integrates with the supervision library to present detection results and evaluate performance.

User Guide

Installation process

YOLOv12 currently does not have a standalone PyPI package and needs to be installed from the GitHub source. Here are the detailed installation steps for Linux systems (Windows or Mac users will need to adjust their environment configuration):

Preparing the environment
- Make sure Python 3.11 is installed on your system (the prebuilt flash-attn wheel used in the dependency step targets CPython 3.11).
- Install Git: sudo apt install git (Ubuntu example).
- Optional: Install the NVIDIA GPU driver and CUDA (recommended version 11.8 or higher) to accelerate training and inference.
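
The prerequisites above can be checked with a short script before installing anything. This is a minimal sketch using only the Python standard library. check_prerequisites is a hypothetical helper, not part of YOLOv12, and it treats the presence of nvidia-smi on PATH as a rough signal that an NVIDIA driver is installed; it does not verify the CUDA version.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 11)):
    """Report whether the basic tools for a source install are present.

    Hypothetical helper for illustration; not part of the YOLOv12 repository.
    """
    return {
        # True when the running interpreter is at least min_python.
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        # shutil.which returns None when the executable is not on PATH.
        "git_found": shutil.which("git") is not None,
        # Assumption: nvidia-smi on PATH suggests an NVIDIA driver is installed.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

A missing nvidia-smi is not an error by itself, since the GPU is optional; it only means training and inference would run on the CPU.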
Cloning the repository
Download the YOLOv12 repository locally by running the following commands in a terminal:
```
git clone https://github.com/sunsmarterjie/yolov12.git
cd yolov12
```

Creating a Virtual Environment
Use Conda or venv to create an isolated Python environment and avoid dependency conflicts:
```
conda create -n yolov12 python=3.11
conda activate yolov12
```

Installing dependencies
Install the project's dependencies, including PyTorch, flash-attn, and supervision:
```
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu11torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
pip install flash_attn-2.7.3+cu11torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
pip install -r requirements.txt
pip install -e .
```
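
The wheel filename above encodes the environment it was built for. Under the standard wheel naming scheme (name, version, Python tag, ABI tag, platform tag), cp311 means CPython 3.11 and linux_x86_64 means 64-bit Linux. The cu11torch2.2 part of the version appears to indicate a build for CUDA 11 and PyTorch 2.2, going by the flash-attention project's release naming. The sketch below splits the filename and compares the Python tag with the running interpreter. parse_wheel_filename and python_tag_matches are hypothetical helpers, and the parsing assumes a five-field filename with no optional build tag.

```python
import sys

WHEEL = ("flash_attn-2.7.3+cu11torch2.2cxx11abiFALSE"
         "-cp311-cp311-linux_x86_64.whl")


def parse_wheel_filename(filename):
    """Split a wheel filename into its five standard fields.

    Assumes no optional build tag, so exactly five dash-separated fields.
    """
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def python_tag_matches(python_tag, version_info=None):
    """Return True if a CPython tag such as 'cp311' matches the interpreter."""
    major, minor = (version_info or sys.version_info)[:2]
    return python_tag == f"cp{major}{minor}"


if __name__ == "__main__":
    tags = parse_wheel_filename(WHEEL)
    print(tags)
    print("matches this interpreter:", python_tag_matches(tags["python_tag"]))
```

If the Python tag does not match, pip will refuse the wheel, which is why the Conda environment above is created with python=3.11.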

Verify Installation
Run the following command to check that the environment is properly configured:
```
python -c "from ultralytics import YOLO; print('YOLOv12 installed successfully')"
```
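
A slightly broader check can confirm that each dependency is importable without actually loading it. The sketch uses importlib.util.find_spec from the standard library. find_missing is a hypothetical helper, and the module names (torch, flash_attn, supervision, ultralytics) are the usual import names for these packages, stated here as assumptions.

```python
import importlib.util


def find_missing(modules):
    """Return the top-level module names that cannot be located."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


if __name__ == "__main__":
    # Assumed import names for the dependencies mentioned in the install step.
    required = ["torch", "flash_attn", "supervision", "ultralytics"]
    missing = find_missing(required)
    print("missing:", ", ".join(missing) if missing else "none")
```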

Usage

Training custom models

YOLOv12 lets users train on their own datasets for scenario-specific object detection tasks. The steps are as follows:

Preparing the dataset
- The data should be in YOLO format: an images folder and a labels folder, where each label is a .txt file listing object classes and bounding box coordinates.
- Create a data.yaml file that specifies the training and validation set paths and the class names. Example:
```
train: ./dataset/train/images
val: ./dataset/val/images
nc: 2  # number of classes
names: ['cat', 'dog']  # class names
```
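
Malformed label files are a common cause of failed or silently degraded training, so it can help to validate them first. The sketch below assumes the usual YOLO detection label layout, in which each line holds a class id followed by the box center x, center y, width, and height, all normalized to the range 0 to 1. validate_label_line and validate_label_text are hypothetical helpers, not part of YOLOv12.

```python
def validate_label_line(line, num_classes):
    """Return a list of problems found in one YOLO label line (empty if OK)."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        class_id = int(parts[0])
        coords = [float(p) for p in parts[1:]]
    except ValueError:
        return ["fields must be numeric (integer class id, float coordinates)"]
    problems = []
    if not 0 <= class_id < num_classes:
        problems.append(f"class id {class_id} outside 0..{num_classes - 1}")
    if any(not 0.0 <= c <= 1.0 for c in coords):
        problems.append("coordinates must be normalized to the range [0, 1]")
    return problems


def validate_label_text(text, num_classes):
    """Validate every non-empty line; return {line_number: problems}."""
    report = {}
    for number, line in enumerate(text.splitlines(), start=1):
        if line.strip():
            problems = validate_label_line(line, num_classes)
            if problems:
                report[number] = problems
    return report


if __name__ == "__main__":
    sample = "0 0.5 0.5 0.2 0.3\n1 0.1 0.2 1.4 0.3\n"
    print(validate_label_text(sample, num_classes=2))
```

To check a real dataset, read each .txt file in the labels folder and pass its contents to validate_label_text with num_classes set to the nc value from data.yaml.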
Load the model and train it
Use a Python script to load a pre-trained model and start training:
```
from ultralytics import YOLO
model = YOLO('yolov12s.pt')  # n/s/m/l/x model variants are available
results = model.train(data='path/to/data.yaml', epochs=250, imgsz=640)
```
- epochs: Number of training epochs; 250 or more is recommended for better results.
- imgsz: Input image size; the default is 640x640.
Viewing training results
After training completes, the results are saved in the runs/detect/train folder, including the model weights (best.pt), the confusion matrix, and other outputs. Run the following code in a notebook to display the confusion matrix:
```
from IPython.display import Image
Image(filename='runs/detect/train/confusion_matrix.png', width=600)
```

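Inference and Testing

The trained model can be used for object detection in images or videos:

Single Image Detection
```
from ultralytics import YOLO

model = YOLO('path/to/best.pt')
# model(...) returns a list with one Results object per input image;
# save=True writes the annotated image under runs/detect/predict
results = model('test.jpg', save=True)
results[0].show()  # display the detection results
```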

Video Detection
Use the command line to process video files:
```
python app.py --source 'video.mp4' --model 'path/to/best.pt'
```
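
Video can also be processed from Python. The sketch below assumes the Ultralytics-style API that the repository exposes, where predict accepts a video path and stream=True yields one result per frame rather than holding every frame in memory. count_by_name and detect_video are hypothetical helper names, and the ultralytics import is deferred so the counting helper can be used on its own.

```python
from collections import Counter


def count_by_name(class_ids, names):
    """Count detections per class name given class ids and an id-to-name map."""
    return Counter(names[int(i)] for i in class_ids)


def detect_video(weights_path, video_path):
    """Run detection frame by frame and return total detections per class.

    Requires the ultralytics package installed from the YOLOv12 repository.
    """
    from ultralytics import YOLO  # deferred import; only needed for inference

    model = YOLO(weights_path)
    totals = Counter()
    # stream=True yields one Results object per frame instead of a full list.
    for result in model.predict(source=video_path, stream=True):
        totals.update(count_by_name(result.boxes.cls.tolist(), result.names))
    return totals


if __name__ == "__main__":
    # Example with made-up class ids, no model or video required.
    print(count_by_name([0, 0, 1], {0: "cat", 1: "dog"}))
```

Calling detect_video('path/to/best.pt', 'video.mp4') would return the per-class totals across all frames; the same object is counted once per frame in which it appears.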

Performance Evaluation
Evaluate the model on the validation set to obtain metrics such as mAP:
```
results = model.val(data='path/to/data.yaml')
print(results.box.map)  # prints mAP@0.5:0.95
```
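
By the usual COCO convention, mAP@0.5:0.95 is the mean of the average precision computed at ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows the two ingredients that definition rests on: the IoU between a predicted and a ground-truth box, and the threshold list. It is illustrative only and is not the evaluation code used by the library; the function names and the (x1, y1, x2, y2) corner format are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def coco_iou_thresholds():
    """The ten thresholds 0.50, 0.55, ..., 0.95."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred_box, true_box):
    """IoU thresholds at which the prediction would count as a match."""
    score = iou(pred_box, true_box)
    return [t for t in coco_iou_thresholds() if score >= t]


if __name__ == "__main__":
    pred, truth = (0, 0, 10, 10), (0, 0, 10, 7.3)
    print(iou(pred, truth), thresholds_passed(pred, truth))
```

A loosely fitting box may count as correct at the 0.5 threshold but not at 0.9, which is why mAP@0.5:0.95 is a stricter measure than mAP@0.5 alone.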

Model Export

Export the model to a format suitable for production environments:
```
model.export(format='onnx', half=True)  # export to ONNX with FP16 (half precision) enabled
```

The exported models can be deployed to edge devices or servers.

Using Key Features

Attention mechanism optimization
YOLOv12's Area Attention module requires no manual configuration; it operates automatically during training and inference to improve feature extraction, which helps with small object detection. Users simply choose an appropriate model size (e.g., Nano for low-power devices) to benefit from the improved accuracy.
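
The text does not describe how Area Attention lowers the cost of attention. One way to see why restricting attention to local areas is cheaper, assuming the module splits the feature map into equal areas and attends only within each one (an assumption about the design, not something stated here): full self-attention over n tokens compares n² token pairs, while l areas of n/l tokens each compare l·(n/l)² = n²/l pairs. The function names and the example values (a 20x20 feature map and 4 areas) are hypothetical.

```python
def full_attention_pairs(num_tokens):
    """Token pairs compared by self-attention over the whole feature map."""
    return num_tokens ** 2


def area_attention_pairs(num_tokens, num_areas):
    """Token pairs compared when attention is limited to equal-sized areas."""
    if num_tokens % num_areas != 0:
        raise ValueError("num_tokens must divide evenly into num_areas")
    tokens_per_area = num_tokens // num_areas
    return num_areas * tokens_per_area ** 2


if __name__ == "__main__":
    tokens = 20 * 20  # hypothetical 20x20 feature map
    full = full_attention_pairs(tokens)
    local = area_attention_pairs(tokens, num_areas=4)
    print(f"full: {full}, area: {local}, reduction: {full / local:.0f}x")
```

Under these assumptions the pairwise cost falls by a factor equal to the number of areas, at the price of each token seeing a smaller neighborhood.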
Real-time detection
Inference is very fast on CUDA-enabled GPUs. For example, the YOLOv12-N model on a T4 GPU processes a single image in about 1.64 ms. Users can then visualize bounding boxes and confidence scores, for example with the built-in plot method shown here or with the supervision library:
```
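results = model('image.jpg')  # returns a list with one Results object per image
annotated = results[0].plot()  # annotated image as a NumPy array (BGR)
results[0].show()  # display the annotated image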
```
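
To put the 1.64 ms figure in perspective, latency converts to throughput as fps = 1000 / latency_ms. The sketch below assumes one image is processed at a time and that the quoted latency covers the whole per-image cost. In practice video decoding, preprocessing, and post-processing add time, so real throughput would be lower. The function names are hypothetical.

```python
def latency_ms_to_fps(latency_ms):
    """Frames per second if every frame takes latency_ms milliseconds."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


if __name__ == "__main__":
    print(f"{latency_ms_to_fps(1.64):.0f} frames per second at 1.64 ms")
    print(f"{frame_budget_ms(30):.1f} ms per frame for 30 fps video")
```

At 1.64 ms per image the model alone would leave most of the roughly 33 ms budget of a 30 fps stream for the rest of the pipeline.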
Multi-scenario adaptation
By adjusting the model size and training data, YOLOv12 can be adapted to different tasks, such as detecting pedestrians in a surveillance system or recognizing vehicles and traffic signs in autonomous driving.