General Introduction
YOLOv12 is an open-source project developed by GitHub user sunsmarterjie that focuses on real-time object detection. Built on the YOLO (You Only Look Once) family of frameworks, it introduces an attention mechanism to improve on traditional convolutional neural network (CNN) designs, raising detection accuracy while keeping inference fast. YOLOv12 suits scenarios such as surveillance systems, autonomous driving, and image analysis, and it comes in Nano, Small, Medium, Large, and Extra-Large sizes to match different compute budgets and application requirements. The project is released under the GNU AGPL-3.0 license, so users can download the code for free and customize it according to their needs. The developer team includes researchers from the University at Buffalo and the Chinese Academy of Sciences, and the project ships with detailed technical documentation and an installation guide, so that users can get started quickly.
Function List
- Efficient real-time object detection: On a T4 GPU, YOLOv12-N reaches 40.6% mAP with an inference latency of only 1.64 ms.
- Multiple model sizes: Five models (Nano to Extra-Large) are available to cover a wide range of hardware, from low-power devices to high-performance servers.
- Attention mechanism optimization: The Area Attention and R-ELAN modules improve detection accuracy and reduce computational complexity.
- Model export: Trained models can be exported to ONNX or TensorRT format for deployment to production environments.
- Custom dataset training: Users can train models on their own datasets for specific object detection tasks.
- Visualization support: Integration with the supervision tool makes it easy to present detection results and evaluate performance.
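To put the 1.64 ms latency figure in perspective, the short sketch below converts a per-image latency into an upper bound on throughput. It is plain arithmetic, not part of YOLOv12; the function name is hypothetical, and the result ignores image loading, pre-processing, and post-processing time.

```python
def throughput_from_latency(latency_ms: float) -> float:
    """Convert a per-image latency in milliseconds to images per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# Latency reported above for YOLOv12-N on a T4 GPU
fps = throughput_from_latency(1.64)
print(f"about {fps:.0f} images per second")  # about 610 images per second
```

In practice the end-to-end rate will be lower, since the pipeline around the model also takes time.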
User Guide
Installation process
YOLOv12 currently does not have a standalone PyPI package and needs to be installed from the GitHub source. Here are the detailed installation steps for Linux systems (Windows or Mac users will need to adjust their environment configuration):
- Preparing the environment
- Make sure Python 3.11 or later is installed on your system. The steps below create a Python 3.11 environment, which the prebuilt flash-attn wheel (cp311) requires.
- Install Git (Ubuntu example):
sudo apt install git
- Optional: Install the NVIDIA GPU driver and CUDA (version 11.8 or higher recommended) to accelerate training and inference.
- Cloning the repository
Download the YOLOv12 repository locally by running the following commands in a terminal:
git clone https://github.com/sunsmarterjie/yolov12.git
cd yolov12
- Creating a Virtual Environment
Use Conda or venv to create an isolated Python environment and avoid dependency conflicts:
conda create -n yolov12 python=3.11
conda activate yolov12
- Installation of dependencies
Install the project's dependencies, including PyTorch, flash-attn, and supervision:
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu11torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
pip install flash_attn-2.7.3+cu11torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
pip install -r requirements.txt
pip install -e .
- Verify Installation
Run the following command to check that the environment is properly configured:
python -c "from ultralytics import YOLO; print('YOLOv12 installed successfully')"
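If the one-line check fails, it can help to see which dependency is missing. The sketch below is a hypothetical helper, not part of the YOLOv12 repository. It uses only the standard library and assumes the import names torch, ultralytics, flash_attn, and supervision for the packages installed above.

```python
import importlib.util
import sys

# Import names assumed for the packages installed in the steps above
REQUIRED = ("torch", "ultralytics", "flash_attn", "supervision")


def check_environment(required=REQUIRED, min_python=(3, 11)):
    """Report whether Python is new enough and which packages can be located."""
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in required:
        # find_spec returns None when a top-level module cannot be located
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key:12s} {'OK' if ok else 'MISSING'}")
```

The check only confirms that each package can be located. It does not import them, so it will not catch a CUDA or binary mismatch inside a package.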
Usage
Training custom models
YOLOv12 lets users train on their own datasets for scene-specific object detection tasks. The steps are as follows:
- Preparing the dataset
- The data should be in YOLO format: an images folder and a labels folder, where each label is a .txt file listing the object classes and bounding box coordinates.
- Create a data.yaml file that specifies the training set path, the validation set path, and the class names. Example:
train: ./dataset/train/images
val: ./dataset/val/images
nc: 2  # number of classes
names: ['cat', 'dog']  # class names
- Load the model and train it
Use a Python script to load a pre-trained model and start training:
from ultralytics import YOLO
model = YOLO('yolov12s.pt')  # or the n/m/l/x variants
results = model.train(data='path/to/data.yaml', epochs=250, imgsz=640)
- epochs: Number of training epochs; 250 or more is recommended for better results.
- imgsz: Input image size; the default is 640x640.
- View training results
After training completes, the results are saved in the runs/detect/train folder, including the model weights (best.pt, in the weights subfolder) and the confusion matrix. Run the following code in a notebook to display the confusion matrix:
from IPython.display import Image
Image(filename='runs/detect/train/confusion_matrix.png', width=600)
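The label files mentioned in the dataset preparation step follow the usual YOLO text layout: one line per object, holding the class index followed by the box center, width, and height, all normalized by the image size. The sketch below shows one way to produce and validate such lines. The helper names are hypothetical, and the pixel-space corner input is an assumption about how your annotations are stored.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner box to a YOLO label line.

    Output layout: class x_center y_center width height,
    with the four box values normalized to [0, 1].
    """
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def parse_yolo_line(line):
    """Parse and validate one label line; return (class_id, box values)."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    class_id = int(parts[0])
    values = tuple(float(p) for p in parts[1:])
    if not all(0.0 <= v <= 1.0 for v in values):
        raise ValueError("box values must be normalized to [0, 1]")
    return class_id, values


# A 'dog' (class 1 in the data.yaml example) in a 640x480 image
print(to_yolo_line(1, 100, 50, 300, 250, img_w=640, img_h=480))
# 1 0.312500 0.312500 0.312500 0.416667
```

Running parse_yolo_line over every line of every label file before training is a cheap way to catch unnormalized coordinates or malformed rows.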
Inference and Testing
The trained model can be used for object detection in images or videos:
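- Single Image Detection
from ultralytics import YOLO
model = YOLO('path/to/best.pt')
results = model('test.jpg', save=True)  # returns a list with one result per image; save=True writes annotated output to runs/detect/predict
results[0].show()  # display the annotated image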
- Video Detection
Use the command line to process video files:
python app.py --source 'video.mp4' --model 'path/to/best.pt'
- Performance Evaluation
Evaluate the model on the validation set to obtain metrics such as mAP:
results = model.val(data='path/to/data.yaml')
print(results.box.map)  # mAP@0.5:0.95
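The mAP@0.5:0.95 metric averages results over ten intersection-over-union (IoU) thresholds, from 0.50 to 0.95 in steps of 0.05. The sketch below illustrates only the IoU matching part, under the assumption that boxes are given as (x_min, y_min, x_max, y_max) corners. It is not the evaluation code used by the library, which also ranks predictions by confidence and integrates precision over recall for each class.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds covered by mAP@0.5:0.95: 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(pred, truth):
    """Thresholds at which a prediction would count as a match for a ground-truth box."""
    score = iou(pred, truth)
    return [t for t in IOU_THRESHOLDS if score >= t]


pred = (10, 10, 110, 110)
truth = (20, 20, 120, 120)
print(round(iou(pred, truth), 3))       # 0.681
print(matched_thresholds(pred, truth))  # [0.5, 0.55, 0.6, 0.65]
```

A prediction that is slightly offset, as in this example, still counts at the looser thresholds but not at the stricter ones, which is why mAP@0.5:0.95 rewards tight localization.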
Model Export
Export the model to a format usable by the production environment:
model.export(format='onnx', half=True) # exported as ONNX, supports FP16 acceleration
The exported models can be deployed to edge devices or servers.
Using Key Features
- Attention mechanism optimization
YOLOv12's Area Attention module needs no manual configuration. It is part of the network and is applied automatically during training and inference, improving feature extraction and small object detection. Users simply choose an appropriate model size (e.g., Nano for low-power devices) to benefit from the improved accuracy.
- Real-time detection
Inference is fast on CUDA-enabled GPUs. For example, the YOLOv12-N model on a T4 GPU processes a single image in about 1.64 ms. Users can render the detection boxes and confidence scores, either with the built-in plot() method or with the supervision tool:
results = model('image.jpg')
annotated = results[0].plot()  # returns the annotated image as an array
- Multi-scene Adaptation
By adjusting the model size and training data, YOLOv12 can be adapted to different tasks, for example detecting pedestrians in a surveillance system or recognizing vehicles and traffic signs in autonomous driving.
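To illustrate why the Area Attention module described above can reduce computational cost, the sketch below counts query-key pairs for global attention versus attention restricted to equal-sized areas. It assumes the feature map tokens are split into equal areas with attention computed only within each area. The area count of 4 and the 40x40 feature map are illustrative assumptions, and the function is hypothetical rather than taken from the YOLOv12 code.

```python
def attention_pairs(num_tokens: int, num_areas: int = 1) -> int:
    """Count query-key pairs when tokens are split into equal areas.

    With one area this is global attention (num_tokens squared).
    With several areas, each token attends only within its own area.
    """
    if num_areas < 1 or num_tokens % num_areas:
        raise ValueError("num_tokens must divide evenly into num_areas")
    per_area = num_tokens // num_areas
    return num_areas * per_area * per_area


tokens = 40 * 40  # hypothetical 40x40 feature map
global_pairs = attention_pairs(tokens)
area_pairs = attention_pairs(tokens, num_areas=4)  # assumed area count
print(global_pairs, area_pairs, global_pairs // area_pairs)
# 2560000 640000 4
```

Under these assumptions, the number of pairwise attention scores falls by a factor equal to the number of areas, while each token still sees a sizable region of the image.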