General Introduction
RF-DETR is an open-source object detection model developed by the Roboflow team. It is built on a Transformer architecture, and its core feature is real-time efficiency. It is the first real-time detection model to exceed 60 AP on the Microsoft COCO dataset, and it also performs strongly on the RF100-VL benchmark, which covers a variety of real-world scenarios. It is available in two versions: RF-DETR-base (29 million parameters) and RF-DETR-large (128 million parameters). The model is compact enough to suit edge device deployment. The code and pre-trained weights are released under the Apache 2.0 license and are free for community use. Users can obtain them from GitHub for training or deployment.
Function List
- Real-time object detection: fast recognition of objects in images or videos with low latency.
- Custom dataset training: support for tuning models with your own data.
- Running on edge devices: the model is lightweight and suitable for resource-limited devices.
- Adjustable resolution: users can trade off detection speed against accuracy.
- Pre-trained model support: provides weights pre-trained on the COCO dataset.
- Video stream processing: can analyze the video in real time and output the results.
- ONNX Export: Supports conversion to ONNX format for easy cross-platform deployment.
- Multi-GPU training: You can accelerate the training process with multiple graphics cards.
Usage Guide
The use of RF-DETR is divided into three parts: installation, inference and training. Below are detailed steps to help you get started quickly.
Installation process
- Environment preparation
Requires Python 3.9 or higher and PyTorch 1.13.0 or higher. If using a GPU, run nvidia-smi to check the driver.
- Install PyTorch (the quotes stop the shell from treating >= as a redirect):
    pip install "torch>=1.13.0" "torchvision>=0.14.0"
- Download code:
    git clone https://github.com/roboflow/rf-detr.git
    cd rf-detr
- Install the dependencies:
    pip install rfdetr
This will automatically install numpy, supervision and other necessary libraries.
- Verify installation
Run the following code:
    from rfdetr import RFDETRBase
    print("Installation successful")
If no errors are reported, the installation is complete.
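The version requirements above can also be checked from Python. The following is a minimal sketch using only the standard library; the helper name `check_environment` is hypothetical and is not part of rfdetr. It only reports what is installed and does not compare package versions against the minimums.

```python
import sys
from importlib import metadata


def check_environment(packages=("torch", "torchvision", "rfdetr")):
    """Report whether Python is 3.9+ and which package versions are installed.

    Hypothetical helper for illustration; it is not part of rfdetr.
    """
    report = {"python_ok": sys.version_info >= (3, 9)}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # package is not installed
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A value of `None` for a package means it still needs to be installed.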
Inference
RF-DETR ships with a model pre-trained on the COCO dataset, so it can detect objects in images or videos directly.
- Image detection
- Sample code:
    import io

    import requests
    import supervision as sv
    from PIL import Image
    from rfdetr import RFDETRBase

    # Load the COCO pre-trained base model
    model = RFDETRBase()

    # Download the sample image
    url = "https://media.roboflow.com/notebooks/examples/dog-2.jpeg"
    image = Image.open(io.BytesIO(requests.get(url).content))

    # Run detection with a confidence threshold of 0.5
    detections = model.predict(image, threshold=0.5)

    # One "class_id confidence" label per detection
    labels = [
        f"{class_id} {confidence:.2f}"
        for class_id, confidence in zip(detections.class_id, detections.confidence)
    ]

    # Draw boxes and labels on a copy, then display it
    annotated_image = image.copy()
    annotated_image = sv.BoxAnnotator().annotate(annotated_image, detections)
    annotated_image = sv.LabelAnnotator().annotate(annotated_image, detections, labels)
    sv.plot_image(annotated_image)
- This code detects objects in the image, draws each bounding box with its class ID and confidence, and then displays the result.
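The labels in the image sample show numeric class IDs. A minimal sketch of mapping them to readable names follows. The helper `format_labels` and the `class_names` dictionary are made-up stand-ins, so substitute the mapping that matches your model or dataset.

```python
def format_labels(class_ids, confidences, class_names):
    """Build 'name score' labels, falling back to the raw ID if it is unknown."""
    return [
        f"{class_names.get(int(class_id), class_id)} {confidence:.2f}"
        for class_id, confidence in zip(class_ids, confidences)
    ]


# Illustrative values only; real IDs and names come from your model or dataset.
class_names = {0: "dog", 1: "cat"}
labels = format_labels([0, 1, 7], [0.91, 0.55, 0.5], class_names)
print(labels)  # ['dog 0.91', 'cat 0.55', '7 0.50']
```

The resulting list can be passed as the labels argument wherever the sample builds its own label list.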
- Video detection
- First install opencv-python:
    pip install opencv-python
- Sample code:
    import cv2
    import supervision as sv
    from PIL import Image  # needed for Image.fromarray below
    from rfdetr import RFDETRBase

    model = RFDETRBase()
    cap = cv2.VideoCapture("video.mp4")  # replace with your video path

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        # OpenCV reads frames as BGR; convert to an RGB PIL image for the model
        image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        detections = model.predict(image, threshold=0.5)
        annotated_frame = sv.BoxAnnotator().annotate(frame, detections)
        cv2.imshow("RF-DETR Detection", annotated_frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
            break

    cap.release()
    cv2.destroyAllWindows()
- This detects objects in the video frame by frame and displays the annotated frames as they are processed.
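Whether the video loop actually keeps up with the source frame rate depends on your hardware. One way to check is to time the per-frame work. The sketch below assumes nothing about RF-DETR: `measure_fps` is a hypothetical helper, and the stand-in function `fake_detect` takes the place of the `model.predict` call.

```python
import time


def measure_fps(process_frame, frames):
    """Return the average frames per second for calling process_frame on each frame."""
    start = time.perf_counter()
    count = 0
    for frame in frames:
        process_frame(frame)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


def fake_detect(frame):
    """Stand-in for model.predict; sleeps briefly to simulate inference."""
    time.sleep(0.001)


fps = measure_fps(fake_detect, range(20))
print(f"{fps:.1f} frames per second")
```

If the measured rate is below the frame rate of the video, lowering the resolution is one lever to try.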
- Adjusting the resolution
- The resolution can be set at initialization (must be a multiple of 56):
    model = RFDETRBase(resolution=560)
- Higher resolution generally improves accuracy but slows down inference.
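Because the resolution must be a multiple of 56, an arbitrary target size has to be rounded first. A minimal sketch of that arithmetic, with a hypothetical helper name that is not part of rfdetr:

```python
def nearest_valid_resolution(target, multiple=56):
    """Round target to a nearby positive multiple of `multiple`."""
    if target <= 0:
        raise ValueError("target must be positive")
    return max(multiple, round(target / multiple) * multiple)


print(nearest_valid_resolution(640))  # 616
print(nearest_valid_resolution(560))  # 560
```

The returned value can then be passed as the resolution argument shown in the initialization example.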
Training customized models
RF-DETR supports fine-tuning on your own dataset. The dataset must be in COCO format and contain three subdirectories: train, valid and test.
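For orientation, a COCO-format annotation file is a JSON object with `images`, `annotations` and `categories` lists, where each bounding box is `[x, y, width, height]` in pixels. The sketch below builds a minimal example in memory and counts annotations per category. The file name, image size and category are invented for illustration, and `count_per_category` is a hypothetical helper.

```python
from collections import Counter

# Minimal COCO-style structure; all values are illustrative.
coco = {
    "images": [
        {"id": 1, "file_name": "image1.jpg", "width": 640, "height": 480},
    ],
    "categories": [
        {"id": 1, "name": "tile"},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,                 # refers to images[].id
            "category_id": 1,              # refers to categories[].id
            "bbox": [100, 120, 50, 80],    # x, y, width, height
            "area": 50 * 80,
            "iscrowd": 0,
        },
    ],
}


def count_per_category(data):
    """Count annotations for each category name."""
    names = {category["id"]: category["name"] for category in data["categories"]}
    return Counter(names[ann["category_id"]] for ann in data["annotations"])


print(count_per_category(coco))
```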
- Preparing the dataset
- Example directory structure:
    dataset/
    ├── train/
    │   ├── _annotations.coco.json
    │   ├── image1.jpg
    │   └── image2.jpg
    ├── valid/
    │   ├── _annotations.coco.json
    │   ├── image1.jpg
    │   └── image2.jpg
    └── test/
        ├── _annotations.coco.json
        ├── image1.jpg
        └── image2.jpg
- COCO format datasets can be generated using the Roboflow platform:
    from roboflow import Roboflow

    rf = Roboflow(api_key="YOUR_API_KEY")
    project = rf.workspace("rf-100-vl").project("mahjong-vtacs-mexax-m4vyu-sjtd")
    dataset = project.version(2).download("coco")
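Before starting a long training run, it can help to confirm that the directory layout matches the structure described here. The following is a minimal sketch using only the standard library; `check_dataset` is a hypothetical helper, not an rfdetr function, and it checks only the layout and the top-level JSON keys.

```python
import json
from pathlib import Path


def check_dataset(dataset_dir, splits=("train", "valid", "test")):
    """Return a list of problems found in the dataset layout (empty if none)."""
    problems = []
    root = Path(dataset_dir)
    for split in splits:
        annotations = root / split / "_annotations.coco.json"
        if not annotations.is_file():
            problems.append(f"missing {annotations}")
            continue
        try:
            data = json.loads(annotations.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems.append(f"invalid JSON in {annotations}")
            continue
        # A COCO file is expected to carry these three top-level lists
        for key in ("images", "annotations", "categories"):
            if key not in data:
                problems.append(f"{annotations} lacks '{key}'")
    return problems
```

Calling `check_dataset("./dataset")` returns an empty list when all three splits are present and well formed.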
- Start training
- Sample code:
    from rfdetr import RFDETRBase

    model = RFDETRBase()
    model.train(
        dataset_dir="./mahjong-vtacs-mexax-m4vyu-sjtd-2",
        epochs=10,
        batch_size=4,
        grad_accum_steps=4,
        lr=1e-4,
    )
- For training, the recommended total batch size (batch_size * grad_accum_steps) is 16. For example, on A100 GPUs use batch_size=16, grad_accum_steps=1; on T4 GPUs use batch_size=4, grad_accum_steps=4.
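Both GPU examples multiply out to a total batch size of 16. A small sketch of that arithmetic, for picking grad_accum_steps once you know the batch_size your GPU memory allows; the helper name is hypothetical:

```python
def grad_accum_steps_for(batch_size, total_batch_size=16):
    """Return the steps needed so that batch_size * steps == total_batch_size."""
    if batch_size <= 0 or total_batch_size % batch_size != 0:
        raise ValueError("batch_size must evenly divide total_batch_size")
    return total_batch_size // batch_size


print(grad_accum_steps_for(16))  # 1, matching the A100 example
print(grad_accum_steps_for(4))   # 4, matching the T4 example
```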
- Multi-GPU Training
- Create a main.py file:
    from rfdetr import RFDETRBase

    model = RFDETRBase()
    model.train(dataset_dir="./dataset", epochs=10, batch_size=4, grad_accum_steps=4, lr=1e-4)
- Run in the terminal:
    python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py
- Replace 8 with the number of GPUs you are using. Adjust batch_size so that the total batch size stays the same.
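When several GPUs each process their own batch, the overall batch size grows with the GPU count. The sketch below assumes the overall batch equals batch_size * grad_accum_steps * number of GPUs and uses 16 as the target; both the helper name and that formula are stated assumptions rather than something taken from the rfdetr code.

```python
def per_gpu_batch_size(num_gpus, grad_accum_steps=1, total_batch_size=16):
    """Return the per-GPU batch_size that keeps the overall batch size fixed.

    Assumes overall batch = batch_size * grad_accum_steps * num_gpus.
    """
    divisor = num_gpus * grad_accum_steps
    if divisor <= 0 or total_batch_size % divisor != 0:
        raise ValueError("num_gpus * grad_accum_steps must divide total_batch_size")
    return total_batch_size // divisor


for gpus in (1, 2, 4, 8):
    print(gpus, "GPUs -> batch_size =", per_gpu_batch_size(gpus))
```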
- Load training results
- Two weight files are generated after training: regular weights and EMA weights (more stable). Loading method:
    model = RFDETRBase(pretrain_weights="./output/model_ema.pt")
    detections = model.predict("image.jpg")
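EMA stands for exponential moving average: alongside the regular weights, training keeps a smoothed copy that moves only a small step toward each new set of weights, which damps step-to-step noise. The toy sketch below shows the general update rule on plain Python lists. The decay values are illustrative and are not taken from RF-DETR's training configuration.

```python
def ema_update(ema_weights, new_weights, decay=0.99):
    """Blend new weights into the running average: decay * ema + (1 - decay) * new."""
    return [
        decay * ema_value + (1.0 - decay) * new_value
        for ema_value, new_value in zip(ema_weights, new_weights)
    ]


# Toy example: a single noisy weight that oscillates around 1.0.
ema = [1.0]
for noisy in (1.2, 0.8, 1.2, 0.8):
    ema = ema_update(ema, [noisy], decay=0.9)
print(ema)  # stays much closer to 1.0 than the raw values do
```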
ONNX Export
- Export to ONNX format for easy deployment on other platforms:
    from rfdetr import RFDETRBase

    model = RFDETRBase()
    model.export()
- The exported file is saved in the output directory and can be used for optimized inference on edge devices.
Application Scenarios
- Autonomous driving
RF-DETR detects vehicles and pedestrians on the road in real time. Its low latency and high accuracy are suitable for embedded systems.
- Industrial quality control
RF-DETR quickly identifies part defects on factory assembly lines. The model is lightweight and can be run directly on the equipment.
- Video surveillance
RF-DETR processes surveillance video to detect abnormal objects or behavior in real time. It supports video streaming and is suitable for 24/7 security.
QA
- What dataset formats are supported?
Only the COCO format is supported. The dataset needs to contain train, valid and test subdirectories, each with a corresponding _annotations.coco.json file.
- How do I get a Roboflow API key?
Log in to https://app.roboflow.com, find the API key in your account settings, copy it and set it as the environment variable ROBOFLOW_API_KEY.
- How long does training take?
Depends on hardware and dataset size. On a T4 GPU, 10 epochs might take a couple hours. Smaller datasets can be run on a CPU, but it's slow.
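For the API key question above, reading the key from the ROBOFLOW_API_KEY environment variable keeps it out of your scripts. A minimal sketch using only the standard library; `get_roboflow_api_key` is a hypothetical helper, not part of the roboflow package:

```python
import os


def get_roboflow_api_key(env_var="ROBOFLOW_API_KEY"):
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

The returned string can be passed as the api_key argument when constructing the Roboflow client.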