RF-DETR: An Open Source Model for Real-Time Visual Object Detection

Latest AI Resources, released 5 months ago by AI Sharing Circle


General Introduction

RF-DETR is an open source object detection model developed by the Roboflow team. It is built on the Transformer architecture, and its core feature is real-time efficiency. It is the first real-time model to exceed 60 AP on the Microsoft COCO dataset, and it also performs strongly on the RF100-VL benchmark, showing that it adapts to a variety of real-world scenarios. It is available in two versions: RF-DETR-base (29 million parameters) and RF-DETR-large (128 million parameters). The model is small enough for edge device deployment. The code and pre-trained weights are released under the Apache 2.0 license and are free for community use. Users can obtain them from GitHub for training or deployment.

Function List

Real-time object detection: fast recognition of objects in images or videos with low latency.
Custom dataset training: support for tuning models with your own data.
Running on edge devices: the model is lightweight and suitable for resource-limited devices.
Adjustable resolution: users can trade detection speed against accuracy.
Pre-trained model support: provides pre-trained weights based on the COCO dataset.
Video stream processing: can analyze the video in real time and output the results.
ONNX Export: Supports conversion to ONNX format for easy cross-platform deployment.
Multi-GPU training: You can accelerate the training process with multiple graphics cards.

Usage Guide

Using RF-DETR involves three parts: installation, inference and training. The detailed steps below will help you get started quickly.

Installation process

Environment preparation
Requires Python 3.9 or higher and PyTorch 1.13.0 or higher. If using a GPU, run nvidia-smi to check the driver.
- Install PyTorch (the quotes stop the shell from treating >= as a redirect):
```
pip install "torch>=1.13.0" "torchvision>=0.14.0"
```
- Download code:
```
git clone https://github.com/roboflow/rf-detr.git
cd rf-detr
```
- Install the dependencies:
```
pip install rfdetr
```
  This will automatically install numpy, supervision and other required libraries.
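
The version requirements above can also be checked from Python. The following is a minimal sketch, not part of RF-DETR: the helper names parse_version, meets_minimum and check_environment are hypothetical, and the parser only handles plain dotted versions with an optional local suffix such as +cu118.

```python
import sys
from importlib import metadata


def parse_version(text):
    """Turn '2.1.0+cu118' into (2, 1, 0), ignoring any '+local' suffix."""
    core = text.split("+")[0]
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '1.13' compares equal to '1.13.0'
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment():
    """Return (python_ok, torch_ok) for the minimums stated above."""
    python_ok = sys.version_info >= (3, 9)
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        torch_version = None
    torch_ok = torch_version is not None and meets_minimum(torch_version, "1.13.0")
    return python_ok, torch_ok


if __name__ == "__main__":
    print(check_environment())
```

Reading the version through importlib.metadata avoids importing torch itself, so the check stays fast and works even when PyTorch is missing.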

Verify Installation
Run the following code:

from rfdetr import RFDETRBase
print("Installation successful")

If no errors are reported, the installation is complete.

Inference

RF-DETR ships with a model pre-trained on the COCO dataset, so it can detect objects in images or videos directly.

Image detection

Sample code:

import io
import requests
from PIL import Image
from rfdetr import RFDETRBase
import supervision as sv

# Load the model with the pre-trained COCO weights
model = RFDETRBase()

# Download the sample image
url = "https://media.roboflow.com/notebooks/examples/dog-2.jpeg"
image = Image.open(io.BytesIO(requests.get(url).content))

# Run detection with a confidence threshold of 0.5
detections = model.predict(image, threshold=0.5)

# Build one "class_id confidence" label per detection
labels = [f"{class_id} {confidence:.2f}" for class_id, confidence in zip(detections.class_id, detections.confidence)]

# Draw boxes and labels on a copy of the image, then display it
annotated_image = image.copy()
annotated_image = sv.BoxAnnotator().annotate(annotated_image, detections)
annotated_image = sv.LabelAnnotator().annotate(annotated_image, detections, labels)
sv.plot_image(annotated_image)

This code detects objects in the image, draws each bounding box with its class ID and confidence score, and then displays the result.
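
The labels in the sample show numeric class IDs. To show readable names instead, the IDs can be passed through a lookup table. The sketch below is illustrative only: format_labels and example_names are hypothetical, and the two table entries are placeholders, so substitute the ID-to-name table that matches your model's classes.

```python
def format_labels(class_ids, confidences, class_names):
    """Build 'name confidence' labels, falling back to the raw ID for unknown classes."""
    labels = []
    for class_id, confidence in zip(class_ids, confidences):
        name = class_names.get(int(class_id), str(int(class_id)))
        labels.append(f"{name} {confidence:.2f}")
    return labels


# Placeholder table (assumption): replace with your model's real classes
example_names = {1: "person", 18: "dog"}

print(format_labels([18, 7], [0.913, 0.5], example_names))
# prints ['dog 0.91', '7 0.50']
```

In the image example this would be called as format_labels(detections.class_id, detections.confidence, example_names) and the result passed to the label annotator.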

Video Detection

First install opencv-python:
```
pip install opencv-python
```

Sample code:

import cv2
from PIL import Image
from rfdetr import RFDETRBase
import supervision as sv

model = RFDETRBase()
box_annotator = sv.BoxAnnotator()
cap = cv2.VideoCapture("video.mp4")  # replace with the path to your video

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    # OpenCV frames are BGR; convert to RGB before passing them to the model
    image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    detections = model.predict(image, threshold=0.5)
    annotated_frame = box_annotator.annotate(frame, detections)
    cv2.imshow("RF-DETR Detection", annotated_frame)
    # Press q to stop
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

This will detect objects in the video frame by frame and display them in real time.
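
Whether the loop keeps up with the video depends on the hardware, so it can help to measure the achieved frame rate. The following is a small sketch of a rolling frames-per-second counter. FpsMeter is a hypothetical helper, not part of rfdetr or supervision, and the optional now argument exists only to make the class easy to test.

```python
import time
from collections import deque


class FpsMeter:
    """Rolling frames-per-second estimate over the last `window` frames."""

    def __init__(self, window=30):
        self.timestamps = deque(maxlen=window)

    def tick(self, now=None):
        """Record one processed frame and return the current FPS estimate.

        Returns 0.0 until at least two frames have been recorded.
        """
        self.timestamps.append(time.perf_counter() if now is None else now)
        if len(self.timestamps) < 2:
            return 0.0
        elapsed = self.timestamps[-1] - self.timestamps[0]
        if elapsed <= 0:
            return 0.0
        # N timestamps span N - 1 frame intervals
        return (len(self.timestamps) - 1) / elapsed
```

One way to use it is to create meter = FpsMeter() before the loop and call fps = meter.tick() right after model.predict on each frame.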

Adjusting the resolution
- The resolution can be set at initialization (must be a multiple of 56):
```
model = RFDETRBase(resolution=560)
```
- Higher resolution generally improves accuracy but slows down inference.
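
Because the resolution must be a multiple of 56, an arbitrary target size has to be rounded before it is passed to the model. A minimal sketch follows; snap_resolution is a hypothetical helper and the round-to-nearest policy is an assumption, since rounding down would be equally valid.

```python
def snap_resolution(target, multiple=56):
    """Round a positive integer target to the nearest positive multiple of `multiple`."""
    if target <= 0:
        raise ValueError("target must be positive")
    # Integer arithmetic: ties round up, and the result is never below one multiple
    snapped = (target + multiple // 2) // multiple * multiple
    return max(multiple, snapped)


print(snap_resolution(640))  # prints 616
print(snap_resolution(560))  # prints 560
```

The result can then be used as RFDETRBase(resolution=snap_resolution(640)).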

Training custom models

RF-DETR supports fine-tuning on your own dataset. The dataset must be in COCO format and contain three subdirectories: train, valid and test.

Preparing the dataset

Example directory structure:

dataset/
├── train/
│   ├── _annotations.coco.json
│   ├── image1.jpg
│   └── image2.jpg
├── valid/
│   ├── _annotations.coco.json
│   ├── image1.jpg
│   └── image2.jpg
└── test/
    ├── _annotations.coco.json
    ├── image1.jpg
    └── image2.jpg
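
Before starting a long training run, it can be worth confirming that a dataset follows this layout. The check below is a sketch based only on the structure shown above. find_layout_problems is a hypothetical helper, and it does not validate the contents of the annotation files.

```python
from pathlib import Path

SPLITS = ("train", "valid", "test")
ANNOTATION_FILE = "_annotations.coco.json"


def find_layout_problems(dataset_dir):
    """Return a list of problems; an empty list means the layout looks right."""
    root = Path(dataset_dir)
    problems = []
    for split in SPLITS:
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing directory: {split}/")
            continue
        if not (split_dir / ANNOTATION_FILE).is_file():
            problems.append(f"missing file: {split}/{ANNOTATION_FILE}")
    return problems


if __name__ == "__main__":
    for problem in find_layout_problems("./dataset"):
        print(problem)
```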

COCO format datasets can be generated using the Roboflow platform:

from roboflow import Roboflow
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("rf-100-vl").project("mahjong-vtacs-mexax-m4vyu-sjtd")
dataset = project.version(2).download("coco")
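
Hard-coding the key in a script makes it easy to leak. One alternative is to read it from the ROBOFLOW_API_KEY environment variable mentioned in the Q&A section. The helper below is a hypothetical sketch, and the env parameter is there only so the function can be exercised without touching the real environment.

```python
import os


def read_api_key(env=None, name="ROBOFLOW_API_KEY"):
    """Return the API key from the environment, or raise a clear error if it is unset."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before downloading.")
    return key
```

With this in place, the client could be created as Roboflow(api_key=read_api_key()).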

Start training
- Sample code:
```
from rfdetr import RFDETRBase
model = RFDETRBase()
model.train(dataset_dir="./mahjong-vtacs-mexax-m4vyu-sjtd-2", epochs=10, batch_size=4, grad_accum_steps=4, lr=1e-4)
```
- For training, the recommended total batch size (batch_size * grad_accum_steps) is 16. For example, on an A100 GPU use batch_size=16, grad_accum_steps=1; on a T4 GPU use batch_size=4, grad_accum_steps=4.

Multi-GPU Training

Create a main.py file:

from rfdetr import RFDETRBase
model = RFDETRBase()
model.train(dataset_dir="./dataset", epochs=10, batch_size=4, grad_accum_steps=4, lr=1e-4)

Run in the terminal:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py

Replace 8 with the number of GPUs you are using. Remember to adjust batch_size so that the total batch size stays the same.
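
Keeping the total batch size stable comes down to simple arithmetic. The sketch below assumes that batch_size is per GPU, so the effective batch is batch_size * grad_accum_steps * number of GPUs, and it takes 16 as the target because both single-GPU examples above (16 * 1 and 4 * 4) multiply out to 16. plan_grad_accum is a hypothetical helper, not an rfdetr function.

```python
def plan_grad_accum(batch_size, num_gpus=1, target_total=16):
    """Return grad_accum_steps so that
    batch_size * grad_accum_steps * num_gpus == target_total.
    """
    per_step = batch_size * num_gpus
    if per_step <= 0:
        raise ValueError("batch_size and num_gpus must be positive")
    if target_total % per_step != 0:
        raise ValueError(
            f"{target_total} is not divisible by batch_size * num_gpus = {per_step}"
        )
    return target_total // per_step


print(plan_grad_accum(batch_size=4))              # prints 4
print(plan_grad_accum(batch_size=2, num_gpus=8))  # prints 1
```

Under these assumptions, moving the T4 settings (batch_size=4, grad_accum_steps=4) to two GPUs would mean dropping grad_accum_steps to 2.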

Load training results
- Training produces two weight files: regular weights and EMA weights (usually more stable). To load the trained weights:
```
model = RFDETRBase(pretrain_weights="./output/model_ema.pt")
detections = model.predict("image.jpg")
```
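
If a script should prefer the EMA weights but still work when they are absent, the choice can be automated. This is a sketch under assumptions: pick_weights is a hypothetical helper, model_ema.pt is the file name used in the example above, and the fallback simply takes the most recently modified other .pt file in the directory, whatever it is called.

```python
from pathlib import Path


def pick_weights(output_dir, preferred="model_ema.pt"):
    """Return the preferred checkpoint if present, else the newest .pt file, else None."""
    root = Path(output_dir)
    candidate = root / preferred
    if candidate.is_file():
        return candidate
    # Fall back to whichever .pt file was modified most recently
    others = sorted(root.glob("*.pt"), key=lambda p: p.stat().st_mtime, reverse=True)
    return others[0] if others else None
```

The returned path could then be passed on as RFDETRBase(pretrain_weights=str(path)) once it has been checked for None.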

ONNX Export

Export to ONNX format for easy deployment on other platforms:

from rfdetr import RFDETRBase
model = RFDETRBase()
model.export()

The exported file is saved in the output directory and can be used for optimized inference on edge devices.

Application scenarios

Autonomous driving
RF-DETR detects vehicles and pedestrians on the road in real time. Its low latency and high accuracy make it suitable for embedded systems.
Industrial quality control
RF-DETR quickly identifies part defects on factory assembly lines. The model is lightweight and can run directly on the equipment.
Video surveillance
RF-DETR processes surveillance video to detect abnormal objects or behavior in real time. It supports video streams and is suitable for 24/7 security monitoring.

QA

What dataset formats are supported?
Only the COCO format is supported. The dataset needs to contain train, valid and test subdirectories, each with its own _annotations.coco.json file.
How to get Roboflow API key?
Log in to https://app.roboflow.com, find the API key in your account settings, copy it and set it as the environment variable ROBOFLOW_API_KEY.
How long does the training take?
It depends on the hardware and the dataset size. On a T4 GPU, 10 epochs might take a couple of hours. Smaller datasets can be trained on a CPU, but it is slow.