
RF-DETR: An Open Source Model for Real-Time Visual Object Detection

General Introduction

RF-DETR is an open source object detection model developed by the Roboflow team. It is built on the Transformer architecture, and its core feature is real-time efficiency. It is the first real-time model to exceed 60 AP on the Microsoft COCO dataset, and it also performs strongly on the RF100-VL benchmark, showing that it adapts to a variety of real-world scenarios. It is available in two versions: RF-DETR-base (29 million parameters) and RF-DETR-large (128 million parameters). The model is small enough for edge device deployment. The code and pre-trained weights are released under the Apache 2.0 license and are free for community use. Users can obtain them from GitHub for training or deployment.

 

Feature List

  • Real-time object detection: quickly recognizes objects in images or video with low latency.
  • Custom dataset training: supports fine-tuning the model on your own data.
  • Edge device deployment: the model is lightweight and suited to resource-limited devices.
  • Adjustable resolution: lets users balance detection speed against accuracy.
  • Pre-trained model support: provides pre-trained weights based on the COCO dataset.
  • Video stream processing: analyzes video in real time and outputs the results.
  • ONNX export: supports conversion to ONNX format for cross-platform deployment.
  • Multi-GPU training: accelerates training with multiple graphics cards.

 

Usage Guide

Using RF-DETR involves three parts: installation, inference, and training. The detailed steps below will help you get started quickly.

Installation process

  1. Environment preparation
    Requires Python 3.9 or higher and PyTorch 1.13.0 or higher. If you are using a GPU, run nvidia-smi to check the driver.

    • Install PyTorch:
      pip install "torch>=1.13.0" "torchvision>=0.14.0"
      
    • Download code:
      git clone https://github.com/roboflow/rf-detr.git
      cd rf-detr
      
    • Install the dependencies:
      pip install rfdetr
      

      This automatically installs numpy, supervision, and other required libraries.

  2. Verify Installation
    Run the following code:

    from rfdetr import RFDETRBase
    print("Installation successful")

If no errors are reported, the installation is complete.
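
Before moving on, it can also help to confirm that the interpreter and PyTorch meet the minimum versions stated above (Python 3.9, PyTorch 1.13.0). The following is a minimal sketch that uses only the standard library. The helper names parse_version, meets_minimum, and installed_version are illustrative and are not part of rfdetr.

```python
import re
import sys
from importlib import metadata


def parse_version(text):
    """Turn a string such as '1.13.0+cu117' into (1, 13, 0)."""
    core = text.split("+")[0]  # drop local build tags like '+cu117'
    parts = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    python_ok = sys.version_info >= (3, 9)
    print(f"Python {sys.version.split()[0]}: {'OK' if python_ok else 'too old'}")
    torch_version = installed_version("torch")
    if torch_version is None:
        print("torch: not installed")
    else:
        status = "OK" if meets_minimum(torch_version, "1.13.0") else "too old"
        print(f"torch {torch_version}: {status}")
```

The check reads package metadata instead of importing torch, so it runs even when PyTorch is missing.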

Inference

RF-DETR ships with a model pre-trained on the COCO dataset, so it can detect objects in images or video directly.

  1. Image detection
    • Sample code:
      import io
      import requests
      from PIL import Image
      from rfdetr import RFDETRBase
      import supervision as sv
      model = RFDETRBase()
      url = "https://media.roboflow.com/notebooks/examples/dog-2.jpeg"
      image = Image.open(io.BytesIO(requests.get(url).content))
      detections = model.predict(image, threshold=0.5)
      labels = [f"{class_id} {confidence:.2f}" for class_id, confidence in zip(detections.class_id, detections.confidence)]
      annotated_image = image.copy()
      annotated_image = sv.BoxAnnotator().annotate(annotated_image, detections)
      annotated_image = sv.LabelAnnotator().annotate(annotated_image, detections, labels)
      sv.plot_image(annotated_image)
      
    • This code detects objects in the image, draws the bounding boxes with class IDs and confidence scores, and then displays the result.
  2. Video Detection
    • First, install opencv-python:
      pip install opencv-python
      
    • Sample code:
      import cv2
      from PIL import Image  # needed for Image.fromarray below
      from rfdetr import RFDETRBase
      import supervision as sv
      model = RFDETRBase()
      cap = cv2.VideoCapture("video.mp4")  # replace with your video path
      while cap.isOpened():
          ret, frame = cap.read()
          if not ret:
              break  # end of the video or a read error
          # OpenCV frames are BGR; convert to RGB before building the PIL image
          image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
          detections = model.predict(image, threshold=0.5)
          annotated_frame = sv.BoxAnnotator().annotate(frame, detections)
          cv2.imshow("RF-DETR Detection", annotated_frame)
          if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
              break
      cap.release()
      cv2.destroyAllWindows()
      
    • This will detect objects in the video frame by frame and display them in real time.
  3. Adjusting the resolution
    • The resolution can be set at initialization (must be a multiple of 56):
      model = RFDETRBase(resolution=560)
      
    • Higher resolution generally improves accuracy but slows down inference.
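
Because the resolution must be a multiple of 56, an arbitrary target size has to be snapped to a valid value first. The sketch below shows one way to do that. The function name nearest_valid_resolution is hypothetical, and rounding to the nearest multiple (ties rounding up) is an assumption made for illustration, not a rule taken from RF-DETR.

```python
def nearest_valid_resolution(requested, step=56):
    """Round `requested` to the nearest positive multiple of `step`.

    Ties round up, and the result is never smaller than `step`.
    """
    if requested <= 0:
        raise ValueError("resolution must be positive")
    multiples = max(1, (requested + step // 2) // step)
    return multiples * step


if __name__ == "__main__":
    for wanted in (512, 560, 640, 720):
        print(wanted, "->", nearest_valid_resolution(wanted))
```

The returned value can then be passed as the resolution argument shown above, for example RFDETRBase(resolution=nearest_valid_resolution(640)).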

Training custom models

RF-DETR supports fine-tuning on your own dataset. The dataset must be in COCO format and contain three subdirectories: train, valid, and test.
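
A quick layout check before training can catch a missing split or annotation file early. The following is a minimal sketch using only the standard library. The function name find_layout_problems is illustrative. The annotation file name follows the directory example in this section, and the three top-level keys it checks (images, annotations, categories) are the standard COCO ones. It does not validate individual annotations.

```python
import json
from pathlib import Path

REQUIRED_SPLITS = ("train", "valid", "test")
ANNOTATION_FILE = "_annotations.coco.json"
REQUIRED_KEYS = ("images", "annotations", "categories")


def find_layout_problems(dataset_dir):
    """Return a list of problems; an empty list means the layout looks right."""
    root = Path(dataset_dir)
    problems = []
    for split in REQUIRED_SPLITS:
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing directory: {split}")
            continue
        annotation_path = split_dir / ANNOTATION_FILE
        if not annotation_path.is_file():
            problems.append(f"missing annotation file: {split}/{ANNOTATION_FILE}")
            continue
        try:
            data = json.loads(annotation_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems.append(f"invalid JSON: {split}/{ANNOTATION_FILE}")
            continue
        for key in REQUIRED_KEYS:
            if not isinstance(data, dict) or key not in data:
                problems.append(f"{split}/{ANNOTATION_FILE} lacks key '{key}'")
    return problems


if __name__ == "__main__":
    for problem in find_layout_problems("./dataset"):
        print(problem)
```

An empty result only means the expected files are present. Image files and box coordinates are not inspected.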

  1. Preparing the dataset
    • Example directory structure:
      dataset/
      ├── train/
      │   ├── _annotations.coco.json
      │   ├── image1.jpg
      │   └── image2.jpg
      ├── valid/
      │   ├── _annotations.coco.json
      │   ├── image1.jpg
      │   └── image2.jpg
      └── test/
          ├── _annotations.coco.json
          ├── image1.jpg
          └── image2.jpg
      
    • COCO format datasets can be generated using the Roboflow platform:
      from roboflow import Roboflow
      rf = Roboflow(api_key="YOUR_API_KEY")
      project = rf.workspace("rf-100-vl").project("mahjong-vtacs-mexax-m4vyu-sjtd")
      dataset = project.version(2).download("coco")
      
  2. Start training
    • Sample code:
      from rfdetr import RFDETRBase
      model = RFDETRBase()
      model.train(dataset_dir="./mahjong-vtacs-mexax-m4vyu-sjtd-2", epochs=10, batch_size=4, grad_accum_steps=4, lr=1e-4)
      
    • For training, the recommended total batch size (batch_size * grad_accum_steps) is 16. For example, on an A100 GPU use batch_size=16, grad_accum_steps=1; on a T4 GPU use batch_size=4, grad_accum_steps=4.
  3. Multi-GPU Training
    • Create a main.py file:
      from rfdetr import RFDETRBase
      model = RFDETRBase()
      model.train(dataset_dir="./dataset", epochs=10, batch_size=4, grad_accum_steps=4, lr=1e-4)
      
    • Run in the terminal:
      python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py
      
    • Replace 8 with the number of GPUs you are using. Adjust batch_size accordingly to keep the total batch size stable.
  4. Load training results
    • Training produces two weight files: the regular weights and the EMA weights (more stable). To load the EMA weights:
      model = RFDETRBase(pretrain_weights="./output/model_ema.pt")
      detections = model.predict("image.jpg")
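
The batch-size guidance in steps 2 and 3 comes down to simple arithmetic: per-GPU batch size times gradient-accumulation steps times the number of GPUs should stay at the target total. The sketch below assumes the total of 16 implied by the A100 and T4 examples, and it assumes that each GPU in a multi-GPU launch processes its own batch_size samples per step. The function name grad_accum_steps_for is hypothetical.

```python
def grad_accum_steps_for(batch_size, num_gpus=1, target_total=16):
    """Return the accumulation steps that keep the total batch size at target_total.

    total = batch_size * num_gpus * grad_accum_steps
    """
    if batch_size <= 0 or num_gpus <= 0:
        raise ValueError("batch_size and num_gpus must be positive")
    per_step = batch_size * num_gpus  # samples processed per optimizer micro-step
    if target_total % per_step != 0:
        raise ValueError(
            f"batch_size={batch_size} on {num_gpus} GPU(s) cannot reach "
            f"a total of {target_total}; pick a smaller batch_size"
        )
    return target_total // per_step


if __name__ == "__main__":
    print("A100, 1 GPU :", grad_accum_steps_for(16))
    print("T4, 1 GPU   :", grad_accum_steps_for(4))
    print("2 GPUs, bs=2:", grad_accum_steps_for(2, num_gpus=2))
```

Under these assumptions, batch_size=4 on 8 GPUs already exceeds a total of 16, so the helper raises an error instead of silently returning a fractional step count.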
      

ONNX Export

  • Export to ONNX format for easy deployment on other platforms:
    from rfdetr import RFDETRBase
    model = RFDETRBase()
    model.export()
    
  • The exported file is saved in the output directory and is suited to optimized inference on edge devices.

 

Application Scenarios

  1. Autonomous driving
    RF-DETR detects vehicles and pedestrians on the road in real time. Its low latency and high accuracy make it suitable for embedded systems.
  2. Industrial quality control
    RF-DETR quickly identifies part defects on factory assembly lines. The model is lightweight and can run directly on the equipment.
  3. Video surveillance
    RF-DETR processes surveillance video to detect abnormal objects or behavior in real time. It supports video streams and is suitable for 24/7 security monitoring.

 

QA

  1. What dataset formats are supported?
    Only the COCO format is supported. The dataset needs to contain train, valid, and test subdirectories, each with a corresponding _annotations.coco.json file.
  2. How do I get a Roboflow API key?
    Log in to https://app.roboflow.com, find the API key in your account settings, copy it, and set it as the environment variable ROBOFLOW_API_KEY.
  3. How long does training take?
    It depends on the hardware and the dataset size. On a T4 GPU, 10 epochs might take a couple of hours. Smaller datasets can be trained on a CPU, but slowly.
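
For question 2, reading the key from the ROBOFLOW_API_KEY environment variable keeps it out of source code. The following is a minimal sketch using only the standard library. The helper name read_api_key is illustrative and is not part of the roboflow package.

```python
import os


def read_api_key(env=None, name="ROBOFLOW_API_KEY"):
    """Return the API key stored in the environment, or raise if it is unset."""
    if env is None:
        env = os.environ
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before downloading a dataset")
    return key


if __name__ == "__main__":
    try:
        key = read_api_key()
        print(f"Found API key ending in ...{key[-4:]}")
    except RuntimeError as error:
        print(error)
```

The returned value can be passed as the api_key argument of the Roboflow(...) call shown in the dataset preparation step.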
May not be reproduced without permission: Chief AI Sharing Circle » RF-DETR: An Open Source Model for Real-Time Visual Object Detection