
VideoRAG: A RAG framework for understanding ultra-long videos with support for multimodal retrieval and knowledge graph construction

General Introduction

VideoRAG is a retrieval-augmented generation (RAG) framework designed for processing and understanding extremely long-context videos. It combines a graph-driven textual knowledge base with hierarchical multimodal context encoding, which allows it to process hundreds of hours of video content on a single NVIDIA RTX 3090 GPU. By dynamically constructing a knowledge graph, VideoRAG maintains semantic consistency across videos and improves retrieval efficiency. Developed by the Department of Data Science at the University of Hong Kong, the project aims to give users a powerful tool for working with complex video data.


Function List

  • Efficient handling of extremely long videos: Processes hundreds of hours of video content on a single NVIDIA RTX 3090 GPU.
  • Structured video knowledge index: Distills hundreds of hours of video content into a concise knowledge graph.
  • Multimodal retrieval: Combines textual semantics and visual content to identify the most relevant videos and produce a comprehensive response.
  • LongerVideos benchmark: A newly created benchmark containing over 160 videos totaling 134 hours of lectures, documentaries, and entertainment.
  • Dual-channel architecture: Combines a graph-driven textual knowledge base with hierarchical multimodal context encoding to maintain cross-video semantic consistency.
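
As a rough illustration of the multimodal retrieval idea in the list above, the sketch below ranks video clips by fusing a text-channel similarity with a visual-channel similarity. Everything in it is an assumption made for illustration: the `Clip` class, the hand-written embedding vectors, and the weighted-sum fusion controlled by `text_weight` are hypothetical and are not taken from the VideoRAG code base, which may combine the two channels quite differently.

```python
import math
from dataclasses import dataclass


@dataclass
class Clip:
    """Hypothetical record for one indexed video clip."""
    clip_id: str
    text_vec: list[float]    # embedding of the clip's transcript or caption
    visual_vec: list[float]  # embedding of the clip's sampled frames


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_clips(query_text_vec, query_visual_vec, clips, text_weight=0.5):
    """Rank clips by a weighted sum of text and visual similarity (assumed fusion rule)."""
    scored = []
    for clip in clips:
        text_score = cosine(query_text_vec, clip.text_vec)
        visual_score = cosine(query_visual_vec, clip.visual_vec)
        fused = text_weight * text_score + (1.0 - text_weight) * visual_score
        scored.append((fused, clip.clip_id))
    # Highest fused score first; ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [clip_id for _, clip_id in scored]


clips = [
    Clip("lecture_01#0", [1.0, 0.0], [0.0, 1.0]),      # matches the query on text only
    Clip("lecture_01#1", [0.0, 1.0], [0.0, 1.0]),      # matches on neither channel
    Clip("documentary_03#4", [1.0, 0.0], [1.0, 0.0]),  # matches on both channels
]
ranking = rank_clips([1.0, 0.0], [1.0, 0.0], clips)
print(ranking)
```

A clip that agrees with the query on both channels outranks one that agrees on text alone, which is the intuition behind combining textual semantics with visual content.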

 

Usage Guide

Installation Process

  1. Create and activate the conda environment:
   conda create --name videorag python=3.11
   conda activate videorag
  2. Install the necessary Python packages:
   pip install numpy==1.26.4 torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2
   pip install accelerate==0.30.1 bitsandbytes==0.43.1 moviepy==1.0.3
   pip install git+https://github.com/facebookresearch/pytorchvideo.git@28fe037d212663c6a24f373b94cc5d478c8c1a1d
   pip install timm==0.6.7 ftfy regex einops fvcore eva-decord==0.6.1 iopath matplotlib types-regex cartopy
   pip install ctranslate2==4.4.0 faster_whisper neo4j hnswlib xxhash nano-vectordb
   pip install transformers==4.37.1 tiktoken openai tenacity
  3. Install ImageBind from the ImageBind directory:
   cd ImageBind
   pip install .
  4. Download the necessary checkpoint files (Git LFS must be installed to fetch the model weights from the Hugging Face repositories):
   git clone https://huggingface.co/openbmb/MiniCPM-V-2_6-int4
   git clone https://huggingface.co/Systran/faster-distil-whisper-large-v3
   mkdir .checkpoints
   cd .checkpoints
   wget https://dl.fbaipublicfiles.com/imagebind/imagebind_huge.pth
   cd ..
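
Before running anything, it can help to confirm that the downloads above actually landed where expected. The following is a minimal sketch, assuming the download commands were all run from one directory (passed as `root`); the helper name `missing_checkpoints` is hypothetical and not part of VideoRAG.

```python
from pathlib import Path

# Paths created by the download commands above, relative to the directory
# in which they were run (assumed here to be a single directory).
EXPECTED_PATHS = [
    "MiniCPM-V-2_6-int4",
    "faster-distil-whisper-large-v3",
    ".checkpoints/imagebind_huge.pth",
]


def missing_checkpoints(root: str = ".") -> list[str]:
    """Return the expected checkpoint paths that do not exist under `root`."""
    base = Path(root)
    return [path for path in EXPECTED_PATHS if not (base / path).exists()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All checkpoints found.")
```

The check only tests for existence; it does not verify that a cloned model directory contains complete weight files.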

Usage Process

  1. Video Knowledge Extraction: Feed one or more videos into VideoRAG; the system automatically extracts their content and builds a knowledge graph.
  2. Query Response: Enter a query, and VideoRAG returns a comprehensive response based on the constructed knowledge graph and its multimodal retrieval mechanism.
  3. Multi-language support: VideoRAG has so far been tested only on English content. To process videos in other languages, modifying the WhisperModel in asr.py is recommended.
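
For the multi-language case, one possible direction is sketched below using the `faster_whisper` package installed earlier. The `WhisperModel` constructor and the `language` argument of `transcribe` belong to faster-whisper, but the helper names, the model name `large-v3`, and the `cuda`/`float16` settings are assumptions for illustration. This is not a copy of what asr.py does, so check that file before adapting the idea. A multilingual model is assumed because distilled Whisper checkpoints are generally English-oriented.

```python
from typing import Optional


def transcribe_options(language: Optional[str] = None, beam_size: int = 5) -> dict:
    """Build keyword arguments for WhisperModel.transcribe.

    language: ISO 639-1 code such as "zh" or "fr"; None lets Whisper auto-detect.
    """
    options = {"beam_size": beam_size}
    if language is not None:
        options["language"] = language
    return options


def transcribe(audio_path: str, language: Optional[str] = None,
               model_name: str = "large-v3") -> list[tuple[float, float, str]]:
    """Transcribe one audio file and return (start, end, text) segments."""
    # Imported lazily so transcribe_options can be used without the model stack.
    from faster_whisper import WhisperModel

    # "large-v3" is an assumed multilingual choice, not the project's default.
    model = WhisperModel(model_name, device="cuda", compute_type="float16")
    segments, _info = model.transcribe(audio_path, **transcribe_options(language))
    return [(seg.start, seg.end, seg.text) for seg in segments]


print(transcribe_options("zh"))
```

Passing an explicit language code skips auto-detection, which tends to matter for short or noisy clips; leaving it as `None` keeps Whisper's default detection.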

Main Functions

  • Video Upload: Upload video files to the system, which automatically processes them and extracts knowledge.
  • Query Input: Enter a question in the query box, and the system provides a detailed answer based on the knowledge graph and multimodal retrieval mechanism.
  • Results Display: The system shows relevant video clips and text responses, which users can click to view in detail.
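
The upload-then-query flow above can be pictured with a toy index. The sketch below is a simplified illustration and not VideoRAG's actual pipeline: it assumes entities have already been extracted per clip (the `segments` data is made up), links entities that co-occur in a clip, and answers a query by returning clips that mention the query entities or their direct neighbours in the graph. The function names `build_graph` and `clips_for_query` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(segments):
    """Link entities that co-occur in a clip and record where each entity appears.

    segments: iterable of (clip_id, [entity, ...]) pairs.
    """
    edges = defaultdict(set)     # entity -> neighbouring entities
    mentions = defaultdict(set)  # entity -> clip ids that mention it
    for clip_id, entities in segments:
        for entity in entities:
            mentions[entity].add(clip_id)
        for a, b in combinations(sorted(set(entities)), 2):
            edges[a].add(b)
            edges[b].add(a)
    return edges, mentions


def clips_for_query(query_entities, edges, mentions):
    """Return clips mentioning the query entities or their direct graph neighbours."""
    expanded = set(query_entities)
    for entity in query_entities:
        expanded |= edges.get(entity, set())
    clips = set()
    for entity in expanded:
        clips |= mentions.get(entity, set())
    return sorted(clips)


segments = [
    ("video_a#0", ["transformer", "attention"]),
    ("video_b#3", ["attention", "softmax"]),
    ("video_c#1", ["gradient descent"]),
]
edges, mentions = build_graph(segments)
result = clips_for_query(["transformer"], edges, mentions)
print(result)
```

Because "attention" links the first two clips, a query about "transformer" also surfaces a clip from a different video, which is a small-scale picture of how a shared graph can connect content across videos.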