General Introduction
Video Analyzer is a video analysis tool that combines computer vision, audio transcription, and natural language processing to generate detailed descriptions of video content. It extracts key frames from the video, transcribes the audio, and produces a natural language description from both. Video Analyzer can run entirely locally, without cloud services or API keys, or it can use any OpenAI API-compatible service for greater speed and scale.
Function List
- Video frame extraction: Automatically identifies and extracts key frames from videos.
- Audio transcription: Transcribes audio content using the Whisper model.
- Natural language description: Converts the extracted frames and transcribed audio into natural language descriptions.
- Multi-model support: Supports analysis with different large language models (e.g., a vision model run through Ollama).
- Result output: Generates a JSON file containing the analysis results for further use or review.
Usage Guide
Installation process
To use the video analysis tool, you first need to install some necessary software and libraries:
- Clone the repository:
  - Use Git to clone the project repository from GitHub:
    git clone https://github.com/byjlw/video-analyzer.git
    cd video-analyzer
- Create a virtual environment:
  - To avoid dependency conflicts, create a new virtual environment:
    python3 -m venv .venv
    source .venv/bin/activate  # On Windows, use .venv\Scripts\activate
- Install dependencies:
  - Install the Python packages required by the project:
    pip install .
  - Or, to install in development mode:
    pip install -e .
- Install FFmpeg:
  - Make sure FFmpeg is installed on your system, since it is used for video and audio processing.
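One way to confirm the FFmpeg requirement from a script is to look for the executable on the PATH. The sketch below uses only the Python standard library. The helper name `ffmpeg_available` is made up for illustration and is not part of Video Analyzer.

```python
import shutil


def ffmpeg_available(executable: str = "ffmpeg") -> bool:
    """Return True if the named executable can be found on the PATH."""
    # shutil.which returns the full path to the executable, or None.
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("FFmpeg found:", shutil.which("ffmpeg"))
    else:
        print("FFmpeg not found; install it before running video-analyzer.")
```

This only checks that an executable with that name exists. It does not verify the FFmpeg version or which codecs it was built with.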
Using the video analysis tool
- Run an analysis:
  - The simplest usage is to pass the video file directly:
    video-analyzer path/to/video.mp4
  - You can pass additional parameters to customize the analysis:
    video-analyzer video.mp4 --config custom_config.json --output ./custom_output --frames-per-minute 15 --duration 60
  - Parameter description:
    --config: Specifies the configuration file path.
    --output: Sets the output path.
    --frames-per-minute: Sets the number of frames extracted per minute.
    --duration: Limits the length of video analyzed, in seconds.
- Output:
  - After the analysis is complete, the tool generates an analysis.json file, which contains the analysis results for each key frame and the audio transcription as text.
- Advanced configuration:
  - You can use a custom configuration file to set more detailed parameters, such as choosing a specific Whisper model size (tiny, base, small, medium, large), setting the threshold for language detection, or deciding whether to keep the extracted frame images.
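For batch processing, the command line above can be driven from Python. The following is a minimal sketch that assembles the documented flags into an argument list and, optionally, runs the tool and loads the result. The helper names `build_command` and `run_and_load` are hypothetical, and the location of analysis.json inside the output directory is an assumption, not something confirmed by the description above.

```python
import json
import subprocess
from pathlib import Path
from typing import Any, List, Optional


def build_command(video: str,
                  config: Optional[str] = None,
                  output: Optional[str] = None,
                  frames_per_minute: Optional[int] = None,
                  duration: Optional[int] = None) -> List[str]:
    """Assemble a video-analyzer command line from the documented flags."""
    cmd = ["video-analyzer", video]
    if config is not None:
        cmd += ["--config", config]
    if output is not None:
        cmd += ["--output", output]
    if frames_per_minute is not None:
        cmd += ["--frames-per-minute", str(frames_per_minute)]
    if duration is not None:
        cmd += ["--duration", str(duration)]
    return cmd


def run_and_load(cmd: List[str], output_dir: str) -> Any:
    """Run the command, then load the generated analysis.json.

    Assumption: analysis.json is written inside output_dir.
    """
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    with open(Path(output_dir) / "analysis.json", encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    cmd = build_command("video.mp4", output="./custom_output",
                        frames_per_minute=15, duration=60)
    print(" ".join(cmd))
    # Uncomment once video-analyzer is installed and video.mp4 exists:
    # results = run_and_load(cmd, "./custom_output")
    # print(json.dumps(results, indent=2)[:500])  # preview; no schema assumed
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.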
How it works
- Frame analysis:
  - The tool extracts key frames from the video at a set rate, then runs a computer vision analysis on each frame and records its timestamp and analysis result.
- Audio processing:
  - The audio track is separated from the video and transcribed with the Whisper model. The transcription is combined with the information from the video frames to produce a more complete description of the video.
- Description generation:
  - The tool combines the frame analyses and the audio transcription into a single coherent video description, using natural language processing to make the description more readable.
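The flow above can be illustrated with a small sketch. It assumes uniform sampling at the frames-per-minute rate and a simple merge of frame notes and transcript segments by timestamp. Both are simplifying assumptions: the tool's actual key frame selection and merging logic may differ, and the function names `sample_timestamps` and `build_timeline` are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def sample_timestamps(video_seconds: float,
                      frames_per_minute: float,
                      duration: Optional[float] = None) -> List[float]:
    """Evenly spaced sample times (seconds) at the requested rate."""
    if frames_per_minute <= 0:
        raise ValueError("frames_per_minute must be positive")
    # Analyze the whole video unless a shorter duration limit is given.
    limit = video_seconds if duration is None else min(video_seconds, duration)
    interval = 60.0 / frames_per_minute
    # Number of sample points that fall strictly before the limit.
    count = max(0, math.ceil(limit * frames_per_minute / 60.0 - 1e-9))
    return [round(i * interval, 3) for i in range(count)]


def build_timeline(frame_notes: List[Tuple[float, str]],
                   transcript_segments: List[Tuple[float, str]]
                   ) -> List[Tuple[float, str, str]]:
    """Merge frame notes and transcript segments into one time-ordered list."""
    events = [(t, "frame", text) for t, text in frame_notes]
    events += [(t, "audio", text) for t, text in transcript_segments]
    # sorted() is stable, so a frame precedes audio at the same timestamp.
    return sorted(events, key=lambda event: event[0])


if __name__ == "__main__":
    times = sample_timestamps(video_seconds=120, frames_per_minute=15, duration=60)
    print(len(times), times[:3])
    timeline = build_timeline(
        [(0.0, "a person at a desk"), (4.0, "a slide with a chart")],
        [(1.5, "Welcome to the demo.")],
    )
    for timestamp, source, text in timeline:
        print(f"{timestamp:6.1f}s  {source:5s}  {text}")
```

A time-ordered list like this is one plausible input for the final description step, since it keeps what was seen and what was said in the order they occurred.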
This tool helps users quickly understand video content, which is particularly useful when processing large numbers of videos or when video summaries need to be generated automatically.