General Introduction
AI no jimaku gumi ("AI subtitle group") is a command-line tool for automated video subtitle extraction, transcription, and translation. It integrates AI technologies such as the Whisper speech-recognition model and multiple translation backends (e.g. DeepL and LLM-based services) to process video and audio content efficiently and generate high-quality subtitle files. It supports conversion between multiple languages, including English, Japanese, Chinese, Korean, and other mainstream languages, and provides flexible subtitle output options. As an open-source project, it ships with complete source code and runs cross-platform on Linux, macOS, and other major operating systems.
Function List
- Automatically extracts audio from video and recognizes speech
- Supports multiple subtitle sources: audio recognition, container extraction, OCR recognition
- Integrates multiple translation backends: DeepL, LLM, etc.
- Supports translation between many mainstream languages
- Configurable subtitle output format (currently SRT)
- Supports clipping and processing of specific video segments
- Provides debugging modes: audio extraction only, transcription only, translation only, and more
- Supports custom AI model paths and configurations
- Cross-platform support (Linux and macOS; Windows support planned)
Usage Guide
1. Environmental preparation
Windows support is in preparation...
Linux system installation dependencies:
- Ubuntu users:
apt-get install -y clang cmake make pkg-config libavcodec-dev libavdevice-dev libavfilter-dev libavformat-dev libavutil-dev libpostproc-dev libswresample-dev libswscale-dev
- Fedora users:
dnf install clang cmake ffmpeg-free-devel make pkgconf-pkg-config
- Arch Linux users:
pacman -S clang cmake ffmpeg make pkgconf
macOS system installation dependencies:
Use the Homebrew package manager:
brew install cmake ffmpeg
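Before building, it can help to confirm the tools installed above are actually on your PATH. A minimal sketch (the tool list is illustrative; exact package and binary names vary by distro):

```shell
# Report which build dependencies are available on PATH
checked=0
missing=""
for tool in clang cmake make pkg-config ffmpeg; do
  checked=$((checked + 1))
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -n "$missing" ]; then
  echo "missing tools:$missing"
else
  echo "all $checked build dependencies found"
fi
```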
2. Installation steps
- Clone the code repository:
git clone https://github.com/Inokinoki/ai-no-jimaku-gumi.git
cd ai-no-jimaku-gumi
- Compile the project:
cargo build
- Download the Whisper model:
wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.bin
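After downloading, a quick sanity check that the model file exists and is non-empty can save a confusing failure later (a sketch; the filename matches the default `ggml-tiny.bin` used by the tool):

```shell
# Verify the downloaded Whisper model is present and non-empty
MODEL=ggml-tiny.bin
if [ -s "$MODEL" ]; then
  MODEL_OK=yes
else
  MODEL_OK=no
fi
echo "model $MODEL present: $MODEL_OK"
```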
3. Basic use
The tool offers several configuration options:
--input-video-path: input video file path (required)
--source-language: source language (default: ja)
--target-language: target language (default: en)
--ggml-model-path: AI model path (default: ggml-tiny.bin)
--subtitle-output-path: subtitle output path (default: output.srt)
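Putting the options above together, a minimal invocation might look like the following sketch (all defaults are spelled out explicitly; only `--input-video-path` is required, and `video.webm` is a placeholder input file):

```shell
# Minimal invocation with the basic options; run after `cargo build`
BIN=./target/debug/ainojimakugumi
if [ -x "$BIN" ]; then
  "$BIN" --input-video-path video.webm \
    --source-language ja \
    --target-language en \
    --ggml-model-path ggml-tiny.bin \
    --subtitle-output-path output.srt
else
  echo "binary not found at $BIN; run 'cargo build' first"
fi
```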
4. Translation back-end configuration
DeepL translation backend (default):
- Setting environment variables:
export DEEPL_API_KEY=<your-API-key>
export DEEPL_API_URL=https://api.deepl.com # required for the paid API plan
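Since a missing key typically only surfaces as an error at translation time, a quick pre-flight check can be useful (a sketch; it only tests whether the variable is set, not whether the key is valid):

```shell
# Confirm the DeepL credentials are configured before running
if [ -n "${DEEPL_API_KEY:-}" ]; then
  DEEPL_READY=yes
else
  DEEPL_READY=no
fi
echo "DeepL API key configured: $DEEPL_READY"
```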
LLM Translation Backend:
- Setting environment variables:
export CUSTOM_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxx
- Example of use:
./target/debug/ainojimakugumi --input-video-path video.webm \
--translator-backend llm \
--llm-api-base https://your-api-endpoint.com/v1/ \
--llm-prompt 'translate this to English' \
--llm-model-name 'gpt-4o-mini' \
--ggml-model-path ggml-small.bin
5. Advanced functions
- Use --start-time and --end-time to process a specific video clip
- --only-extract-audio: extract audio only (for debugging)
- --only-transcript: generate subtitles in the original language only
- --only-translate: perform translation only
- Supports multiple subtitle sources: audio (default), container, ocr
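The clip and debug options above can be combined; for example, transcribing only part of a video without translating it. A sketch (this assumes --start-time and --end-time take positions in seconds, which the tool's --help output should confirm; `video.webm` is a placeholder):

```shell
# Transcribe only a clip of the video, skipping translation
BIN=./target/debug/ainojimakugumi
if [ -x "$BIN" ]; then
  "$BIN" --input-video-path video.webm \
    --start-time 30 \
    --end-time 90 \
    --only-transcript \
    --subtitle-output-path clip.srt
else
  echo "binary not built yet; run 'cargo build' first"
fi
```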