General Introduction
AI no jimaku gumi ("AI subtitle group") is a command-line tool for automated video subtitle extraction, transcription, and translation. It integrates AI technologies such as the Whisper speech-recognition model and multiple translation backends (e.g. DeepL and LLM-based services) to process video and audio content efficiently and generate high-quality subtitle files. It supports conversion between multiple languages, including English, Japanese, Chinese, Korean, and other mainstream languages, and provides flexible subtitle output options. As an open-source project, it ships with complete source code and runs cross-platform on Linux, macOS, and other major operating systems.
Function List
- Automatically extracts audio from video and recognizes speech
- Supports multiple subtitle sources: audio recognition, container extraction, OCR recognition
- Integrates multiple translation backends: DeepL, LLM, etc.
- Supports translation between many mainstream languages
- Configurable subtitle output format (currently SRT)
- Supports clipping and processing of specific video segments
- Provides debugging modes: audio extraction only, transcription only, translation only, and more
- Supports custom AI model paths and configurations
- Cross-platform support (Linux and macOS; Windows support planned)
Usage Guide
1. Environmental preparation
Windows support is in preparation...
Linux system installation dependencies:
- Ubuntu users:
apt-get install -y clang cmake make pkg-config libavcodec-dev libavdevice-dev libavfilter-dev libavformat-dev libavutil-dev libpostproc-dev libswresample-dev libswscale-dev
- Fedora users:
dnf install clang cmake ffmpeg-free-devel make pkgconf-pkg-config
- Arch Linux users:
pacman -S clang cmake ffmpeg make pkgconf
macOS system installation dependencies:
Use the Homebrew package manager:
brew install cmake ffmpeg
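Before building, it can help to confirm the tools installed above are actually on your PATH. A minimal sketch (the tool list is illustrative; exact package and binary names vary by distro):

```shell
# Report which build dependencies are available on PATH
checked=0
missing=""
for tool in clang cmake make pkg-config ffmpeg; do
  checked=$((checked + 1))
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -n "$missing" ]; then
  echo "missing tools:$missing"
else
  echo "all $checked build dependencies found"
fi
```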
2. Installation steps
- Clone the code repository:
git clone https://github.com/Inokinoki/ai-no-jimaku-gumi.git
cd ai-no-jimaku-gumi
- Compile the project:
cargo build
- Download the Whisper model:
wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.bin
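After downloading, a quick sanity check that the model file exists and is non-empty can save a confusing failure later (a sketch; the filename matches the default `ggml-tiny.bin` used by the tool):

```shell
# Verify the downloaded Whisper model is present and non-empty
MODEL=ggml-tiny.bin
if [ -s "$MODEL" ]; then
  MODEL_OK=yes
else
  MODEL_OK=no
fi
echo "model $MODEL present: $MODEL_OK"
```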
3. Basic use
The tool offers several configuration options:
--input-video-path: input video file path (required)
--source-language: source language (default: ja)
--target-language: target language (default: en)
--ggml-model-path: AI model path (default: ggml-tiny.bin)
--subtitle-output-path: subtitle output path (default: output.srt)
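Putting the options above together, a minimal invocation might look like the following sketch (all defaults are spelled out explicitly; only `--input-video-path` is required, and `video.webm` is a placeholder input file):

```shell
# Minimal invocation with the basic options; run after `cargo build`
BIN=./target/debug/ainojimakugumi
if [ -x "$BIN" ]; then
  "$BIN" --input-video-path video.webm \
    --source-language ja \
    --target-language en \
    --ggml-model-path ggml-tiny.bin \
    --subtitle-output-path output.srt
else
  echo "binary not found at $BIN; run 'cargo build' first"
fi
```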
4. Translation back-end configuration
DeepL translation backend (default):
- Setting environment variables:
export DEEPL_API_KEY=<your-API-key>
export DEEPL_API_URL=https://api.deepl.com # required for the paid API plan
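Since a missing key typically only surfaces as an error at translation time, a quick pre-flight check can be useful (a sketch; it only tests whether the variable is set, not whether the key is valid):

```shell
# Confirm the DeepL credentials are configured before running
if [ -n "${DEEPL_API_KEY:-}" ]; then
  DEEPL_READY=yes
else
  DEEPL_READY=no
fi
echo "DeepL API key configured: $DEEPL_READY"
```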
LLM Translation Backend:
- Setting environment variables:
export CUSTOM_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxx
- Example of use:
./target/debug/ainojimakugumi --input-video-path video.webm \
--translator-backend llm \
--llm-api-base https://your-api-endpoint.com/v1/ \
--llm-prompt 'translate this to English' \
--llm-model-name 'gpt-4o-mini' \
--ggml-model-path ggml-small.bin
5. Advanced functions
- Use --start-time and --end-time to process a specific video clip
- --only-extract-audio: extract audio only (for debugging)
- --only-transcript: generate subtitles in the original language only
- --only-translate: perform translation only
- Supports multiple subtitle sources: audio (default), container, ocr
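The clip and debug options above can be combined; for example, transcribing only part of a video without translating it. A sketch (this assumes --start-time and --end-time take positions in seconds, which the tool's --help output should confirm; `video.webm` is a placeholder):

```shell
# Transcribe only a clip of the video, skipping translation
BIN=./target/debug/ainojimakugumi
if [ -x "$BIN" ]; then
  "$BIN" --input-video-path video.webm \
    --start-time 30 \
    --end-time 90 \
    --only-transcript \
    --subtitle-output-path clip.srt
else
  echo "binary not built yet; run 'cargo build' first"
fi
```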