General Introduction
CapsWriter-Offline is a voice input and subtitle transcription tool for PC, hosted on GitHub and built by developer HaujetZhao. It runs completely offline: neither speech-to-text input nor audio/video-to-subtitle transcription requires an Internet connection. It supports unlimited recording time, mixed Chinese and English input, and high-accuracy recognition. Operation is simple: hold down a keyboard shortcut (CapsLock by default) to record, and release it to have the recognized text entered automatically. You can also drag audio and video files onto the client to generate SRT subtitles, which suits users who need quick transcriptions. CapsWriter-Offline is free and open source, runs on Windows, macOS, and Linux, and is popular with users who need efficient text input and subtitle creation.
Function List
- Speech-to-text input: Hold the shortcut key to record and release it to convert speech to text automatically; mixed Chinese and English content is supported.
- Unlimited-length transcription: Very long speech is transcribed accurately by recognizing it in segments and de-duplicating the results.
- Audio/video-to-subtitle transcription: Drag audio and video files onto the client to generate SRT subtitles automatically.
- Hot word replacement: Chinese, English, and rule-based hot words can be customized to improve recognition accuracy for specific terms.
- Diary function: Automatically saves recording results as Markdown files and organizes recordings by date.
- Keyword diary: Recognizes speech that begins with a specific keyword and saves it to a separate thematic Markdown file.
- High-quality recording preservation: Supports recording at a 48000 Hz sample rate and saving to MP3 format with FFmpeg.
- Cross-platform support: Compatible with Windows, macOS, and Linux to cover a range of usage scenarios.
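The segmentation and de-duplication idea behind unlimited-length transcription can be illustrated with a minimal sketch. It assumes that consecutive audio segments overlap slightly, so the end of one transcript repeats at the start of the next. The function name `merge_segments` and the character-level overlap search are illustrative assumptions, not the project's actual algorithm.

```python
def merge_segments(texts, max_overlap=20):
    """Join transcripts of overlapping segments, dropping repeated text."""
    merged = ""
    for text in texts:
        # Look for the longest suffix of `merged` that is also a prefix of `text`.
        limit = min(len(merged), len(text), max_overlap)
        overlap = 0
        for size in range(limit, 0, -1):
            if merged.endswith(text[:size]):
                overlap = size
                break
        # Append only the part of `text` that has not been seen yet.
        merged += text[overlap:]
    return merged


print(merge_segments(["the quick brown", "brown fox jumps", "jumps over"]))
```

A real recognizer would more likely compare tokens or timestamps than raw characters; the sketch only shows why overlapping segments do not have to produce duplicated text.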
Using Help
Installation process
CapsWriter-Offline is open source software that users need to download from GitHub and install manually. Below are the detailed steps:
1. Downloading software
- Visit the GitHub page.
- In the "Releases" section, select the appropriate version for your system:
  - Windows 10 and above, 64-bit: download `CapsWriter-Offline-Windows-64bit.zip` (contains both the server and the client) and `models.zip` (the model files).
  - Windows 7 and above, 32-bit: download `CapsWriter-Offline-Windows-32bit-Client.zip` (client only; it must connect to a server running on another machine on the LAN).
  - macOS/Linux: build from the source code yourself, or use a packaged version provided by the community.
- After the download completes, unzip the software archive, then unzip `models.zip` and place its contents in the `models` folder under the software directory.
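The unzip step can also be scripted. The following is a minimal sketch using Python's standard `zipfile` module; the helper name `extract_models` and the example paths are hypothetical, and the internal layout of `models.zip` is not described here, so check the result and move files if the archive already contains its own top-level folder.

```python
import zipfile
from pathlib import Path


def extract_models(models_zip, software_dir):
    """Extract models.zip into <software_dir>/models and return that folder."""
    target = Path(software_dir) / "models"
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(models_zip) as archive:
        archive.extractall(target)
    return target


# Example call (paths are placeholders):
# extract_models("models.zip", "CapsWriter-Offline-Windows-64bit")
```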
2. Environment preparation
- Windows users:
  - Ensure that the system is Windows 10 or above (required for the server) and 64-bit, with at least 4GB of RAM.
  - To save recordings in MP3 format, install FFmpeg and add it to the environment variables.
- macOS users:
  - Install `protobuf` (run `brew install protobuf`).
  - The client must be run with `sudo`; the default shortcut is Right Shift.
- Linux users:
  - Install `xclip` (run `sudo apt-get install xclip`) to support clipboard functionality.
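Before launching, it can help to confirm that the external tools mentioned above are reachable on `PATH`. The sketch below uses the standard `shutil.which`; the function name `missing_tools` and its parameters are hypothetical, and it only checks `ffmpeg` and `xclip` (the macOS `protobuf` install is not checked).

```python
import platform
import shutil


def missing_tools(system=None, want_mp3=False):
    """Return the tools from this guide that cannot be found on PATH."""
    system = system or platform.system()
    required = []
    if want_mp3:
        required.append("ffmpeg")  # only needed for saving MP3 recordings
    if system == "Linux":
        required.append("xclip")   # clipboard support on Linux
    return [tool for tool in required if shutil.which(tool) is None]


print("Missing tools:", missing_tools(want_mp3=True))
```

An empty list means every checked tool was found.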
3. Running the software
- Server: Unzip and double-click `start_server.exe` (Windows), or run `core_server.py` (requires Python 3.8-3.10 and the dependencies). The model is loaded after startup, which takes about 2GB of memory and about 50 seconds.
- Client: Double-click `start_client.exe` (Windows), or run `core_client.py` (`sudo` is required on macOS/Linux). Once launched, it listens to the default microphone and the shortcut key.
Main Functions
Speech-to-text input
- Starting the client: Once running, the client listens for the CapsLock key by default (Right Shift on macOS).
- Recording:
  - Press and hold the CapsLock key to start recording (recordings shorter than 0.3 seconds are ignored).
  - After you release the key, the software converts the speech to text and enters it at the current cursor position.
- Adjusting settings:
  - In the `config.py` file, change the shortcut key (`shortcut`), whether the output is pasted (`paste`), and other parameters.
  - To restore the CapsLock state, set `restore_key` to `True`.
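As a rough illustration of how these settings fit together, here is a hypothetical sketch. Only the option names `shortcut`, `paste`, and `restore_key` come from the description above; the class layout, the values shown, the `threshold` name, and the helper `should_transcribe` are assumptions, so consult the real `config.py` before editing.

```python
class ClientConfig:
    # Hypothetical layout; the actual config.py may be organized differently.
    shortcut = "caps lock"  # key held down to record (key-name format assumed)
    paste = True            # whether the recognized text is pasted as output
    restore_key = True      # restore the CapsLock state after recording
    threshold = 0.3         # assumed name: shortest recording (seconds) that is kept


def should_transcribe(duration_seconds, config=ClientConfig):
    """Recordings shorter than the threshold are ignored."""
    return duration_seconds >= config.threshold


print(should_transcribe(0.1), should_transcribe(1.2))
```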
Audio/video-to-subtitle transcription
- Preparation: Ensure that the client is running and the server is working properly.
- Drag and drop files: Drag audio or video files (e.g. MP4, WAV) onto `start_client.exe`.
- Generate subtitles: The software automatically recognizes the audio content and generates an SRT file, which is saved in the same directory.
- Note: For large files, check the available memory and disk space beforehand; recognition time depends on the file length.
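For readers unfamiliar with the SRT format, the sketch below shows how timed text segments map to subtitle entries: a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time line, and the text, with a blank line between entries. The function names and the `(start, end, text)` tuple shape are assumptions for illustration, not the tool's internal data structures.

```python
def format_timestamp(seconds):
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Render (start_seconds, end_seconds, text) tuples as SRT text."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    # Entries are separated by one blank line.
    return "\n".join(blocks)


print(to_srt([(0, 1.5, "Hello"), (1.5, 3, "World")]))
```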
Hot word replacement
- Edit the hot word files: In the software directory, find `hot-zh.txt` (Chinese), `hot-en.txt` (English), and `hot-rule.txt` (custom rules).
- Add hot words:
  - Chinese hot words, one per line (e.g. "artificial intelligence"), are replaced based on pinyin matching.
  - English hot words, one per line (e.g. "AI"), are replaced based on spelling matching.
  - Custom rules put the two sides on either side of an equals sign (e.g. "milliampere hour = mAh").
- Taking effect: No restart is required; the client loads hot words dynamically to improve recognition of terminology.
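The custom rule format can be sketched as follows. This assumes each rule line has the form `spoken = written` and applies it as a plain substring replacement; the real matching in CapsWriter-Offline may be more elaborate, and the function names here are hypothetical.

```python
def parse_rules(lines):
    """Parse 'spoken = written' lines into (spoken, written) pairs."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        spoken, written = (part.strip() for part in line.split("=", 1))
        if spoken:
            rules.append((spoken, written))
    return rules


def apply_rules(text, rules):
    """Replace each spoken form with its written form."""
    for spoken, written in rules:
        text = text.replace(spoken, written)
    return text


rules = parse_rules(["milliampere hour = mAh"])
print(apply_rules("a 5000 milliampere hour battery", rules))
```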
Diary Functions
- Enabling the diary: Enabled by default; the result of each recording is saved to a `year/month/date.md` file.
- Recording preservation: Audio files are stored automatically in the `year/month/assets` folder, in WAV or MP3 format.
- Keyword diary:
  - Edit `keywords.txt` and add one keyword per line (e.g. "meeting").
  - When the speech begins with a keyword, the result is saved separately to a `year/month/keyword-date.md` file.
- Clearing redundant files: Run the included Python script to remove audio files that are not referenced by any Markdown file.
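The file layout described above can be expressed as a small path-building function. This is a sketch under assumptions: the zero-padded numeric folder and file names, the use of the day of the month for the date part, and the function name `diary_path` are illustrative, and the actual naming used by the tool may differ.

```python
from datetime import date
from pathlib import Path


def diary_path(text, day, keywords, root="."):
    """Return the Markdown file a transcript would be saved to."""
    month_dir = Path(root) / f"{day:%Y}" / f"{day:%m}"
    for keyword in keywords:
        if text.startswith(keyword):
            # Speech that starts with a keyword goes to its own thematic file.
            return month_dir / f"{keyword}-{day:%d}.md"
    return month_dir / f"{day:%d}.md"


print(diary_path("meeting at three", date(2024, 5, 6), ["meeting"]))
```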
Operation flow demonstration
- Scenario 1: Quickly entering notes
  Open the client -> press and hold CapsLock -> say "meeting this afternoon to discuss project progress" -> release the key -> the text is typed into the document automatically -> the result is saved to the diary file.
- Scenario 2: Video to subtitles
  Prepare an MP4 file -> drag it onto the client -> wait for processing (progress is shown in the terminal) -> check the generated SRT file -> import it into video editing software.
Notes
- If the server is not started, the client shows a connection error; make sure the server is running at `127.0.0.1:6016` (the default address).
- macOS users must grant microphone permission and run the client from the terminal with `sudo`.
- Too many hot words can add latency (roughly 3 ms per 10,000 entries), so keep the list limited to commonly used terms.
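When the client reports a connection error, a quick way to see whether anything is listening on the default address is a plain TCP connection attempt. The sketch below uses the standard `socket.create_connection`; the function name `server_is_up` is hypothetical, and a successful connection only shows that the port is open, not that the service behind it is the CapsWriter server.

```python
import socket


def server_is_up(host="127.0.0.1", port=6016, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


print("Server reachable:", server_is_up())
```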