CapsWriter-Offline: Speech Input and Subtitle Transcription Tool for the PC

Published by AI Sharing Circle in Latest AI Resources, 6 months ago

General Introduction

CapsWriter-Offline is a voice input and subtitle transcription tool for the PC, hosted on GitHub and built by developer HaujetZhao. It runs completely offline: neither speech-to-text input nor audio/video-to-subtitle transcription requires an Internet connection. It supports unlimited recording time, mixed Chinese and English input, and high-accuracy recognition. Operation is simple: hold the shortcut key (CapsLock by default) to record, then release it to have the recognized text entered automatically. You can also drag audio and video files onto the client to generate SRT subtitles, which suits users who need quick transcription. CapsWriter-Offline is free and open source, runs on Windows, macOS, and Linux, and is popular among users who need efficient text input and subtitle creation.

Function List

Speech-to-text input: Hold the shortcut key to record and release it to convert speech to text automatically; mixed Chinese and English content is supported.
Unlimited-length transcription: Transcribes very long speech accurately by recognizing it in segments and de-duplicating the results.
Audio/video subtitle transcription: Drag audio or video files onto the client to generate SRT subtitles automatically.
Hot word replacement: Customize Chinese, English, and rule-based hot words to improve recognition accuracy for specific terms.
Diary function: Automatically saves recognition results as Markdown files, organized by date.
Keyword diary: Recognizes speech that begins with a specific keyword and saves it to a separate, topic-specific Markdown file.
High-quality recording preservation: Records at a 48000 Hz sample rate and can save to MP3 format with FFmpeg.
Cross-platform support: Compatible with Windows, macOS, and Linux.

Usage Guide

Installation process

CapsWriter-Offline is open source software that users need to download from GitHub and install manually. Below are the detailed steps:

1. Downloading software

Visit the GitHub page.
Select the appropriate version for your system in the "Releases" section:
- Windows 10 and above, 64-bit: download CapsWriter-Offline-Windows-64bit.zip (contains both server and client) and models.zip (model files).
- Windows 7 and above, 32-bit: download CapsWriter-Offline-Windows-32bit-Client.zip (client only; it must connect to a server running on another machine on the LAN).
- macOS/Linux: build from source yourself, or use a packaged version provided by the community.
After the download completes, unzip the software archive, then unzip models.zip and place its contents in the models folder under the software directory.

2. Environment preparation

Windows users:
- Make sure the system is 64-bit Windows 10 or above (required for the server), with at least 4 GB of RAM.
- To record in MP3 format, install FFmpeg and add it to the PATH environment variable.
macOS users:
- Install protobuf (run brew install protobuf).
- The client must be started with sudo; the default shortcut is Right Shift.
Linux users:
- Install xclip (run sudo apt-get install xclip) to support clipboard functionality.
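
The checks above can be scripted before the first launch. The following is a minimal sketch, not part of CapsWriter-Offline: the function names are hypothetical, it only looks for the ffmpeg and xclip executables on PATH, and it does not verify the macOS protobuf installation.

```python
import shutil
import sys


def tools_for_platform(platform, want_mp3=False):
    """Pick the external tools this article lists for a given sys.platform value."""
    tools = []
    if want_mp3:
        tools.append("ffmpeg")  # only needed when recordings are saved as MP3
    if platform.startswith("linux"):
        tools.append("xclip")   # clipboard support on Linux
    return tools


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    needed = tools_for_platform(sys.platform, want_mp3=True)
    for name in missing_tools(needed):
        print(f"missing: {name}")
```

Running the script prints one line per tool that still needs to be installed, and nothing if the environment is ready.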

3. Running the software

Server: Unzip the package and double-click start_server.exe (Windows), or run core_server.py (requires Python 3.8-3.10 and the dependencies). The model loads after startup, which takes about 50 seconds and uses about 2 GB of memory.
Client: Double-click start_client.exe (Windows), or run core_client.py (sudo is required on macOS/Linux). Once launched, it listens to the default microphone and the shortcut key.

Main Functions

Speech-to-text input

Starting the client: Once running, the client listens for the CapsLock key by default (Right Shift on macOS).
Recording:
- Press and hold CapsLock to start recording (recordings shorter than 0.3 seconds are ignored).
- When you release the key, the software converts the speech to text and enters it at the current cursor position.
Adjusting settings:
- In config.py, change the shortcut key (shortcut), whether the output is pasted (paste), and other parameters.
- To restore the CapsLock state after recording, set restore_key to True.
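
As an illustration of the settings named above, the following sketch shows what such entries could look like. Only the option names shortcut, paste, and restore_key come from this article; the class name, the key-name string, and the threshold option are assumptions, and the real config.py may be laid out differently.

```python
# Hypothetical excerpt in the spirit of config.py; check the real file
# for the exact layout and the full list of options.
class ClientConfig:
    shortcut = "caps lock"  # key held to record (assumed spelling of the key name)
    paste = True            # paste the recognized text at the cursor position
    restore_key = True      # put CapsLock back into its previous state afterwards
    threshold = 0.3         # assumed name: ignore recordings shorter than this, in seconds
```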

Audio/video subtitle transcription

Preparation: Make sure the client is running and the server is working properly.
Drag and drop files: Drag audio or video files (e.g. MP4, WAV) onto start_client.exe.
Generate subtitles: The software recognizes the audio content and generates an SRT file, which is saved in the same directory.
Note: For large files, check available memory and disk space beforehand; recognition time scales with file length.
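
For readers unfamiliar with the SRT format, the sketch below shows how timed text segments map to SRT entries: a running index, a start and end timestamp in HH:MM:SS,mmm form, and the subtitle text. It illustrates the file format only and is not CapsWriter-Offline's own code; the function names and the (start, end, text) tuple layout are assumptions.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm, the timestamp form used in SRT."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # Entries are separated by a blank line.
    return "\n".join(blocks)


print(to_srt([(0, 1.25, "hello"), (1.25, 2, "world")]))
```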

Hot word replacement

Edit the hot word files: In the software directory, find hot-zh.txt (Chinese), hot-en.txt (English), and hot-rule.txt (custom rules).
Add hot words:
- Chinese hot words go one per line (e.g. "artificial intelligence") and are matched by pinyin.
- English hot words go one per line (e.g. "AI") and are matched by spelling.
- Custom rules put the text to match and its replacement on either side of an equals sign (e.g. "milliampere hour = mAh").
Taking effect: No restart is required; the client reloads hot words dynamically to improve recognition of terminology.
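
To make the rule format concrete, here is a simplified sketch of how lines such as "milliampere hour = mAh" could be parsed and applied. It uses plain string replacement and treats lines starting with "#" as comments; both are assumptions for illustration, and the tool's actual matching logic may differ.

```python
def parse_rules(lines):
    """Parse 'text to match = replacement' lines into (match, replacement) pairs."""
    rules = []
    for line in lines:
        line = line.strip()
        # Skip blank lines, assumed '#' comments, and lines without an equals sign.
        if not line or line.startswith("#") or "=" not in line:
            continue
        match, _, replacement = line.partition("=")
        rules.append((match.strip(), replacement.strip()))
    return rules


def apply_rules(text, rules):
    """Apply each rule in order using plain substring replacement."""
    for match, replacement in rules:
        text = text.replace(match, replacement)
    return text


rules = parse_rules(["milliampere hour = mAh", "", "# a comment"])
print(apply_rules("a 5000 milliampere hour battery", rules))
```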

Diary Functions

Enabling the diary: Enabled by default; the result of each recording is saved to the file year/month/date.md.
Saving recordings: Audio files are stored automatically in the year/month/assets folder, in WAV or MP3 format.
Keyword diary:
- Edit keywords.txt and add one keyword per line (e.g., "meeting").
- When speech begins with a keyword, the result is saved separately to year/month/keyword-date.md.
Cleaning up: Run the included Python script to remove audio files that no Markdown file references.
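
The layout described above can be sketched as follows. This is an illustration under stated assumptions, not the tool's own code: the function names are hypothetical, month and day are assumed to be zero-padded, and an audio file counts as referenced if its name appears anywhere in the Markdown text.

```python
from datetime import date
from pathlib import PurePosixPath


def diary_path(day, text, keywords=()):
    """Return the Markdown path for a recognized sentence spoken on `day`."""
    folder = PurePosixPath(f"{day.year:04d}") / f"{day.month:02d}"
    for keyword in keywords:
        if text.startswith(keyword):
            # Speech that starts with a keyword goes to its own file.
            return folder / f"{keyword}-{day.day:02d}.md"
    return folder / f"{day.day:02d}.md"


def unreferenced_assets(markdown_text, asset_names):
    """Return the asset file names that never appear in the Markdown text."""
    return [name for name in asset_names if name not in markdown_text]


print(diary_path(date(2024, 3, 5), "meeting at three", ["meeting"]))
print(unreferenced_assets("[audio](assets/a.mp3)", ["a.mp3", "b.mp3"]))
```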

Operation flow demonstration

Scenario 1: Quickly Entering Notes
Open client -> press and hold CapsLock -> say "meeting this afternoon to discuss project progress" -> release key -> text is automatically typed into document -> save as diary file.
Scenario 2: Video to Subtitle
Prepare MP4 file -> Drag to client -> Wait for processing (progress is shown in terminal) -> Check generated SRT file -> Import to video editing software for use.

Notes

If the server is not running, the client reports a connection error; make sure the server is running at 127.0.0.1:6016 (the default address).
macOS users must grant microphone permission and run the client from the terminal with sudo.
Hot words add roughly 3 ms of latency per 10,000 entries, so keep the lists to commonly used terms.
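
When the client reports a connection error, a quick way to tell whether anything is listening on the default address is to try opening a TCP connection. The sketch below is a generic check with a hypothetical function name; it only confirms that the port accepts connections, not that the process behind it is the CapsWriter-Offline server.

```python
import socket


def server_reachable(host="127.0.0.1", port=6016, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    status = "reachable" if server_reachable() else "not reachable"
    print(f"127.0.0.1:6016 is {status}")
```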