General Introduction
BiliNote is an open-source AI video note-taking tool that extracts content from BiliBili and YouTube video links and automatically generates clearly structured notes in Markdown format. It uses local audio transcription and a variety of large language models (such as OpenAI, DeepSeek, and Qwen) for content summarization, and supports inserting video screenshots and timestamp jump links. The project is hosted on GitHub under the MIT license and is available as a Docker deployment and a Windows packaged version, helping students, creators, and researchers organize materials for study or work. The official online demo is deployed on Cloudflare Pages and may be slow to access depending on network conditions.
Function List
- Automatically extracts content from BiliBili and YouTube video links to generate Markdown notes.
- Transcribes audio locally using Fast-Whisper models, keeping data private.
- Supports OpenAI, DeepSeek, Qwen, and other large language models for summarizing the core content of a video.
- Optionally inserts video keyframe screenshots to make notes more visual.
- Generates timestamped notes that can jump to the corresponding point in the original video.
- Provides a task history so you can review previously generated notes.
- Supports Docker one-click deployment to simplify local or cloud installations.
- A packaged version (exe file) is available for Windows and does not require complex configuration to use.
- There are plans to support more video platforms such as Douyin and Kuaishou.
Usage Instructions
Installation and Deployment
BiliNote can be used in three ways: manual deployment, Docker deployment, and the Windows packaged version. Detailed steps for each are below:
Manual Deployment
- Clone the project code
Run the following commands to get the source code:

```bash
git clone https://github.com/JefferyHcool/BiliNote.git
cd BiliNote
mv .env.example .env
```
- Install FFmpeg
BiliNote relies on FFmpeg for audio processing, so it must be installed first:
- Mac: run `brew install ffmpeg`
- Ubuntu/Debian: run `sudo apt install ffmpeg`
- Windows: download and install FFmpeg from the official FFmpeg website, and make sure the path to the FFmpeg executable is added to the system PATH environment variable.
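To confirm FFmpeg is reachable from your PATH, a quick check can be run; the second command is only a hypothetical illustration of the kind of audio extraction BiliNote performs internally, not a required setup step:

```bash
# Verify that FFmpeg is installed and on the PATH
ffmpeg -version

# Illustration only: extract a mono 16 kHz WAV track,
# the typical input format for Whisper-family transcription models
ffmpeg -i video.mp4 -vn -ar 16000 -ac 1 audio.wav
```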
- Configure the backend
Go to the backend directory, install the dependencies, and start the service:

```bash
cd backend
pip install -r requirements.txt
python main.py
```

Edit the `.env` file to configure the API keys and ports, for example:

```env
API_BASE_URL=http://localhost:8000
OUT_DIR=note_results
IMAGE_BASE_URL=/static/screenshots
MODEL_PROVIDER=openai
OPENAI_API_KEY=sk-xxxxxx
DEEP_SEEK_API_KEY=xxx
QWEN_API_KEY=xxx
```
- Configure the front end
Go to the front-end directory, install the dependencies, and start the service:

```bash
cd BiliNote_frontend
pnpm install
pnpm dev
```

Visit http://localhost:5173 to view the front-end interface.
- Optimize audio transcription (optional)
If you are using an NVIDIA GPU, you can enable the CUDA-accelerated version of Fast-Whisper; see the Fast-Whisper project documentation for configuration details, and the sketch below for what GPU transcription looks like in code.
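The snippet below is a minimal sketch of local transcription with the faster-whisper library on a CUDA GPU; it illustrates the underlying technique and is not BiliNote's actual transcription code.

```python
from faster_whisper import WhisperModel

# Load a Whisper model on the GPU; fall back to device="cpu",
# compute_type="int8" if no CUDA-capable GPU is available
model = WhisperModel("base", device="cuda", compute_type="float16")

# Transcribe an audio file; each segment carries the start/end
# timestamps that make timestamped notes and jump links possible
segments, info = model.transcribe("audio.wav", beam_size=5)
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```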
Docker Deployment
- Ensure Docker and Docker Compose are installed
Refer to the Docker website for installation instructions.
- Clone and configure the project
```bash
git clone https://github.com/JefferyHcool/BiliNote.git
cd BiliNote
mv .env.example .env
```
- Start the services
Run the following command to build and start the containers:

```bash
docker compose up --build
```
By default, the front end is served at http://localhost:${FRONTEND_PORT} and the back end at http://localhost:${BACKEND_PORT}; both ports can be customized in the `.env` file, as sketched below.
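The variable names below match the placeholders above; the default values shown are an assumption (the back-end port follows the API_BASE_URL example earlier), so check the project's `.env.example` for the real defaults:

```env
# Hypothetical defaults; verify against .env.example
FRONTEND_PORT=5173
BACKEND_PORT=8000
```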
Windows Packaged Version
- Download the exe file
Visit the GitHub Releases page to download the Windows package (an exe file).
- Run the program
Double-click the exe file to start; there is no need to install FFmpeg or configure environment variables manually. The first time you run it, you will be asked to enter an API key.
- Configure the API key
Enter an API key for OpenAI, DeepSeek, or Qwen in the program interface, save it, and start using the tool.
Usage Steps
- Open BiliNote
- Local deployment: open a browser and visit http://localhost:5173.
- Online version: visit https://www.bilinote.app (it may load slowly because it runs on Cloudflare Pages).
- Windows packaged version: double-click the exe file to start the program.
- Enter a video link
Enter a link to a publicly available BiliBili or YouTube video in the interface, for example https://www.bilibili.com/video/xxx, then click "Submit" to begin processing.
- Configure generation options
- AI model: choose OpenAI, DeepSeek, or Qwen for content summarization.
- Screenshot insertion: choose whether to automatically insert video screenshots.
- Jump links: choose whether to generate timestamped jump links.
- Note-taking style: choose from academic style, spoken style, or focused-extraction mode (some styles will be added in future updates).
- Generate notes
After clicking "Generate", BiliNote downloads the video's audio, transcribes it to text with Fast-Whisper, and generates Markdown notes with the selected large language model. Generation time depends on video length and hardware performance.
- View and export notes
- Notes are displayed in Markdown format, with headings, paragraphs, timestamps, and screenshots (if enabled).
- Click a timestamp to jump to the corresponding point in the original video; a sketch of such a note fragment follows this list.
- Notes can be exported as Markdown files, with PDF, Word, and Notion formats planned.
- Previously generated notes can be opened from the task history screen for review and editing.
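The fragment below is a hypothetical example of what a generated note might look like; the heading and file path are assumptions (the screenshot path follows the IMAGE_BASE_URL setting shown earlier), while `?t=135` is the standard query parameter both BiliBili and YouTube accept for seeking to a timestamp in seconds:

```markdown
## Key Point: Setting Up the Environment

- [02:15](https://www.bilibili.com/video/xxx?t=135) The speaker walks through dependency installation.

![keyframe screenshot](/static/screenshots/frame_135.png)
```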
Feature Highlights
- Local audio transcription: Fast-Whisper models run locally to protect data privacy, with optional CUDA acceleration for faster transcription.
- Multi-model support: switch between OpenAI, DeepSeek, and Qwen for different languages and scenarios (e.g., Qwen performs better on Chinese videos); switching providers is a one-line change in `.env`, as sketched after this list.
- Screenshot insertion: automatically captures video keyframes and inserts them at the corresponding positions in the notes to improve readability.
- Task history: every generation task is saved automatically for later review or modification.
- Windows packaged version: offers an out-of-the-box experience for non-technical users and simplifies installation.
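A minimal sketch of switching the summarization provider, reusing the variable names from the `.env` example earlier (whether `qwen` is the exact accepted value for MODEL_PROVIDER is an assumption):

```env
# Switch the summarization backend to Qwen, e.g. for Chinese-language videos
MODEL_PROVIDER=qwen
QWEN_API_KEY=xxx
```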
Caveats
- Video links must be publicly accessible; private videos may not be processable.
- The content summarization feature requires a valid API key (OpenAI, DeepSeek, or Qwen).
- FFmpeg must be installed correctly (except when using the Windows packaged version).
- The online experience may load slowly due to Cloudflare Pages limitations, so we recommend deploying locally or using the Windows packaged version.
- Ensure a stable network connection to avoid failed audio downloads or API calls.
Application Scenarios
- Students organizing online class notes
Students can turn BiliBili or YouTube course videos into Markdown notes, extracting key points and timestamps for easy review and navigation.
- Content creators organizing material
Creators can extract video scripts or key information to generate notes with screenshots for content curation or copywriting.
- Archiving corporate training content
Enterprises can turn training videos into structured notes for employees to review or archive, improving learning efficiency.
- Researchers organizing academic lectures
Researchers can turn academic conference videos into notes, extract core ideas and data, and build a knowledge base.
- Personal knowledge management
Users can turn videos of interest (e.g., tutorials, podcasts) into notes and save them to a personal knowledge base for access at any time.
FAQ
- What video platforms does BiliNote support?
Currently BiliBili and YouTube, with plans to support Douyin, Kuaishou, and other platforms in the future.
- What is the difference between the Windows packaged version and a local deployment?
The Windows packaged version spares non-technical users from manually installing FFmpeg or configuring the environment. A local deployment is more flexible, supporting custom configuration and GPU acceleration.
- How can I speed up audio transcription?
On a device with an NVIDIA GPU, enable the CUDA-accelerated version of Fast-Whisper; refer to the Fast-Whisper project for details.
- Do I have to use a paid API key?
The content summarization feature requires an API key for OpenAI, DeepSeek, or Qwen (fees may apply). Audio transcription runs locally for free.
- Why does the online version load slowly?
The online version is deployed on Cloudflare Pages and is subject to network and server limitations; a local deployment or the Windows packaged version is recommended.