General Introduction
FoleyCrafter is an open source project developed by OpenMMLab that generates vivid, synchronized sound effects for silent videos. It analyzes the video content and generates sound effects that are semantically relevant and time-synchronized, adding realism and emotional depth to the video. FoleyCrafter's goal is to provide high-quality sound solutions for film, games, and other fields, improving the audience's audiovisual experience.
Workflow for automated audio synthesis: https://openart.ai/workflows/t8star/foleycrafter/wZyBSeaa2lvgU3c3NlcH
Function List
- Video to Audio Generation: Generate semantically relevant and synchronized sound effects based on video content.
- Text-Prompted Sound Generation: Generate scene-specific sound effects from text prompts.
- Temporal Alignment: Ensure that the generated sound effects are synchronized in time with the video content.
- Gradio Interface: Provides a user-friendly interface for sound generation operations.
- Open Source: The complete code base is provided so that developers can extend and customize it.
Usage Guide
Installation process
- Preparing the environment:
  - Create the Conda environment:
    conda env create -f requirements/environment.yaml
  - Activate the environment:
    conda activate foleycrafter
  - Install Git LFS:
    conda install git-lfs
    and then run
    git lfs install
- Download Checkpoints:
  - Run inference.py to download the checkpoints automatically, or download them manually and place them in the checkpoints directory.
- Launching the Gradio Interface:
  - Run
    python app.py --share
    to launch the Gradio interface.
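Once the environment steps above are done, it can be useful to confirm that the command-line tools they rely on are actually reachable. The following is a minimal sketch and not part of FoleyCrafter: the helper name `missing_tools` is hypothetical, and it only checks that executables named `conda`, `git`, and `git-lfs` are on the PATH (on some systems `conda` is exposed as a shell function, which this check cannot see).

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Tools named in the installation steps; `git-lfs` is the executable behind `git lfs`.
REQUIRED_TOOLS = ("conda", "git", "git-lfs")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on the PATH (hypothetical helper)."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.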
Usage Process
- Video to Audio Generation:
  - Run
    python inference.py --save_dir=output/sora/
    to save the generated audio files in the specified directory.
- Temporal Alignment:
  - Run
    python inference.py --temporal_align --input=input/avsync --save_dir=output/avsync/
    to ensure that the generated sound effects are time-synchronized with the video content.
- Text-Prompted Sound Generation:
  - Run
    python inference.py --input=input/PromptControl/case1/ --seed=10201304011203481429 --prompt='noisy, people talking' --save_dir=output/PromptControl/case1_prompt/
    to generate sound effects for a specific scene based on a text prompt.
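The three commands above differ only in their flags, so they can be assembled programmatically when several inputs need processing. The sketch below is an illustration rather than part of FoleyCrafter: `build_inference_command` and `run_inference` are hypothetical helpers, and they use only the flags that appear in the commands above (`--input`, `--save_dir`, `--temporal_align`, `--seed`, `--prompt`).

```python
import subprocess
import sys
from typing import List, Optional


def build_inference_command(
    save_dir: str,
    input_dir: Optional[str] = None,
    temporal_align: bool = False,
    seed: Optional[int] = None,
    prompt: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for inference.py (hypothetical helper)."""
    cmd = [sys.executable, "inference.py"]
    if temporal_align:
        cmd.append("--temporal_align")
    if input_dir is not None:
        cmd.append(f"--input={input_dir}")
    if seed is not None:
        cmd.append(f"--seed={seed}")
    if prompt is not None:
        # No shell quoting is needed: each list element is passed as one argument.
        cmd.append(f"--prompt={prompt}")
    cmd.append(f"--save_dir={save_dir}")
    return cmd


def run_inference(cmd: List[str]) -> None:
    """Run the command from the project directory; raises if it exits non-zero."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Mirrors the text-prompt example above; printed rather than executed.
    cmd = build_inference_command(
        save_dir="output/PromptControl/case1_prompt/",
        input_dir="input/PromptControl/case1/",
        seed=10201304011203481429,
        prompt="noisy, people talking",
    )
    print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with prompts that contain spaces or commas. `run_inference` is meant to be called from the project directory inside the activated environment.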
Detailed steps
- Preparing the environment:
  - Download and install Conda: https://docs.conda.io/en/latest/miniconda.html
  - Clone the project code:
    git clone https://github.com/open-mmlab/foleycrafter.git
  - Change to the project directory:
    cd foleycrafter
  - Follow the installation steps above to install the dependencies and configure the environment.
- Download Checkpoints:
  - Download the checkpoint files and place them so that the directory structure is as follows:
    └── checkpoints
        ├── semantic
        │   └── semantic_adapter.bin
        ├── vocoder
        │   ├── vocoder.pt
        │   └── config.json
        ├── temporal_adapter.ckpt
        └── timestamp_detector.pth.tar
- Launching the Gradio Interface:
  - Run
    python app.py --share
    to launch the Gradio interface, which you can then open in a browser.
- Generate sound effects:
  - Select a generation mode (video to audio, temporal alignment, or text prompt) as needed, and run the corresponding command to generate the audio files.
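Before running any generation mode, it may help to confirm that the checkpoint files are where the layout in the Download Checkpoints step says they should be. The following sketch is not part of FoleyCrafter: `missing_checkpoints` is a hypothetical helper, and its file list is copied from that layout, assuming `temporal_adapter.ckpt` and `timestamp_detector.pth.tar` sit directly under the checkpoints directory.

```python
from pathlib import Path
from typing import List

# Relative paths taken from the directory listing in this guide.
EXPECTED_FILES = (
    "semantic/semantic_adapter.bin",
    "vocoder/vocoder.pt",
    "vocoder/config.json",
    "temporal_adapter.ckpt",
    "timestamp_detector.pth.tar",
)


def missing_checkpoints(root: str = "checkpoints") -> List[str]:
    """Return the expected files that are absent under `root` (hypothetical helper)."""
    base = Path(root)
    return [rel for rel in EXPECTED_FILES if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoint files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected checkpoint files are present.")
```

Run it from the project directory. An empty result only means the files exist; it does not verify that the downloads are complete or uncorrupted.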
With the above steps, users can get started with FoleyCrafter and add vivid, synchronized sound effects to silent videos to enhance the audiovisual experience.