General Introduction
ER-NeRF (Efficient Region-Aware Neural Radiance Fields) is an open-source talking-portrait synthesis system presented at ICCV 2023. It uses region-aware neural radiance fields to generate high-fidelity videos of talking characters efficiently. Its main features are a regionalized processing scheme that models the character's head and torso separately, and an audio-spatial decomposition technique that enables more accurate lip synchronization. The project provides complete training and inference code, supports training on custom videos, and can use different audio feature extractors (e.g., DeepSpeech, Wav2Vec, HuBERT) to process the audio input. The system delivers significant improvements in both visual quality and rendering efficiency, making it a notable solution in the field of talking-portrait synthesis.
Newer project by the same author: https://github.com/Fictionarry/TalkingGaussian
Feature List
- High-fidelity talking-head video synthesis
- Region-aware neural radiance field rendering
- Separate modeling of head and torso
- Accurate lip synchronization
- Support for multiple audio feature extractors (DeepSpeech/Wav2Vec/HuBERT)
- Training on custom videos
- Audio-driven character animation generation
- Smooth head movement control
- Eye-blink control via the AU45 action unit feature
- LPIPS-based fine-tuning for lip refinement
Usage Guide
1. Environment setup
System requirements:
- Ubuntu 18.04 operating system
- PyTorch version 1.12
- CUDA 11.3
Installation Steps:
- Create a conda environment:
conda create -n ernerf python=3.10
conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
- Install additional dependencies:
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
pip install tensorflow-gpu==2.8.0
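After installation, a quick sanity check (not part of the official instructions) can confirm that the PyTorch and CUDA versions match the requirements above:
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"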
2. Preparing the preprocessing models
The following model files need to be downloaded in advance (expected locations are sketched after this list):
- Face parsing model
- 3DMM head pose estimation model
- Basel Face Model 2009
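For reference, the upstream README places these files under data_utils/ roughly as follows; treat the exact paths as assumptions and verify them against the repository:
data_utils/face_parsing/79999_iter.pth                # face parsing weights
data_utils/face_tracking/3DMM/                        # .npy/.obj assets for 3DMM head pose estimation
data_utils/face_tracking/3DMM/01_MorphableModel.mat   # Basel Face Model 2009 (requires registration on the BFM website)
cd data_utils/face_tracking && python convert_BFM.py  # convert the BFM once downloaded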
3. Training on a custom video
- Video preparation requirements (see the example ffmpeg command after this list):
- Format: MP4
- Frame rate: 25 FPS
- Resolution: 512x512 recommended
- Duration: 1-5 minutes
- Every frame should show the speaking subject
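If your source footage does not already meet these requirements, a generic ffmpeg command along the following lines can conform it; the input file name is a placeholder, and the center square crop assumes the speaker is roughly centered in the frame:
ffmpeg -i raw_video.mp4 -vf "crop=ih:ih,scale=512:512" -r 25 -c:v libx264 -c:a aac data/<ID>/<ID>.mp4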
- Data preprocessing:
python data_utils/process.py data/<ID>/<ID>.mp4
- Audio feature extraction (choose one of the three options; see the format note after this list):
- DeepSpeech feature extraction:
python data_utils/deepspeech_features/extract_ds_features.py --input data/<name>.wav
- Wav2Vec feature extraction:
python data_utils/wav2vec.py --wav data/<name>.wav --save_feats
- HuBERT feature extraction (recommended):
python data_utils/hubert.py --wav data/<name>.wav
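Format note: these extractors generally expect a 16 kHz mono WAV file. If the target audio is in another format, a generic ffmpeg conversion (the input file name is a placeholder) is:
ffmpeg -i target_speech.mp3 -ar 16000 -ac 1 data/<name>.wav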
4. Model training
Training is divided into two stages, head training and torso training:
- Head training:
python main.py data/obama/ --workspace trial_obama/ -O --iters 100000
python main.py data/obama/ --workspace trial_obama/ -O --iters 125000 --finetune_lips --patch_size 32
- Torso training:
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --head_ckpt <head>.pth --iters 200000
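For concreteness, a filled-in version of the torso command, assuming the head stage wrote its checkpoint into trial_obama/checkpoints/ (the file name below is hypothetical; use the latest checkpoint from your own run):
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --head_ckpt trial_obama/checkpoints/ngp_ep0028.pth --iters 200000  # checkpoint name is hypothetical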
5. Model testing and inference
- Test the trained model:
# Render the head only
python main.py data/obama/ --workspace trial_obama/ -O --test
# Render head and torso
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --test
- Inference with target audio:
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --test --test_train --aud <audio>.npy
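For concreteness, a filled-in version using a hypothetical HuBERT feature file produced in the extraction step above:
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --test --test_train --aud data/target_hu.npy  # feature file name is hypothetical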
Tip: Adding the --smooth_path parameter reduces head jitter, but may reduce pose accuracy.