
Wav2Lip-Based Digital Human Integration Pack (GUI Edition)

Hello everyone, today I'm sharing a digital human creation tool! It is easy to use and supports batch processing. (The integration pack is available at the end of the article.) Many of you probably already know something about digital human technology: the viral clips of Guo Degang speaking English and of a Russian woman speaking Chinese are both examples of it.

There are actually many kinds of digital humans. The one I'm sharing produces video-based digital humans; there are also 3D-model digital humans built with Unreal Engine, and each kind is used in different settings. If you're interested, look into them; I won't go into detail here.


What? You don't know what a digital human is? Look it up on Baidu.

The tool shared today is an optimized build based on the original Wav2Lip project. While deploying it I ran into many problems, including caching, the interface, and execution efficiency, and I optimized each of them.

 

Configuration Requirements

Windows

An NVIDIA GPU is required! CPU-only machines are not supported!

Mac

Still in development: I'm working through issues with MPS (PyTorch's Apple Metal backend) and have been at it for days, so Mac users, please wait a little longer.

Friends, please don't think I'm slow: once each integration pack is done, I test it extensively and look for anything else I can optimize!

 

Updates

What's New Compared to the Original

1. Added a WebUI.

2. Added batch processing.

3. Fixed the caching problem in the original.

4. Improved processing efficiency.

 

Usage

Preparation

You need to prepare an audio file and a video file.

Audio file:

  • The audio should ideally be the same length as the video (e.g. for a 10-second video, use about 10 seconds of audio). If the audio is longer than the video, the video automatically loops to cover the extra length.
  • Supported audio formats: WAV and MP3
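The looping behaviour can be pictured as a mapping from each output frame to a source frame. This is a minimal sketch assuming simple wrap-around looping (the original Wav2Lip inference script picks frames with `i % len(frames)`) and an output length that follows the audio; `frame_schedule` is a hypothetical helper, and the pack's exact behaviour may differ.

```python
import math


def frame_schedule(n_video_frames: int, fps: float, audio_seconds: float) -> list[int]:
    """Return the source-frame index used for each output frame.

    Assumptions (not taken from the pack itself): the output runs as long as
    the audio, and source frames wrap around to the start when the audio
    outlasts the video.
    """
    if n_video_frames <= 0:
        raise ValueError("the video must contain at least one frame")
    n_output = math.ceil(audio_seconds * fps)  # output length follows the audio
    return [i % n_video_frames for i in range(n_output)]


# A 2-second clip (10 frames at 5 fps) driven by 3 seconds of audio:
print(frame_schedule(10, 5, 3))  # frames 0-9, then 0-4 again
```

Under this assumption, audio shorter than the video simply yields a shorter output, which is one reason matching lengths is the safest choice.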

Video file:

  • Every frame of the video you select must contain a face, or an error will be reported. (For example, if your video is 10 seconds long and there are 2 seconds in the middle with no face in frame, an error will be reported.)
  • H.264-encoded MP4 is the recommended format.
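Because a single face-less stretch makes the run fail, it can help to locate such stretches before processing. A minimal sketch, assuming you already have one boolean per frame from some face detector; the detector and the `find_faceless_spans` helper are hypothetical and not part of the pack.

```python
def find_faceless_spans(face_found: list[bool], fps: float) -> list[tuple[float, float]]:
    """Return (start_s, end_s) for every run of frames with no detected face."""
    spans = []
    start = None  # index of the first frame in the current face-less run
    for i, found in enumerate(face_found):
        if not found and start is None:
            start = i
        elif found and start is not None:
            spans.append((start / fps, i / fps))
            start = None
    if start is not None:  # the clip ends without a face
        spans.append((start / fps, len(face_found) / fps))
    return spans


# 10 s at 10 fps with no face between 4 s and 6 s, like the example above:
flags = [True] * 40 + [False] * 20 + [True] * 40
print(find_faceless_spans(flags, fps=10))  # [(4.0, 6.0)]
```

Any span it reports is a section to trim from the clip before dragging it into the WebUI.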

Tip: This version supports batch processing, pairing either multiple videos with multiple audio files or multiple videos with a single audio file.

Examples:

  • You have 3 videos and 3 audio files: they are processed in the order you selected them, with video 1 paired with audio 1, video 2 with audio 2, and video 3 with audio 3.
  • You have 3 videos and 1 audio file: every video you uploaded is paired with that one audio file, so video 1, video 2, and video 3 all use audio 1.
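The two pairing rules above can be sketched as follows. `pair_batch` is a hypothetical helper, and rejecting other combinations (for example 3 videos with 2 audio files) is an assumption, since the article does not say how the pack handles them.

```python
def pair_batch(videos: list[str], audios: list[str]) -> list[tuple[str, str]]:
    """Pair videos with audio files for a batch run.

    One audio file: every video reuses it.
    Equal counts: pair by position, in the order the files were chosen.
    Any other combination is rejected here (an assumption, not the pack's rule).
    """
    if len(audios) == 1:
        return [(video, audios[0]) for video in videos]
    if len(audios) == len(videos):
        return list(zip(videos, audios))
    raise ValueError(f"cannot pair {len(videos)} videos with {len(audios)} audio files")


print(pair_batch(["v1.mp4", "v2.mp4", "v3.mp4"], ["a1.wav"]))
# [('v1.mp4', 'a1.wav'), ('v2.mp4', 'a1.wav'), ('v3.mp4', 'a1.wav')]
```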

Start processing

The easiest way:

Drag and drop the video and audio into the corresponding file boxes, click Start Generation, and you're done!

If you want to dig deeper into what each parameter does, read on!

 

Parameter details

Video Quality:

Fast: plain Wav2Lip audio-to-lip mode.

Improved: Wav2Lip audio-to-lip mode + a feathered mask around the mouth to remove the visible border around the lips.

Enhanced: Wav2Lip audio-to-lip mode + mask feathering + GFPGAN HD face enhancement.

Experimental: the Enhanced mode with optimized execution efficiency.

If your machine is reasonably well configured, Enhanced or Experimental is the recommended default.

Resolution Options

Full resolution

Half resolution

Note:

In my testing, half resolution causes compatibility problems in some cases, so choosing Full resolution is recommended.

Wav2Lip Version Options

Wav2Lip

Pros: more accurate mouth synchronization; keeps the mouth closed when there is no sound.

Cons: sometimes produces missing teeth.

Wav2Lip_GAN

Pros: looks better and retains more of the speaker's original expression.

Cons: not as good at covering up the original lip movements, especially when there is no sound.

Recommendation:

Try Wav2Lip first, and switch to Wav2Lip_GAN if the mouth shapes look noticeably off.

Enable Face Smoothing

When enabled, Wav2Lip crops the face in each frame independently.

Ideal for fast movement or cuts in the video.

If the face is at an unusual angle, the result may twitch.

When disabled, Wav2Lip averages the detected face position over 5 frames.

Ideal for slow movement, especially for faces at unusual angles.

When the face moves quickly within the frame, the mouth may drift out of position, and it looks terrible across cuts.
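The 5-frame averaging can be sketched as a sliding mean over face boxes. This follows the idea behind `get_smoothened_boxes` in the original Wav2Lip inference script (which its `--nosmooth` flag skips), but it is a simplified rewrite rather than the pack's code; `smooth_boxes` and the `Box` alias are hypothetical names.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def smooth_boxes(boxes: list[Box], window: int = 5) -> list[Box]:
    """Average each face box with the boxes that follow it.

    Every output is the mean of `window` consecutive boxes; near the end of
    the clip the last `window` boxes are reused so the average never shrinks.
    """
    n = len(boxes)
    w = min(window, n)  # short clips: average over whatever boxes exist
    smoothed = []
    for i in range(n):
        start = min(i, n - w)  # clamp the window to the end of the clip
        chunk = boxes[start:start + w]
        smoothed.append(tuple(sum(b[k] for b in chunk) / w for k in range(4)))
    return smoothed


# One jittery detection in the last frame is spread across the whole window:
boxes = [(0, 0, 100, 100)] * 4 + [(50, 0, 150, 100)]
print(smooth_boxes(boxes)[0])  # (10.0, 0.0, 110.0, 100.0)
```

A wider window steadies the crop more but lags further behind fast motion, which is exactly the trade-off between the two settings described above.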

Padding:

This option controls the number of pixels added to or removed from the face crop in each direction.

It can help remove hard lines at the chin or other edges of the face, but too much or too little padding can change the size or position of the mouth. A common practice is to add 10 pixels at the bottom; experiment with different values to find the best result.
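The original Wav2Lip inference script exposes the same idea as `--pads` (top, bottom, left, right, default `0 10 0 0`, i.e. the 10 extra pixels at the bottom mentioned above). A minimal sketch of applying such padding to a detected box and clamping it to the frame; `pad_face_box` is a hypothetical helper, and negative values shrink the crop.

```python
def pad_face_box(box, pads, frame_w, frame_h):
    """Grow (or, with negative values, shrink) a face box and clamp it to the frame.

    box:  (x1, y1, x2, y2) in pixels
    pads: (top, bottom, left, right) in pixels
    """
    x1, y1, x2, y2 = box
    top, bottom, left, right = pads
    return (
        max(0, x1 - left),
        max(0, y1 - top),
        min(frame_w, x2 + right),   # never extend past the right edge
        min(frame_h, y2 + bottom),  # never extend past the bottom edge
    )


print(pad_face_box((100, 100, 200, 200), (0, 10, 0, 0), 640, 480))  # (100, 100, 200, 210)
```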

Mask Settings

Mask Size

Increases the size of the area covered by the mask. (If you see a border around the face, reduce this value, e.g. to 1.5.)

Mask Feathering

Determines the amount of blending between the center and the edges of the mask. (Increasing this value can also help remove a border around the face.)
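To make "size" and "feathering" concrete, here is a minimal sketch of a feathered mask: weight 1 in the middle, a linear ramp down to 0 at the edge, and the generated face blended over the original frame with that weight. The circular shape, the linear ramp, and treating Mask Size as the radius and Mask Feathering as the ramp width are all assumptions for illustration; the pack's actual mask may be built differently.

```python
import math


def feathered_alpha(dist: float, radius: float, feather: float) -> float:
    """Mask weight: 1.0 within (radius - feather), 0.0 at or beyond radius, linear in between."""
    if feather <= 0:  # no feathering: hard-edged mask
        return 1.0 if dist <= radius else 0.0
    inner = radius - feather
    if dist <= inner:
        return 1.0
    if dist >= radius:
        return 0.0
    return (radius - dist) / feather


def feathered_blend(generated, original, center, radius, feather):
    """Blend two equally sized grayscale images (2-D lists) through a circular feathered mask."""
    cx, cy = center
    out = []
    for y, (g_row, o_row) in enumerate(zip(generated, original)):
        row = []
        for x, (g, o) in enumerate(zip(g_row, o_row)):
            a = feathered_alpha(math.hypot(x - cx, y - cy), radius, feather)
            row.append(a * g + (1 - a) * o)  # generated inside, original outside
        out.append(row)
    return out


generated = [[255] * 9 for _ in range(9)]
original = [[0] * 9 for _ in range(9)]
result = feathered_blend(generated, original, center=(4, 4), radius=4, feather=2)
print([round(v) for v in result[4]])  # [0, 128, 255, 255, 255, 255, 255, 128, 0]
```

The intermediate values at the edge of the middle row are the feathering: a hard mask would jump straight from 255 to 0, which is what shows up as a visible border.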

Enable Mask Mouth Tracking

Updates the mask position to the mouth's position in every frame (slower).

Note:

Because the frames are already cropped to the face, the mouth position is already well approximated. Enable this feature only if you notice that the mask in your video doesn't seem to follow the mouth.

Enable Mask Debugging

Turning this on makes the background grayscale and the mask colored, so you can see where the mask sits in the frame. (With this parameter set to True, you can see the effect of the other mask parameters more directly.)
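The difference between the two modes can be sketched as choosing a mask center per frame. This assumes per-frame mouth landmark points from some landmark detector (running that detector on every frame is what makes tracking slower); `mask_centers` and `fixed_center`, which stands in for the approximate mouth position inside the face crop, are hypothetical names, not the pack's code.

```python
Point = tuple[float, float]


def mask_centers(
    mouth_points_per_frame: list[list[Point]],
    tracking: bool,
    fixed_center: Point,
) -> list[Point]:
    """Return one mask center per frame.

    tracking=False: reuse a fixed position inside the face crop.
    tracking=True:  use the centroid of that frame's mouth points, falling
                    back to the fixed position when none were detected.
    """
    centers = []
    for points in mouth_points_per_frame:
        if not tracking or not points:
            centers.append(fixed_center)
        else:
            xs = [x for x, _ in points]
            ys = [y for _, y in points]
            centers.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return centers


frames = [[(40, 70), (60, 70)], [], [(50, 80), (70, 80)]]
print(mask_centers(frames, tracking=True, fixed_center=(48, 72)))
# [(50.0, 70.0), (48, 72), (60.0, 80.0)]
```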
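The debug view amounts to compositing the frame in color where the mask is and in grayscale elsewhere. A minimal sketch on plain Python lists, assuming a per-pixel mask weight between 0 and 1 (for example from a feathered mask) and using the standard ITU-R BT.601 luma weights for the grayscale conversion; `debug_pixel` and `debug_frame` are hypothetical helpers, not the pack's implementation.

```python
def debug_pixel(rgb, alpha):
    """Keep color where the mask weight is 1, fade to grayscale where it is 0."""
    r, g, b = rgb
    gray = 0.299 * r + 0.587 * g + 0.114 * b  # ITU-R BT.601 luma
    return tuple(alpha * c + (1 - alpha) * gray for c in rgb)


def debug_frame(frame, mask):
    """frame: 2-D list of (r, g, b) tuples; mask: 2-D list of weights in [0, 1], same shape."""
    return [
        [debug_pixel(px, a) for px, a in zip(px_row, m_row)]
        for px_row, m_row in zip(frame, mask)
    ]


frame = [[(255, 0, 0), (255, 0, 0)]]
mask = [[1.0, 0.0]]
print(debug_frame(frame, mask))  # first pixel stays red, second turns gray
```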

 

Getting the Integration Pack

 

The download information in this section is hidden by the author; enter the verification code to view it.

To get the code, follow this site's WeChat official account and reply with the keyword "verification code". You can find the account by searching WeChat for "Chief AI Sharing Circle" or "Looks-AI".

 

Final Thoughts

There are in fact many ways to create digital humans, such as HeyGen, Wav2Lip, and GeneFace++. Each tool produces different results and has its own strengths and weaknesses.

Here is one more production workflow for you to consider: first use FaceFusion to swap the face in the video, then use GPT-SoVITS for speech synthesis, and finally use this project to produce the digital human.

May not be reproduced without permission: Chief AI Sharing Circle » Wav2Lip-Based Digital Human Integration Pack (GUI Edition)

