General Introduction
Linly-Talker is an innovative digital human dialogue system that combines Large Language Models (LLMs) with visual generation models to create a novel approach to human-computer interaction. The system integrates technologies such as Whisper, Linly, Microsoft Speech Services, and SadTalker to provide a realistic digital human conversation experience. Linly-Talker lets users upload images for conversations and enhances interactivity and realism through a multi-round dialogue system. The project was developed by Kedreamix and is open-sourced on GitHub for developers and researchers to use and improve.
Function List
- Multi-Round Dialogue System: Supports contextualized multi-round conversations for enhanced interactivity and realism.
- Image Upload Dialogue: Users can upload images and converse with the digital human.
- Speech Synthesis and Recognition: Integrates Microsoft TTS and FunASR to provide multiple voice types and fast speech recognition.
- Video Subtitle Generation: Supports generating subtitles for videos to enhance the visual experience.
- Voice Cloning: With the GPT-SoVITS model, a voice can be cloned from as little as one minute of speech data.
- Personalized Character Generation: Supports personalized character generation with multiple models and options.
- Real-Time Chat: Integrates MuseTalk to provide basic real-time conversation functionality.
How to Use
Installation Process
- Clone the project: Run the following command in a terminal to clone the repository:
git clone https://github.com/Kedreamix/Linly-Talker.git
- Install dependencies: Enter the project directory and install the required packages:
cd Linly-Talker
pip install -r requirements_app.txt
pip install -r requirements_webui.txt
- Configure the environment: Set environment variables and certificates as needed (for example, credentials for the speech services) to ensure the system runs properly.
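The exact variables depend on which services you enable; the project's own documentation is the authority here. As a hedged illustration only (the variable names below are hypothetical, not taken from Linly-Talker), speech-service credentials might be read from the environment like this:

```python
import os

# Hypothetical variable names for illustration only -- check the project's
# documentation for the actual configuration keys it expects.
def load_speech_config():
    """Read speech-service credentials from the environment, with defaults."""
    return {
        "speech_key": os.environ.get("SPEECH_SERVICE_KEY", ""),
        "speech_region": os.environ.get("SPEECH_SERVICE_REGION", "eastus"),
    }

config = load_speech_config()
if not config["speech_key"]:
    print("Warning: SPEECH_SERVICE_KEY is not set; speech synthesis may fail.")
```

Keeping secrets in environment variables rather than in the source tree is the usual practice, since the repository is public.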
Usage Guidelines
- Start the WebUI: Run the following command to launch the WebUI:
python webui.py
Then open http://localhost:7860 in your browser to access the Linly-Talker web interface.
- Upload an image for conversation:
- In the WebUI, click the "Upload Image" button and select an image file.
- Once the image is uploaded, the system automatically sets up a dialogue so the user can interact with the digital human.
- Speech synthesis and recognition:
- Enter text in the dialog box, select a voice type, and click the "Generate Voice" button; the system will synthesize the speech and play it back.
- Users can also speak into the microphone, and the system will automatically recognize the speech and transcribe it to text.
- Video subtitle generation:
- Upload a video file; the system will automatically generate subtitles and embed them in the video. Users can then download the subtitled video.
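Generated subtitles are commonly delivered in the SubRip (SRT) format. The following sketch is not the project's actual code, just an illustration of that format: it renders recognized speech segments, given as (start, end, text) tuples with times in seconds, as numbered SRT entries.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments):
    """Render (start, end, text) tuples as an SRT document."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)

print(to_srt([(0.0, 2.5, "Hello, I am a digital human."),
              (2.5, 5.0, "Nice to meet you.")]))
```

Each entry consists of a sequence number, a `start --> end` timestamp line, and the subtitle text, separated by blank lines.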
- Voice cloning:
- Upload a voice sample of the target speaker; the system will use the GPT-SoVITS model to clone the voice and generate speech that resembles the target speaker.
- Personalized character generation:
- In the WebUI, select the "Personalized Character Generation" option and enter the character information; the system will generate a personalized digital persona.
- Real-time chat:
- Select the MuseTalk module to enable real-time conversation, allowing the user to interact with the digital human in real time.