TPO-LLM-WebUI: An AI framework that optimizes a model's outputs in real time as you ask it questions

General Introduction

TPO-LLM-WebUI is an innovative project open-sourced by Airmomo on GitHub that enables real-time optimization of Large Language Model (LLM) outputs through an intuitive web interface. It adopts the TPO (Test-Time Prompt Optimization) framework, doing away with the tedious process of traditional fine-tuning: the model's output is optimized directly, without any training. After the user inputs a question, the system uses a reward model and iterative feedback to let the model dynamically improve during inference, raising output quality by up to 50%. Whether for polishing technical documents or generating security responses, this lightweight, efficient tool provides powerful support for developers and researchers.

Function List

  • Real-time evolution: the output is refined during the inference phase; the more you use it, the better it fits your needs.
  • No fine-tuning required: improves generation quality directly, without updating model weights.
  • Multi-model compatibility: supports loading different base models and reward models.
  • Dynamic preference alignment: adjusts the output based on reward feedback to better match human expectations.
  • Reasoning visualization: shows the optimization iterations, making them easy to understand and debug.
  • Lightweight and efficient: low compute cost and simple deployment.
  • Open-source flexibility: source code is available and supports custom development.

 

Using Help

Installation process

The deployment of TPO-LLM-WebUI requires some basic environment configuration. Below are the detailed steps to help users get started quickly.

1. Preparing the environment

Make sure the following tools are installed:

  • Python 3.10: the core runtime environment.
  • Git: used to fetch the project code.
  • GPU (recommended): an NVIDIA GPU accelerates inference.

Create a virtual environment:
Using Conda:

conda create -n tpo python=3.10
conda activate tpo

or Python's built-in venv:

python -m venv tpo
source tpo/bin/activate # Linux/Mac
tpo\Scripts\activate # Windows

Download and install the dependencies:

git clone https://github.com/Airmomo/tpo-llm-webui.git
cd tpo-llm-webui
pip install -r requirements.txt

Install TextGrad:
TPO relies on TextGrad, which requires additional installation:

cd textgrad-main
pip install -e .
cd ..

2. Configure the models

You need to manually download the base model and the reward model:

  • Base model: e.g. deepseek-ai/DeepSeek-R1-Distill-Qwen-32B (Hugging Face)
  • Reward model: e.g. sfairXC/FsfairX-LLaMA3-RM-v0.1 (Hugging Face)
    Place the models in the specified directory (e.g. /model/HuggingFace/) and set their paths in config.yaml.

3. Start the vLLM service

Use vLLM to host the base model. Taking 2 GPUs as an example:

vllm serve /model/HuggingFace/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B \
  --dtype auto \
  --api-key token-abc123 \
  --tensor-parallel-size 2 \
  --max-model-len 59968 \
  --port 8000

Once the service is running, it listens on http://127.0.0.1:8000.
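Since vLLM exposes an OpenAI-compatible API, you can also talk to the hosted base model directly. The sketch below builds a chat-completion request using only the Python standard library; the endpoint path, model path, and API key are taken from the serve command above, and the actual network call is left commented out because it requires the server to be running:

```python
import json
from urllib import request

# Values taken from the vllm serve command above.
BASE_URL = "http://127.0.0.1:8000/v1/chat/completions"
API_KEY = "token-abc123"
MODEL = "/model/HuggingFace/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"

def build_request(prompt: str) -> request.Request:
    """Build a chat-completion request for the OpenAI-compatible endpoint."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_request("Hello")
# With the server running, send the request and print the reply:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

This is only a minimal sanity check for the vLLM service; the WebUI and run.py handle this connection for you.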

4. Running WebUI

Launch the web interface in a new terminal:

python gradio_app.py

Open http://127.0.0.1:7860 in a browser. The sections below walk through how to use it.

Main function operation flow

Function 1: Model initialization

 

  1. Open model settings
    Go to the WebUI and click "Model Settings".
  2. Connect to vLLM
    Enter the service address (e.g. http://127.0.0.1:8000) and the API key (token-abc123).
  3. Load the reward model
    Specify its path (e.g. /model/HuggingFace/sfairXC/FsfairX-LLaMA3-RM-v0.1), click "Initialize", and wait 1-2 minutes.
  4. Confirm readiness
    Continue once the interface shows "Model ready".

Function 2: Real-time optimized output


  1. Switch to the optimization page
    Go to "Optimize Settings".
  2. Enter a question
    For example, "Touch up this technical document".
  3. Run the optimization
    Click "Start Optimization"; the system generates multiple candidate results and improves them iteratively.
  4. Review the evolution
    The results page shows the initial and optimized outputs, with quality rising at each step.

Function 3: Script mode optimization

If you are not using the WebUI, you can run a script:

python run.py \
  --data_path data/sample.json \
  --ip 127.0.0.1 \
  --port 8000 \
  --server_model /model/HuggingFace/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B \
  --reward_model /model/HuggingFace/sfairXC/FsfairX-LLaMA3-RM-v0.1 \
  --tpo_mode tpo \
  --max_iterations 2 \
  --sample_size 5

Optimization results are saved to the logs/ folder.

Featured Functions Detailed Description

Say goodbye to fine-tuning and evolve in real time

  • Procedure:
    1. Enter a question; the system generates an initial response.
    2. The reward model evaluates it, and its feedback guides the next iteration.
    3. After several iterations, the output becomes "smarter" and quality improves significantly.
  • Advantage: optimize at any time without training, saving both time and compute.
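The sample-score-select loop above can be sketched in a few lines. This is a toy illustration, not the project's actual code: `generate_variants` and `reward` are stand-ins for the base model and the reward model, and the parameters mirror the `--max_iterations` and `--sample_size` flags of the script mode:

```python
def reward(text: str) -> float:
    """Toy stand-in for a reward model scoring a candidate response."""
    return float(len(text))

def generate_variants(draft: str, n: int) -> list[str]:
    """Toy stand-in for the base model proposing n revised candidates."""
    return [f"{draft} (revision {i})" for i in range(n)]

def tpo_loop(question: str, max_iterations: int = 2, sample_size: int = 5) -> str:
    """Each round, sample candidates, score them, and keep the best one."""
    best = f"Initial response to: {question}"
    for _ in range(max_iterations):
        candidates = generate_variants(best, sample_size)
        # The reward model's scores act as the feedback that guides the next round.
        best = max(candidates, key=reward)
    return best

print(tpo_loop("Touch up this technical document"))
```

The real framework replaces the stand-ins with LLM calls and reward-model scoring via TextGrad, but the control flow follows this shape: no weights change, only the candidate output evolves.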

The more you use it, the smarter it gets

  • Procedure:
    1. Use the same model repeatedly with different inputs for different problems.
    2. The system incorporates the feedback from each run, so the output better fits your needs.
  • Advantage: dynamically aligns with user preferences for better long-term results.

Caveats

  • Hardware requirements: 16 GB or more of GPU memory is recommended; with multiple GPUs, make sure the devices are free, e.g. select them with export CUDA_VISIBLE_DEVICES=2,3.
  • Troubleshooting: if GPU memory overflows, lower sample_size or check GPU utilization.
  • Community support: see the GitHub README or Issues for help.
May not be reproduced without permission: Chief AI Sharing Circle, "TPO-LLM-WebUI: An AI framework that optimizes a model's outputs in real time as you ask it questions"
