General Introduction
Langfuse is an open-source LLM (Large Language Model) engineering platform. It helps developers trace, debug, and optimize LLM applications by providing tools for observing calls, managing prompts, running experiments, and evaluating results. Developed by the Langfuse team, the platform supports frameworks such as LangChain and the OpenAI SDK. It is MIT-licensed and has an active community. It can be self-hosted quickly, locally or in the cloud, and is well suited to teams collaborating on reliable AI applications. Langfuse offers a cloud service (with a free tier) as well as self-hosted options, is easy to deploy, and is proven in production environments.
It visualizes and observes the runtime behavior of Agents and RAG pipelines, similar to LangSmith.
Feature List
- Application observability: Trace each invocation of the LLM application, recording inputs, outputs, latency, and cost.
- Prompt management: Centralized storage of prompts with version control and collaborative editing.
- Dataset management: Create test datasets and run experiments to compare models or prompt variants.
- Evaluation tools: Support user feedback, manual annotation, and automated evaluation to check output quality.
- Debugging support: View detailed logs and user sessions to quickly pinpoint problems.
- Experiment playground: Test prompts and model configurations to accelerate development iterations.
- Multi-framework support: Compatible with LangChain, the OpenAI SDK, LiteLLM, and more.
- API integration: Provides a comprehensive API for building custom LLMOps workflows.
Usage Guide
Installation and Deployment
Cloud Service
- Register an account: Visit Langfuse Cloud and click "Sign Up".
- Create a project: After logging in, click "New Project" and enter a project name.
- Get the keys: Generate a PUBLIC_KEY and SECRET_KEY in the project settings.
- Start using: No platform installation is required; connect to the cloud service directly through the SDK, as in the sketch below.
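A minimal connection sketch, assuming the placeholder keys are replaced with the ones generated in your project settings; the Python SDK reads LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST from the environment when they are not passed explicitly.

```python
import os
from langfuse import Langfuse

# Placeholder credentials; use the keys generated in your project settings.
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-xxx"
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-xxx"
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"

langfuse = Langfuse()          # picks up the environment variables above
print(langfuse.auth_check())   # True if the credentials are valid
```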
Local Deployment (Docker Compose)
- Prepare the environment: Ensure that Docker and Docker Compose are installed; they can be downloaded from the Docker website.
- Clone the code: Run the following in a terminal
git clone https://github.com/langfuse/langfuse.git
then enter the directory with cd langfuse
- Start the services: Run
docker compose up
and wait for startup to complete; the default address is http://localhost:3000
- Verify: Open http://localhost:3000 in a browser; if you see the login page, the deployment succeeded (a script-based check appears after this list).
- Configure keys: After registering in the UI, generate keys for use with the SDK.
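To confirm the instance is reachable without opening a browser, a small smoke-test sketch in Python can hit the health endpoint (assumed here to be /api/public/health on the default port; adjust if the compose configuration was changed):

```python
import urllib.request

# Hypothetical smoke test for a local deployment.
with urllib.request.urlopen("http://localhost:3000/api/public/health") as resp:
    print(resp.status, resp.read().decode())  # expect HTTP 200 when healthy
```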
Kubernetes Deployment (Recommended for Production)
- Prepare the cluster: Create a Kubernetes cluster using Minikube (for local testing) or a cloud service such as AWS.
- Add the Helm repo: Run
helm repo add langfuse https://langfuse.github.io/langfuse-k8s
and helm repo update
- Configure: Create a values.yaml and fill in the database and key information (refer to the official documentation).
- Deploy: Run
helm install langfuse langfuse/langfuse -f values.yaml
and wait for it to finish.
- Access: Expose the service through an Ingress and access it at the configured address.
Virtual Machine Deployment
- Run on a single virtual machine with
docker compose up
The steps are the same as for local deployment.
Main Features
Application Observability
- Install the SDK: For a Python project, run
pip install langfuse
For a JS/TS project, run npm install langfuse
- Initialize: Configure the keys and host in code:
from langfuse import Langfuse

langfuse = Langfuse(public_key="pk-lf-xxx", secret_key="sk-lf-xxx", host="http://localhost:3000")
- Record calls: Use decorators or manual tracing (a manual-tracing sketch follows this list):
import openai
from langfuse.decorators import observe

@observe()
def chat(input):
    return openai.chat.completions.create(model="gpt-3.5-turbo", messages=[{"role": "user", "content": input}])

chat("Hello")
- View: Check the call details on the "Traces" page of the UI.
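For the manual-tracing alternative mentioned above, a rough sketch with the low-level client could look like the following; the trace and generation names, inputs, and outputs are illustrative, not prescribed by Langfuse.

```python
# Manual tracing sketch: create a trace, attach a generation, then flush.
trace = langfuse.trace(name="chat-request", input={"question": "Hello"})
generation = trace.generation(
    name="llm-call",
    model="gpt-3.5-turbo",
    input=[{"role": "user", "content": "Hello"}],
)
# ... call the model here ...
generation.end(output="Hi! How can I help?")
trace.update(output="Hi! How can I help?")
langfuse.flush()  # ensure buffered events are sent before the process exits
```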
Prompt Management
- New prompt: On the "Prompts" page of the UI, click "New Prompt" and enter a name and content, for example:
System: You are a helpful assistant; answer questions directly. User: {{question}}
- Use the prompt: In code, call
langfuse.get_prompt("prompt-name")
(see the sketch after this list).
- Version management: A new version is saved automatically whenever the prompt is modified, and can be rolled back.
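Putting the pieces together, fetching the stored prompt and filling in its {{question}} variable might look like the sketch below; the prompt name and variable are the ones from the example above.

```python
# Fetch the prompt created in the UI and substitute its template variable.
prompt = langfuse.get_prompt("prompt-name")
compiled = prompt.compile(question="What does 1+1 equal?")
print(compiled)  # prompt text with {{question}} replaced
```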
Datasets and Experiments
- Create a dataset: On the "Datasets" page of the UI, click "Create Dataset" and name it "qa-test" (a programmatic sketch follows this list).
- Add data: Enter items manually or upload a CSV, for example:
Input: "What does 1+1 equal?" Expected: "2"
- Run an experiment: Test in code:
dataset = langfuse.get_dataset("qa-test")
for item in dataset.items:
    result = chat(item.input)
    trace = langfuse.trace(name="qa-test", input=item.input, output=result)
    item.link(trace, "test-1")
- Analyze: View the experiment results in the UI.
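Datasets can also be created and populated from code instead of the UI; a sketch using the SDK's dataset helpers, with the same example item as above:

```python
# Create the dataset and add one item programmatically.
langfuse.create_dataset(name="qa-test")
langfuse.create_dataset_item(
    dataset_name="qa-test",
    input="What does 1+1 equal?",
    expected_output="2",
)
```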
Playground
- Open: Click "Playground" in the UI and enter a prompt and model parameters.
- Test: Click Run to view the output, adjust the parameters, and save.
- Jump: Open a problematic result from "Traces" directly in the Playground and modify it there.
Feature Highlights
Debug Log
- On the "Traces" page, click on a call to see the inputs, outputs, and context.
- View user sessions in "Sessions" to analyze multiple rounds of conversations.
Evaluation Output
- Manual: Rate the output (0-1) on the "Scores" page.
- Automated: Add a score via the API (an in-code sketch follows):
langfuse.score(trace_id="xxx", name="accuracy", value=0.95)
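When the code is already instrumented with the @observe decorator, the current trace can also be scored from inside the function; the score name and the pass/fail rule below are illustrative assumptions, not a built-in evaluator.

```python
from langfuse.decorators import langfuse_context, observe

@observe()
def answer_question(question: str) -> str:
    answer = "2" if "1+1" in question else "I don't know."  # placeholder logic
    # Illustrative automated check: full marks whenever an answer was produced.
    langfuse_context.score_current_trace(
        name="non-empty-answer",
        value=1.0 if answer.strip() else 0.0,
    )
    return answer

answer_question("What does 1+1 equal?")
```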
API Usage
- Call the REST API (described by an OpenAPI specification) or use an SDK (Python/JS). The API authenticates with HTTP Basic auth, using the public key as username and the secret key as password; for example, to list traces:
curl -u "pk-lf-xxx:sk-lf-xxx" "http://localhost:3000/api/public/traces"
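The same call from Python rather than curl might look like this sketch; the keys are placeholders and the Basic-auth scheme (public key as username, secret key as password) is the one described above.

```python
import base64
import json
import urllib.request

PUBLIC_KEY = "pk-lf-xxx"   # placeholder
SECRET_KEY = "sk-lf-xxx"   # placeholder
HOST = "http://localhost:3000"

# Build the Basic auth header from the project keys.
token = base64.b64encode(f"{PUBLIC_KEY}:{SECRET_KEY}".encode()).decode()
req = urllib.request.Request(
    f"{HOST}/api/public/traces",
    headers={"Authorization": f"Basic {token}"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))  # list of traces for the project
```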
Application Scenarios
- RAG pipeline visualization
- Visually track the whole pipeline from keyword recall, vector recall, and recall fusion through reranking to the final answer.
- Developing Intelligent Customer Service
- The team uses Langfuse to track conversations, optimize the quality of answers, and improve the customer experience.
- Model Performance Comparison
- Developers create datasets to test the performance of multiple LLMs on a question-answering task.
- On-premise deployment
- The company self-hosts Langfuse to protect sensitive data and debug internal AI applications.
FAQ
- What languages and frameworks are supported?
- Supports Python and JS/TS, and is compatible with LangChain, OpenAI, LlamaIndex and others.
- What is the minimum configuration for self-hosting?
- For smaller projects, a 2-core CPU and 4 GB of RAM are enough; for larger ones, 8 cores and 16 GB are recommended.
- How do I disable telemetry?
- Set the environment variable TELEMETRY_ENABLED=false.