General Introduction
Outlines is an open-source library developed by dottxt-ai that enhances Large Language Model (LLM) applications through structured text generation. The library supports multiple model integrations, including OpenAI, transformers, and llama.cpp. It provides simple yet powerful prompt primitives based on the Jinja template engine. Outlines lets users perform fast generation constrained by regular expressions, JSON schemas, or Pydantic models, and supports a variety of sampling algorithms such as greedy decoding, multinomial sampling, and beam search. The project also provides generation caching, batch inference, and other features designed to improve the speed and performance of model inference. Outlines is used for structured function calling by major inference frameworks (e.g., vLLM, TGI).
Function List
- Multi-model integration: supports OpenAI, transformers, llama.cpp, and other backends.
- Simple and powerful prompting: prompt primitives based on the Jinja template engine.
- Regex-structured generation: quickly generate text that matches a regular expression.
- JSON generation: generate text that conforms to a JSON schema or a Pydantic model.
- Grammar-structured generation: support for loops, conditionals, and generation driven by custom Python functions.
- Generation caching: cache generation results to improve efficiency.
- Batch inference: supports batch processing to improve inference speed.
- Multiple sampling algorithms: supports greedy, multinomial, and beam search sampling.
- Docker support: provides an official Docker image for easy deployment.
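The Jinja-based prompting primitive listed above can be illustrated, in spirit, with a minimal standard-library sketch. Note this is only an analogy: Outlines' real templates use the much richer Jinja engine, not `string.Template`, and the field and email values here are made up.

```python
from string import Template

# Illustration of the prompt-templating idea only; Outlines itself uses Jinja
# rather than the stdlib Template shown here
template = Template("Extract the $field from the following email:\n$email")
prompt = template.substitute(
    field="destination",
    email="The flight to Paris is next Tuesday.",
)
print(prompt)
```

A template keeps the prompt's fixed scaffolding separate from the values filled in at call time, which is what makes prompts reusable across inputs.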
Usage Guide
Installation process
- Ensure that the Python environment is installed.
- Install Outlines using pip:
pip install outlines
- If you need to use the core functionality of the Rust version, you can install outlines-core:
pip install outlines-core
Guidelines for use
Basic use
- Import the Outlines library:
import outlines
- Select and load a model (any Hugging Face transformers model name works; note that "openai/gpt-3.5-turbo" is an API-only model and cannot be loaded through the transformers backend):
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
- Create a generator and produce text:
generator = outlines.generate.text(model)
generated_text = generator("Generate a short introduction to AI technology.")
print(generated_text)
Advanced Features
- Generate structured text using regular expressions:
import outlines

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.regex(model, r"^[A-Z][a-z]+$")
result = generator("Generate a single capitalized word:")
print(result)
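Whatever library produces the text, the guarantee a regex constraint gives you can be checked with Python's built-in `re` module. This stdlib sketch is independent of Outlines; the sample strings are made up:

```python
import re

pattern = r"^[A-Z][a-z]+$"
# A constrained generator can only ever emit strings that fullmatch the pattern;
# an unconstrained model gives no such guarantee, so its output must be validated
ok = re.fullmatch(pattern, "Hello") is not None
bad = re.fullmatch(pattern, "hello world") is not None
print(ok, bad)  # True False
```

Using `re.fullmatch` rather than `re.search` matters here: the constraint applies to the entire output, not just some substring of it.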
- Generate text using a JSON schema or Pydantic model:
import outlines
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, Person)
result = generator("Describe a person named Alice who is 30 years old.")
print(result)
- Use different sampling algorithms (greedy, multinomial, or beam search):
import outlines
from outlines import samplers

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.choice(model, ["option1", "option2", "option3"], sampler=samplers.greedy())
result = generator("Please select an option:")
print(result)
- Deploy a Docker image:
docker pull outlinesdev/outlines
docker run -p 8000:8000 outlinesdev/outlines
Common Problems
- How can I increase generation speed? Generation speed can be significantly improved by using batch inference and generation caching.
- How can I integrate custom functions? Custom Python functions can be used to handle complex logic during generation.
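One common integration pattern is that structured-generation tools introspect a function's signature to decide which arguments the model must produce. The following is a stdlib-only sketch of that idea; the `book_flight` function is hypothetical, and Outlines' own function-calling support may differ in detail:

```python
import inspect

def book_flight(destination: str, passengers: int) -> str:
    """Book a flight (hypothetical custom function)."""
    return f"Booked flight to {destination} for {passengers}"

# Derive a minimal field -> type-name map from the signature; structured
# generation libraries use this kind of introspection to constrain the model's
# output to a valid set of arguments for the function
sig = inspect.signature(book_flight)
fields = {name: p.annotation.__name__ for name, p in sig.parameters.items()}
print(fields)  # {'destination': 'str', 'passengers': 'int'}
```

Once the model's output is guaranteed to match this argument structure, calling the custom function with the parsed result is safe.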
Problems Posed by Unstructured Output from Large Models
Background
Large Language Models (LLMs) have powerful text generation capabilities, but they are not reliable enough at generating structured data. This poses a serious problem for agent-centered AI applications.
Core Issues
- Output inconsistency: when extracting flight information from emails, the ideal result would be a consistent JSON object, but LLMs often fail to produce one, leading to issues such as "JSON decode errors".
- Lack of reliability: this unpredictability makes it difficult to build complex modular systems on top of LLMs.
Impact
Without reliable structured output, developers have to extract information through cumbersome post-processing (e.g., regular expressions), which makes development inefficient and error-prone.
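The difference between the two extraction styles is easy to see with a standard-library sketch. This is illustrative only; the raw sentence and the JSON string are made-up examples, not output from any particular model:

```python
import json
import re

raw = "The flight to Paris is next Tuesday, probably at 10 a.m. The plane is Air France."
# Brittle: a regex tuned to this phrasing breaks as soon as the model rephrases
m = re.search(r"flight to (\w+)", raw)
destination_fragile = m.group(1) if m else None

structured = '{"destination": "Paris", "airline": "Air France"}'
# Robust: guaranteed-valid JSON parses the same way for every input
destination_robust = json.loads(structured)["destination"]
print(destination_fragile, destination_robust)
```

Both happen to yield "Paris" here, but only the JSON path keeps working when the model's wording changes, which is the whole point of constraining the output format.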
Benefits of Structured Output
The Inherent Structure of Data
Even seemingly unstructured data (e.g., the GSM8K dataset) often has an inherent structure that can be exploited.
Guaranteed output format
By defining specific structures (such as JSON or regular expressions), you can ensure the validity of the output and avoid tedious post-processing.
Improve performance and efficiency
- Improved JSON validity: with structured generation, the share of valid JSON output rose from 17.7% to 99.9%.
- Reduced need for examples: on the GSM8K benchmark, one-shot structured generation performs almost as well as eight-shot unstructured generation.
- Improved open-model performance: on a function-calling benchmark, performance rises from 86% to 96.5%, even outperforming GPT-4.
Structured vs. unstructured outputs
To better understand the advantages of structured output, consider the following example comparing the two.
Suppose we need to extract flight information from an email:
Unstructured Output
When the output generated by a large model is not strictly formatted, the following text may be obtained:
The flight to Paris is next Tuesday, probably at 10 a.m. The plane is Air France.
Although this output contains the information we need (destination, date, time, airline), it has no explicit structure. To extract the information, the developer must parse each field with regular expressions or other text-processing methods, which is both tedious and error-prone. For example, the model may format its output differently for different inputs, leading to processing errors or "JSON decode errors".
Structured Output
If structured generation is used, the model will return data that conforms to a predefined format, for example:
{
  "destination": "Paris",
  "departure_date": "2024-11-25",
  "time": "10:00",
  "airline": "Air France"
}
In this case, the output is uniform and standardized. The developer no longer needs to additionally process or parse the information because all key fields are already returned in the expected format. This not only saves development time, but also greatly reduces the probability of errors.
Through this comparison, we can clearly see that structured output not only ensures the consistency and reliability of the data, but also significantly improves processing efficiency, especially when a large amount of information needs to be extracted and processed from a large model.
Does using Outlines cause performance problems for large models?
How does Outlines work?
Rationale
- Logit processing: After the model generates logits, Outlines checks each possible next token and blocks those that violate the defined structure.
- Efficiency Optimization: Extremely low additional overhead through efficient masking.
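The logit-processing step described above can be sketched in plain Python. This toy mask is illustrative only and far simpler than Outlines' compiled finite-state machinery; the vocabulary and scores below are made up:

```python
import math

def mask_logits(logits, allowed):
    # Disallowed tokens get -inf, so they receive zero probability after softmax
    return {tok: (score if tok in allowed else -math.inf)
            for tok, score in logits.items()}

# Toy scores for the next token after the model has produced '{"name": '
logits = {'"': 2.1, "}": 1.7, "42": 0.3}
# Suppose the structure (a JSON string value) only allows an opening quote here
allowed = {'"'}
masked = mask_logits(logits, allowed)
next_token = max(masked, key=masked.get)
print(next_token)  # the only structurally valid continuation
```

Because the mask is applied before sampling, the model never gets the chance to emit a structure-breaking token, which is why the guarantee is absolute rather than statistical.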
Example
"If the generated token would destroy the structure, it is immediately blocked to ensure that the generation process strictly follows the predefined structure."
Does structured output slow down output?
It won't. On the contrary, structured generation usually accelerates the generation process:
- Fewer wasted tokens: defining the structure in advance avoids generating redundant field names or braces.
- Shorter outputs: structured outputs typically contain fewer tokens, so generation is faster and the result is clearer.
Comparison of Outlines with Other Structured Generation Libraries
- Compare with Guidance: Outlines has almost zero overhead in the inference phase, while Guidance can slow down significantly when generating large numbers of tokens.
- Comparison with LMQL: Outlines' core strengths are its lightweight design and efficiency.
Code Example
The following is an example of using Outlines to generate structured event data:
from datetime import datetime
from pydantic import BaseModel, Field
from outlines import generate, models
# Load model
model = models.mlxlm("mlx-community/Hermes-3-Llama-3.1-8B-8bit")
# Define the event structure using Pydantic
class Event(BaseModel):
    title: str = Field(description="title of the event")
    location: str
    start: datetime = Field(
        default=None, description="date of the event if available in iso format"
    )
# Get the current time
now = datetime.now().strftime("%A %d %B %Y and it's %H:%M")
# Define prompt
prompt = f"""
Today's date and time are {now}
Given a user message, extract information of the event like date and time in iso format, location and title.
If the given date is relative, think step by step to find the right date.
Here is the message.
"""
# Example Message
message = """Hello Kitty, my grandmother will be here, I think it's better to postpone our
appointment to review math lessons to next Friday at 2pm at the same place, 3 avenue des
tanneurs, I think that one hour will be enough, see you 😘😘😘😘😘 """
# Create the generator
generator = generate.json(model, Event)
# Extract the event message
event = generator(prompt + message)
# Output the result
print(f"Today: {now}")
print(event.json())
The generated event information is as follows:
{
  "title": "Math Review",
  "location": "3 avenue des tanneurs",
  "start": "2024-11-22T14:00:00Z"
}
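A useful property of the structured result is that downstream code can consume it directly. For instance, the `start` field parses with the standard library, with one caveat: before Python 3.11, `datetime.fromisoformat` rejects the trailing "Z", so it must be normalized first:

```python
from datetime import datetime

start = "2024-11-22T14:00:00Z"
# datetime.fromisoformat before Python 3.11 rejects a trailing "Z",
# so normalize it to an explicit UTC offset first
dt = datetime.fromisoformat(start.replace("Z", "+00:00"))
print(dt.isoformat())  # 2024-11-22T14:00:00+00:00
```

With a guaranteed ISO format, this parse can never raise on malformed dates, which is exactly the reliability the unstructured path lacks.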
Conclusion and outlook
Structured generation is no longer just a niche feature; it is the future of large-model applications:
- Higher reliability and efficiency: The performance of LLM is significantly improved by structured generation.
- The Potential of Open Source: The success of Outlines demonstrates the potential of open source models to compete with proprietary models.
In the future, Outlines is expected to become a key component in the developer's toolbox as structured generation becomes more popular.