General Introduction
Outlines is an open-source library developed by dottxt-ai that enhances Large Language Model (LLM) applications through structured text generation. The library supports multiple model integrations, including OpenAI, transformers, and llama.cpp. It provides simple yet powerful prompt primitives based on the Jinja template engine. Outlines lets users perform fast generation constrained by regular expressions, JSON schemas, or Pydantic models, and supports a variety of sampling algorithms such as greedy decoding, multinomial sampling, and beam search. The project also provides generation caching, batch inference, and other features designed to improve the speed and performance of model inference. Outlines is used for structured function calling by major inference frameworks (e.g., vLLM, TGI).
Function List
- Multi-model integration: supports OpenAI, transformers, llama.cpp, and other backends.
- Simple and powerful prompting: prompt primitives based on the Jinja template engine.
- Regex-structured generation: quickly generate text that matches a regular expression.
- JSON generation: generate text that conforms to a JSON schema or a Pydantic model.
- Grammar-structured generation: support for loops, conditionals, and generation driven by custom Python functions.
- Generation caching: cache generation results to improve efficiency.
- Batch inference: supports batch processing to improve inference speed.
- Multiple sampling algorithms: supports greedy, multinomial, and beam search sampling.
- Docker support: provides an official Docker image for easy deployment.
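The Jinja-based prompting primitive listed above can be illustrated, in spirit, with a minimal standard-library sketch. Note this is only an analogy: Outlines' real templates use the much richer Jinja engine, not `string.Template`, and the field and email values here are made up.

```python
from string import Template

# Illustration of the prompt-templating idea only; Outlines itself uses Jinja
# rather than the stdlib Template shown here
template = Template("Extract the $field from the following email:\n$email")
prompt = template.substitute(
    field="destination",
    email="The flight to Paris is next Tuesday.",
)
print(prompt)
```

A template keeps the prompt's fixed scaffolding separate from the values filled in at call time, which is what makes prompts reusable across inputs.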
Usage Guide
Installation process
- Ensure that the Python environment is installed.
- Install Outlines using pip:
pip install outlines
- If you need to use the core functionality of the Rust version, you can install outlines-core:
pip install outlines-core
Guidelines for use
Basic use
- Import the Outlines library:
import outlines
- Select and load a model (any Hugging Face transformers model name works; note that "openai/gpt-3.5-turbo" is an API-only model and cannot be loaded through the transformers backend):
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
- Create a generator and produce text:
generator = outlines.generate.text(model)
generated_text = generator("Generate a short introduction to AI technology.")
print(generated_text)
Advanced Features
- Generate structured text using regular expressions:
import outlines

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.regex(model, r"^[A-Z][a-z]+$")
result = generator("Generate a single capitalized word:")
print(result)
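Whatever library produces the text, the guarantee a regex constraint gives you can be checked with Python's built-in `re` module. This stdlib sketch is independent of Outlines; the sample strings are made up:

```python
import re

pattern = r"^[A-Z][a-z]+$"
# A constrained generator can only ever emit strings that fullmatch the pattern;
# an unconstrained model gives no such guarantee, so its output must be validated
ok = re.fullmatch(pattern, "Hello") is not None
bad = re.fullmatch(pattern, "hello world") is not None
print(ok, bad)  # True False
```

Using `re.fullmatch` rather than `re.search` matters here: the constraint applies to the entire output, not just some substring of it.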
- Generate text using a JSON schema or Pydantic model:
import outlines
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, Person)
result = generator("Describe a person named Alice who is 30 years old.")
print(result)
- Use different sampling algorithms (greedy, multinomial, or beam search):
import outlines
from outlines import samplers

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.choice(model, ["option1", "option2", "option3"], sampler=samplers.greedy())
result = generator("Please select an option:")
print(result)
- Deploy a Docker image:
docker pull outlinesdev/outlines
docker run -p 8000:8000 outlinesdev/outlines
Common Problems
- How can I increase generation speed? Generation speed can be significantly improved by using batch inference and generation caching.
- How can I integrate custom functions? Custom Python functions can be used to handle complex logic during generation.
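One common integration pattern is that structured-generation tools introspect a function's signature to decide which arguments the model must produce. The following is a stdlib-only sketch of that idea; the `book_flight` function is hypothetical, and Outlines' own function-calling support may differ in detail:

```python
import inspect

def book_flight(destination: str, passengers: int) -> str:
    """Book a flight (hypothetical custom function)."""
    return f"Booked flight to {destination} for {passengers}"

# Derive a minimal field -> type-name map from the signature; structured
# generation libraries use this kind of introspection to constrain the model's
# output to a valid set of arguments for the function
sig = inspect.signature(book_flight)
fields = {name: p.annotation.__name__ for name, p in sig.parameters.items()}
print(fields)  # {'destination': 'str', 'passengers': 'int'}
```

Once the model's output is guaranteed to match this argument structure, calling the custom function with the parsed result is safe.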
Problems Posed by Unstructured Output from Large Models
Background
Large Language Models (LLMs) have powerful text generation capabilities, but they are not reliable enough at generating structured data. This poses a serious problem for agent-centered AI applications.
Core Issues
- Output inconsistency: when extracting flight information from emails, the ideal result would be a consistent JSON object, but LLMs often fail to produce one, leading to issues such as "JSON decode errors".
- Lack of reliability: this unpredictability makes it difficult to build complex modular systems on top of LLMs.
Impact
Without reliable structured output, developers have to extract information through cumbersome post-processing (e.g., regular expressions), which makes development inefficient and error-prone.
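The difference between the two extraction styles is easy to see with a standard-library sketch. This is illustrative only; the raw sentence and the JSON string are made-up examples, not output from any particular model:

```python
import json
import re

raw = "The flight to Paris is next Tuesday, probably at 10 a.m. The plane is Air France."
# Brittle: a regex tuned to this phrasing breaks as soon as the model rephrases
m = re.search(r"flight to (\w+)", raw)
destination_fragile = m.group(1) if m else None

structured = '{"destination": "Paris", "airline": "Air France"}'
# Robust: guaranteed-valid JSON parses the same way for every input
destination_robust = json.loads(structured)["destination"]
print(destination_fragile, destination_robust)
```

Both happen to yield "Paris" here, but only the JSON path keeps working when the model's wording changes, which is the whole point of constraining the output format.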
Benefits of Structured Output
The Inherent Structure of Data
Even seemingly unstructured data (e.g., the GSM8K dataset) often has an inherent structure that can be exploited.
Guaranteed output format
By defining specific structures (such as JSON or regular expressions), you can ensure the validity of the output and avoid tedious post-processing.
Improve performance and efficiency
- Improved JSON validity: with structured generation, the share of valid JSON output rose from 17.7% to 99.9%.
- Reduced need for examples: on the GSM8K benchmark, one-shot structured generation performs almost as well as eight-shot unstructured generation.
- Improved open-model performance: on a function-calling benchmark, performance rises from 86% to 96.5%, even outperforming GPT-4.
Structured vs. unstructured outputs
To better understand the advantages of structured output, consider the following example comparing the two.
Suppose we need to extract flight information from an email:
Unstructured Output
When the output generated by a large model is not strictly formatted, the following text may be obtained:
The flight to Paris is next Tuesday, probably at 10 a.m. The plane is Air France.
Although this output contains the information we need (destination, date, time, airline), it has no explicit structure. To extract the information, the developer must parse each field with regular expressions or other text-processing methods, which is both tedious and error-prone. For example, the model may format its output differently for different inputs, leading to processing errors or "JSON decode errors".
Structured Output
If structured generation is used, the model will return data that conforms to a predefined format, for example:
{
  "destination": "Paris",
  "departure_date": "2024-11-25",
  "time": "10:00",
  "airline": "Air France"
}
In this case, the output is uniform and standardized. The developer no longer needs to additionally process or parse the information because all key fields are already returned in the expected format. This not only saves development time, but also greatly reduces the probability of errors.
Through this comparison, we can clearly see that structured output not only ensures the consistency and reliability of the data, but also significantly improves processing efficiency, especially when a large amount of information needs to be extracted and processed from a large model.
Does using Outlines cause performance problems for large models?
How does Outlines work?
Rationale
- Logit processing: After the model generates logits, Outlines checks each possible next token and blocks those that violate the defined structure.
- Efficiency Optimization: Extremely low additional overhead through efficient masking.
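The logit-processing step described above can be sketched in plain Python. This toy mask is illustrative only and far simpler than Outlines' compiled finite-state machinery; the vocabulary and scores below are made up:

```python
import math

def mask_logits(logits, allowed):
    # Disallowed tokens get -inf, so they receive zero probability after softmax
    return {tok: (score if tok in allowed else -math.inf)
            for tok, score in logits.items()}

# Toy scores for the next token after the model has produced '{"name": '
logits = {'"': 2.1, "}": 1.7, "42": 0.3}
# Suppose the structure (a JSON string value) only allows an opening quote here
allowed = {'"'}
masked = mask_logits(logits, allowed)
next_token = max(masked, key=masked.get)
print(next_token)  # the only structurally valid continuation
```

Because the mask is applied before sampling, the model never gets the chance to emit a structure-breaking token, which is why the guarantee is absolute rather than statistical.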
Example
"If the generated token would destroy the structure, it is immediately blocked to ensure that the generation process strictly follows the predefined structure."
Does structured output slow down output?
It won't. On the contrary, structured generation usually accelerates the generation process:
- Fewer wasted tokens: defining the structure in advance avoids generating redundant field names or braces.
- Shorter outputs: structured outputs typically contain fewer tokens, so generation is faster and the result is clearer.
Comparison of Outlines with Other Structured Generation Libraries
- Compare with Guidance: Outlines has almost zero overhead in the inference phase, while Guidance can slow down significantly when generating large numbers of tokens.
- Comparison with LMQL: Outlines' core strengths are its lightweight design and efficiency.
Code Example
The following is an example of using Outlines to generate structured event data:
from datetime import datetime
from pydantic import BaseModel, Field
from outlines import generate, models
# Load model
model = models.mlxlm("mlx-community/Hermes-3-Llama-3.1-8B-8bit")
# Define the event structure using Pydantic
class Event(BaseModel):
    title: str = Field(description="title of the event")
    location: str
    start: datetime = Field(
        default=None, description="date of the event if available in iso format"
    )
# Get the current time
now = datetime.now().strftime("%A %d %B %Y and it's %H:%M")
# Define prompt
prompt = f"""
Today's date and time are {now}
Given a user message, extract information of the event like date and time in iso format, location and title.
If the given date is relative, think step by step to find the right date.
Here is the message.
"""
# Example Message
message = """Hello Kitty, my grandmother will be here, I think it's better to postpone our
appointment to review math lessons to next Friday at 2pm at the same place, 3 avenue des
tanneurs, I think that one hour will be enough, see you 😘😘😘😘😘 """
# Create the generator
generator = generate.json(model, Event)
# Extract the event message
event = generator(prompt + message)
# Output the result
print(f"Today: {now}")
print(event.json())
The generated event information is as follows:
{
  "title": "Math Review",
  "location": "3 avenue des tanneurs",
  "start": "2024-11-22T14:00:00Z"
}
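A useful property of the structured result is that downstream code can consume it directly. For instance, the `start` field parses with the standard library, with one caveat: before Python 3.11, `datetime.fromisoformat` rejects the trailing "Z", so it must be normalized first:

```python
from datetime import datetime

start = "2024-11-22T14:00:00Z"
# datetime.fromisoformat before Python 3.11 rejects a trailing "Z",
# so normalize it to an explicit UTC offset first
dt = datetime.fromisoformat(start.replace("Z", "+00:00"))
print(dt.isoformat())  # 2024-11-22T14:00:00+00:00
```

With a guaranteed ISO format, this parse can never raise on malformed dates, which is exactly the reliability the unstructured path lacks.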
Conclusion and outlook
Structured generation is no longer just a niche feature; it is the future of large-model applications:
- Higher reliability and efficiency: The performance of LLM is significantly improved by structured generation.
- The Potential of Open Source: The success of Outlines demonstrates the potential of open source models to compete with proprietary models.
In the future, Outlines is expected to become a key component in the developer's toolbox as structured generation becomes more popular.