
AI Engineering Academy: 2.6 RAG Observability - Arize Phoenix Setup

Welcome to this notebook, where we will explore how to use Llama Index to set up and observe a retrieval-augmented generation (RAG) pipeline.

https://github.com/adithya-s-k/AI-Engineering.academy/tree/main/RAG/01_RAG_Observability


 

Summary

This guide provides a complete tutorial on configuring the required tools and libraries, including embedding models and vector store indexes, for efficient document retrieval and query processing. We will cover everything from installation and setup to querying and retrieving relevant information, to help you master the advanced search capabilities of the RAG pipeline.

 

Introduction

To get started with this notebook, you'll need a basic understanding of Python and some familiarity with machine learning concepts. If you're not familiar with these concepts, don't worry - we'll walk you through them step-by-step!

Prerequisites

  • Python 3.7+
  • Jupyter Notebook or JupyterLab
  • Basics of Python and Machine Learning Concepts

 

1. Setup

1.1 Installation of necessary software packages

To begin setting up Arize Phoenix, you will need to install the necessary software packages.

Arize Phoenix is a comprehensive tool designed for observability and monitoring of machine learning and AI systems. It provides the ability to track and analyze all aspects of machine learning models and data pipelines.

!pip install arize-phoenix
!pip install openinference-instrumentation-openai

These commands will install:

  • arize-phoenix: Tools for machine learning workflow observability.
  • openinference-instrumentation-openai: A package for integrating OpenAI models with observability tools such as Arize Phoenix.
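As an optional, quick check that the installation succeeded, you can print the installed version of each package with Python's standard importlib.metadata module (a minimal sketch; the package names are the ones from the pip commands above):

from importlib.metadata import version, PackageNotFoundError
# Print the installed version of each package from the pip commands above
for pkg in ["arize-phoenix", "openinference-instrumentation-openai"]:
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")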

1.2 Setting up Arize Phoenix

There are three ways to complete the setup:

For more details, see the Phoenix documentation.

  • Command line
    python3 -m phoenix.server.main serve
    
  • Docker: launch the Phoenix Docker image with the following command:
    docker run -p 6006:6006 -p 4317:4317 arizephoenix/phoenix:latest
    

    This exposes the Phoenix UI and REST API on localhost:6006, and the gRPC endpoint for spans on localhost:4317.

  • Notebook
    import phoenix as px
    px.launch_app()
    

1.3 Importing the necessary libraries and configuring the environment

Import the necessary libraries and set up the environment before proceeding with data processing and evaluation:

import json
import os
from getpass import getpass
import nest_asyncio
import pandas as pd
from tqdm import tqdm
import phoenix as px
# Allow concurrent evaluations in the notebook environment
nest_asyncio.apply()
# Set pandas DataFrame display options to show more content
pd.set_option("display.max_colwidth", 1000)
  • json, os: Standard Python libraries for handling JSON data and interacting with the operating system.
  • getpass: A tool for entering passwords securely.
  • nest_asyncio: Allows the use of asyncio in Jupyter notebooks.
  • pandas (pd): A powerful Python data manipulation library.
  • tqdm: Provides progress bars for loops to track the progress of data processing.
  • phoenix (px): Part of the Arize observability tooling; provides an interactive UI for exploring data and monitoring machine learning models.

The code applies nest_asyncio to allow concurrent evaluations in the notebook environment and increases the maximum column width of pandas DataFrames to improve readability.

1.4 Starting the Phoenix Application

px.launch_app()

This function initializes and launches the Phoenix application, which opens in a new tab of the default browser and provides an interactive interface for exploring datasets, visualizing model performance, and debugging.

1.5 Viewing Phoenix Application Sessions

Once the Phoenix application is launched, you can interact with the application directly in the notebook using session objects. Run the following code to launch the Phoenix application and view it in the current session:

# Launch the Phoenix app and view the session
(session := px.launch_app()).view()

This line starts the Phoenix application and assigns the session to a variable called session; the view() method then displays the Phoenix application directly in the notebook interface, providing an integrated experience without having to switch between browser and notebook.

1.6 Setting the endpoint of the trace

To send trace data to a Phoenix application for analysis and observability, define the endpoint URL where the Phoenix application listens for incoming data.

endpoint = "http://127.0.0.1:6006/v1/traces"

The endpoint variable stores the URL where the Phoenix application listens for incoming trace data.
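As an optional sanity check before sending traces (assuming Phoenix is running locally on its default port and that the requests library is available), you can verify that the Phoenix server is reachable:

import requests
# Hit the local Phoenix UI root to confirm the server is up
resp = requests.get("http://127.0.0.1:6006", timeout=5)
print("Phoenix is reachable" if resp.ok else f"Unexpected status: {resp.status_code}")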

 

2. Tracking OpenAI

For more integrations, see the OpenInference documentation.

2.1 Installing and Importing OpenAI Packages

!pip install openai
import openai

openai: A Python client library for the OpenAI API. It allows you to send requests to OpenAI's models (including GPT-3 and GPT-4) to accomplish a variety of tasks.

2.2 Configuring the OpenAI API Key

import openai
import os
from getpass import getpass
# Get the API key from the environment; prompt the user if it is not set
if not (openai_api_key := os.getenv("OPENAI_API_KEY")):
    openai_api_key = getpass("🔑 Enter your OpenAI API key: ")
# Set the API key for the OpenAI client
openai.api_key = openai_api_key
# Store the API key in an environment variable for later use
os.environ["OPENAI_API_KEY"] = openai_api_key
  • Getting the API key: the code first tries to get the API key from the environment variable (OPENAI_API_KEY). If the key is not found, the user is prompted for secure input via getpass.
  • Set API key: The retrieved or supplied API key is then set as the key for the OpenAI client library.
  • Storing the API key: Finally, store the API key in an environment variable to ensure that it is readily available during the session.

2.3 Setting up OpenTelemetry for Tracking

To enable tracing for your OpenAI interactions, configure OpenTelemetry and set up the required components.

from opentelemetry import trace as trace_api
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
# Set up the Tracer provider
tracer_provider = trace_sdk.TracerProvider()
# Define the OTLP span exporter with the endpoint
span_exporter = OTLPSpanExporter(endpoint)
# Set up a span processor to process and export spans
span_processor = SimpleSpanProcessor(span_exporter)
# Add the span processor to the Tracer provider
tracer_provider.add_span_processor(span_processor)
# Set the global Tracer provider
trace_api.set_tracer_provider(tracer_provider)

OpenTelemetry Library

In the provided code, several OpenTelemetry libraries are used to set up tracing. Below is an overview of each library:

  • opentelemetry: Purpose: OpenTelemetry's core library, providing the APIs for tracing and metrics. Usage: includes the trace module for creating and managing traces.
  • opentelemetry.exporter.otlp.proto.http.trace_exporter: Purpose: provides a trace exporter that uses OTLP (OpenTelemetry Protocol) over HTTP. Usage: the module's OTLPSpanExporter class sends trace data to an OTLP-compatible backend and is configured with the endpoint.
  • opentelemetry.sdk.trace: Purpose: contains the SDK implementation for tracing, including TracerProvider. Usage:
    • TracerProvider: Manages Tracer instances and is responsible for exporting the spans (units of work) collected in a trace.
    • SimpleSpanProcessor: A processor that exports spans synchronously, processing the data and sending it to the exporter.
  • opentelemetry.sdk.trace.export: Purpose: provides classes for exporting trace data. Usage:
    • SimpleSpanProcessor: Processes spans and exports them using the specified exporter, ensuring the data is sent to the backend for analysis.
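To confirm that the exporter pipeline works before instrumenting OpenAI, you can emit a hand-made test span, which should then appear in the Phoenix UI. This is a minimal sketch; the span name is arbitrary:

# Acquire a tracer from the globally configured provider
tracer = trace_api.get_tracer(__name__)
# Create a test span; SimpleSpanProcessor exports it synchronously to Phoenix
with tracer.start_as_current_span("sanity-check-span"):
    pass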

2.4 Instrumenting OpenAI with OpenInference

To integrate OpenTelemetry with OpenAI and enable tracing of OpenAI model interactions, use the OpenAIInstrumentor from the openinference library.

from openinference.instrumentation.openai import OpenAIInstrumentor
# Instantiate and apply instrumentation for OpenAI
OpenAIInstrumentor().instrument()
  • OpenAIInstrumentor: A class from the openinference library for instrumenting OpenAI API calls to enable tracing and observability.
  • instrument(): This method configures the OpenAI API client to automatically generate and send trace data to the OpenTelemetry backend. It integrates the configured trace settings and enables you to monitor and analyze interactions with OpenAI models.

By running this code, you can ensure that all OpenAI API calls are tracked to capture detailed insights about model usage and performance.

2.5 Making a Request to the OpenAI API

To interact with OpenAI's API and get a response, use the following code. This example shows how to create a chat-completion request and print the results via the OpenAI API:

import openai
# Create an OpenAI client instance
client = openai.OpenAI()
# Make a chat completion request to the OpenAI API
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku."}],
)
# Print the response content
print(response.choices[0].message.content)
  • openai.OpenAI(): Initialize an OpenAI client instance that can be used to interact with the OpenAI API.
  • client.chat.completions.create(): Sends a request to the OpenAI API to create a chat completion.
    • model="gpt-4o": Specifies the model used to generate the completion. Make sure the model name is correct and available to your OpenAI API account.
    • messages: A list of messages comprising the conversation history. This example contains a single user message: "Write a haiku."
  • response.choices[0].message.content: Extracts and prints the model-generated completion.

 

3. Tracking Llama Index

3.1 Installing and Importing Required Libraries

!pip install llama-index
!pip install llama-index-core
!pip install llama-index-llms-openai
!pip install openinference-instrumentation-llama-index==2.2.4
!pip install -U llama-index-callbacks-arize-phoenix
!pip install "arize-phoenix[llama-index]"
  • llama-index: Core package for Llama Index functionality.
  • llama-index-core: Provides the core features and tools of Llama Index.
  • llama-index-llms-openai: A package for integrating Llama Index with OpenAI models.
  • openinference-instrumentation-llama-index==2.2.4: Provides tools for instrumenting Llama Index interactions.
  • llama-index-callbacks-arize-phoenix: Provides integration with Arize Phoenix callbacks.
  • arize-phoenix[llama-index]: Extend Arize Phoenix to support Llama Index tracking.

3.2 Getting the URL of the currently active Phoenix session

# Get the URL of the currently active Phoenix session
px.active_session().url

Access the currently active Phoenix session and retrieve its URL to view or share the Phoenix interface for monitoring and analyzing tracking data.

3.3 Setting up Llama Index tracking

To set up OpenTelemetry tracing for Llama Index, configure the Tracer provider and integrate the Llama Index instrumentor.

from openinference.instrumentation.llama_index import LlamaIndexInstrumentor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
# Set up the Tracer provider
tracer_provider = trace_sdk.TracerProvider()
# Add a span processor to the Tracer provider
tracer_provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter(endpoint)))
# Instrument Llama Index with the Tracer provider
LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider)
  • LlamaIndexInstrumentor: A class from openinference.instrumentation.llama_index for instrumenting Llama Index with tracing and observability.
  • trace_sdk.TracerProvider(): Initializes a new Tracer provider for creating and managing trace data.
  • SimpleSpanProcessor: Used to synchronize the export of spans and send the data to the backend.
  • LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider): Apply instrumentation to the Llama Index and trace it using the provided Tracer provider.

3.4 Using OpenAI to Interact with the Llama Index

To perform a completion request through Llama Index using an OpenAI model, use the following code:

from llama_index.llms.openai import OpenAI
# Initialize the OpenAI model
llm = OpenAI(model="gpt-4o-mini")
# Make a completion request
resp = llm.complete("Paul Graham is ")
# Print the response
print(resp)
  • from llama_index.llms.openai import OpenAI: Imports the OpenAI class from the llama_index package for interacting with OpenAI models.
  • OpenAI(model="gpt-4o-mini"): Initializes an instance of the OpenAI class with the specified model (here, gpt-4o-mini).
  • llm.complete(...): Send prompt text to the model to generate response content.

3.5 Chat Interaction with Llama Index Using OpenAI

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
# Initialize the OpenAI model
llm = OpenAI()
# Define the chat messages
messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
# Get the model's response
resp = llm.chat(messages)
  • OpenAI: Classes for interacting with OpenAI models.
  • ChatMessage: Class for formatting chat messages.
  • OpenAI(): Initialize an instance of the OpenAI class.
  • ChatMessage: Creates a chat message object, specifying the role (e.g., "system", "user") and the content of the message.
    • role="system": Define system messages to set the context or personality of the model.
    • role="user": Represents a message sent by the user.
  • llm.chat(messages): Sends the defined message to the model and receives the response result.

This code chats with the OpenAI model, using a system message to set its persona and a user message as the prompt.
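To see the model's reply in the notebook, print the response object (a trivial addition to the snippet above):

# Display the assistant's reply
print(resp)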

 

4. Observing the RAG Process

4.1 Setting up the environment for observing the RAG process

!pip install llama-index
!pip install llama-index-vector-stores-qdrant
!pip install llama-index-readers-file
!pip install llama-index-embeddings-fastembed
!pip install llama-index-llms-openai
!pip install -U qdrant_client fastembed
  • llama-index: Core package for Llama Index functionality.
  • llama-index-vector-stores-qdrant: Integrate Qdrant as a vector store for Llama Index.
  • llama-index-readers-file: Provides file reading capabilities for Llama Index.
  • llama-index-embeddings-fastembed: FastEmbed support for Llama Index to generate vector embeddings.
  • llama-index-llms-openai: Integrate OpenAI models into Llama Index.
  • qdrant_client: Client-side library that interacts with Qdrant, a vector search engine.
  • fastembed: Library for fast generation of embedding vectors.

4.2 Preparing a RAG Process with Embedding and Document Indexing

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.fastembed import FastEmbedEmbedding
# from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.settings import Settings
Settings.embed_model = FastEmbedEmbedding(model_name="BAAI/bge-base-en-v1.5")
# Settings.embed_model = OpenAIEmbedding(embed_batch_size=10)
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
  • VectorStoreIndex: Classes for creating and managing vector store indexes. The index performs efficient similarity search and retrieval based on document vectors.
  • SimpleDirectoryReader: Class that reads documents from a specified directory and preprocesses the loaded files for indexing.
  • FastEmbedEmbedding: An embedding model class that generates text embeddings using the FastEmbed library. Here the model name is specified as "BAAI/bge-base-en-v1.5".
  • from llama_index.embeddings.openai import OpenAIEmbedding:
    • OpenAIEmbedding: An embedding model class that generates vectors via OpenAI's embedding service. It is commented out by default; if you wish to use an OpenAI model instead of FastEmbed, uncomment it and configure parameters such as embed_batch_size.
  • Settings: Class for global configuration. The global embedding model is set by assigning the embed_model attribute.
  • Settings.embed_model = FastEmbedEmbedding(model_name="BAAI/bge-base-en-v1.5"): Sets FastEmbedEmbedding as the global embedding model.
  • documents = SimpleDirectoryReader("data").load_data(): Loads and preprocesses documents from the "data" directory. Adjust the directory name to the actual path of your project.
  • index = VectorStoreIndex.from_documents(documents): Creates a vector store index from the preprocessed documents. This step builds a vectorized representation of the documents and enables vector-based queries.
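Note that the snippet above uses Llama Index's default in-memory vector store, even though the Qdrant packages were installed in section 4.1. If you want the index to store its vectors in Qdrant instead, a sketch along the following lines should work (assuming an in-memory Qdrant client and an illustrative collection name "rag_demo"):

import qdrant_client
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore
# Start an in-memory Qdrant instance (point this at a URL for a real deployment)
client = qdrant_client.QdrantClient(location=":memory:")
# Wrap Qdrant as a Llama Index vector store
vector_store = QdrantVectorStore(client=client, collection_name="rag_demo")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# Build the index on top of Qdrant instead of the default in-memory store
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)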

4.3 Querying the Vector Store Index

Once the vector store index is set up, you can use it to perform queries and retrieve relevant information.

query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)
  • as_query_engine(): Converts the VectorStoreIndex into a query engine. This engine allows you to perform searches and retrieve information based on the vector representations of documents stored in the index.
  • query(): Executes a query against the vector store index. The query string "What did the author do growing up?" is used to search for relevant documents and retrieve information based on the context provided by the vector embeddings.

Finally, response contains the information retrieved from the vector store index, generated from the query and the indexed documents.
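Since this guide is about observability, it is also useful to inspect which document chunks the retriever actually returned. The response object exposes them via source_nodes; the following short sketch prints each retrieved chunk's similarity score and the first 200 characters of its text (the truncation length is arbitrary):

# Inspect the retrieved chunks behind the answer
for node_with_score in response.source_nodes:
    print(f"score={node_with_score.score}")
    print(node_with_score.node.get_content()[:200])
    print("---")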

 

Conclusion

In this guide, we set up a Retrieval Augmented Generation (RAG) process using Llama Index and integrated it with various components to observe its functionality. We first configured and installed the required libraries, including Llama Index, OpenTelemetry, and various embedding models.

We then did the following:

  • Initialized and configured the embedding model, using FastEmbed (or optionally an OpenAI model).
  • Loaded and indexed documents from a directory to prepare the data for querying.
  • Set up the query engine to perform searches and retrieve relevant information from the indexed documents.

By following these steps, you have successfully prepared a RAG process that enables efficient document retrieval and query processing. The setup utilizes vector-based embedding and indexing to provide advanced search and information retrieval capabilities.

Feel free to experiment with different configurations and queries to further explore the capabilities of the RAG process. If you have any questions or need further customization, please consult the documentation of the libraries you are using for more guidance.
