Welcome to this notebook, where we will explore how to use Llama Index to set up and observe a Retrieval-Augmented Generation (RAG) pipeline.
https://github.com/adithya-s-k/AI-Engineering.academy/tree/main/RAG/01_RAG_Observability
Summary
This guide provides a complete tutorial on configuring the required tools and libraries, including embedding models and vector store indexes, for efficient document retrieval and query processing. We will cover everything from installation and setup to querying and retrieving relevant information, helping you master the advanced search capabilities of the RAG pipeline.
Introduction
To get started with this notebook, you'll need a basic understanding of Python and some familiarity with machine learning concepts. If you're not familiar with these concepts, don't worry - we'll walk you through them step-by-step!
Prerequisites
- Python 3.7+
- Jupyter Notebook or JupyterLab
- Basics of Python and Machine Learning Concepts
1. Setup
1.1 Installation of necessary software packages
To begin setting up Arize Phoenix, you will need to install the necessary software packages.
Arize Phoenix is a comprehensive tool designed for observability and monitoring of machine learning and AI systems. It provides the ability to track and analyze all aspects of machine learning models and data pipelines.
!pip install arize-phoenix
!pip install openinference-instrumentation-openai
These commands will install:
- arize-phoenix: Tools for machine learning workflow observability.
- openinference-instrumentation-openai: A package for integrating OpenAI models with observability tools such as Arize Phoenix.
1.2 Setting up Arize Phoenix
There are three ways to complete the setup:
Read more here.
- Command line: start the Phoenix server with the following command:
python3 -m phoenix.server.main serve
- Docker: start the Phoenix Docker image with the following command:
docker run -p 6006:6006 -p 4317:4317 arizephoenix/phoenix:latest
This exposes the Phoenix UI and REST API on localhost:6006 and the gRPC endpoint for spans on localhost:4317.
- Notebook: launch Phoenix directly from your notebook:
import phoenix as px
px.launch_app()
1.3 Importing the necessary libraries and configuring the environment
Import the necessary libraries and set up the environment before proceeding with data processing and evaluation:
import json
import os
from getpass import getpass
import nest_asyncio
import pandas as pd
from tqdm import tqdm
import phoenix as px
# Allow concurrent evaluation in a notebook environment
nest_asyncio.apply()
# Set display options for pandas DataFrame to show more content
pd.set_option("display.max_colwidth", 1000)
- json, os: Standard Python libraries for handling JSON data and interacting with the operating system.
- getpass: A utility for entering passwords securely.
- nest_asyncio: Allows the use of asyncio inside Jupyter notebooks.
- pandas (pd): A powerful Python data manipulation library.
- tqdm: Provides progress bars for loops, used to track the progress of data processing.
- phoenix (px): Part of the Arize observability tooling; provides an interactive UI for exploring data and monitoring machine learning models.
nest_asyncio is applied to allow concurrent evaluations in the notebook environment, and the maximum column width of pandas DataFrames is increased to improve readability.
1.4 Starting the Phoenix Application
px.launch_app()
This function initializes and launches the Phoenix application, which opens in a new tab of the default browser and provides an interactive interface for exploring datasets, visualizing model performance, and debugging.
1.5 Viewing Phoenix Application Sessions
Once the Phoenix application is launched, you can interact with the application directly in the notebook using session objects. Run the following code to launch the Phoenix application and view it in the current session:
# Launch and view Phoenix application session
(session := px.launch_app()).view()
This line of code starts the Phoenix application, assigns the session to a variable called session, and calls the view() method to display the Phoenix application directly in the notebook interface, providing an integrated experience without switching between the browser and the notebook.
1.6 Setting the endpoint of the trace
To send trace data to a Phoenix application for analysis and observability, define the endpoint URL where the Phoenix application listens for incoming data.
endpoint = "http://127.0.0.1:6006/v1/traces"
- endpoint: Stores the endpoint URL where the Phoenix application listens for incoming trace data.
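If you prefer not to hard-code the URL, the endpoint can be made configurable. A minimal sketch, assuming a hypothetical PHOENIX_TRACE_ENDPOINT environment variable (not something Phoenix defines itself):
import os
# Fall back to the local default used above if the (hypothetical) variable is unset.
endpoint = os.getenv("PHOENIX_TRACE_ENDPOINT", "http://127.0.0.1:6006/v1/traces")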
2. Tracing OpenAI
For more integrations, read here.
2.1 Installing and Importing OpenAI Packages
!pip install openai
import openai
- openai: The Python client library for the OpenAI API. It allows you to send requests to OpenAI models (including GPT-3 and GPT-4) to accomplish a variety of tasks.
2.2 Configuring the OpenAI API Key
import openai
import os
from getpass import getpass
# Get the API key from the environment variable, if not set then prompt the user to enter it
if not (openai_api_key := os.getenv("OPENAI_API_KEY")):
    openai_api_key = getpass("🔑 Enter your OpenAI API key: ")
# Setting the API key for the OpenAI client
openai.api_key = openai_api_key
# stores the API key in an environment variable for subsequent use
os.environ["OPENAI_API_KEY"] = openai_api_key
- Getting the API key: the code first tries to get the API key from the environment variable (OPENAI_API_KEY). If the key is not found, the user is prompted for secure input via getpass.
- Set API key: The retrieved or supplied API key is then set as the key for the OpenAI client library.
- Storing the API key: Finally, store the API key in an environment variable to ensure that it is readily available during the session.
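As an optional sanity check (not part of the original guide), you can verify the key before continuing by listing the models available to your account:
from openai import OpenAI
client = OpenAI(api_key=openai_api_key)
try:
    # A successful call confirms the key is valid and the account is reachable.
    models = client.models.list()
    print(f"API key OK; {len(models.data)} models available.")
except Exception as exc:
    print(f"API key check failed: {exc}")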
2.3 Setting up OpenTelemetry for Tracing
To enable tracing for your OpenAI interactions, configure OpenTelemetry and set up the required components.
from opentelemetry import trace as trace_api
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
# Setting up the Tracer Provider
tracer_provider = trace_sdk.TracerProvider()
# Define the OTLP span exporter with the endpoint
span_exporter = OTLPSpanExporter(endpoint)
# Set up Span Processor to process and export spans
span_processor = SimpleSpanProcessor(span_exporter)
# Add the Span Processor to the Tracer Provider
tracer_provider.add_span_processor(span_processor)
# Set the global Tracer Provider
trace_api.set_tracer_provider(tracer_provider)
OpenTelemetry Library
In the provided code, several OpenTelemetry libraries are used to set up tracing. Below is an overview of each library:
- opentelemetry: Purpose: OpenTelemetry's core library, providing APIs for traces and metrics. Usage: includes the trace module used for creating and managing traces.
- opentelemetry.exporter.otlp.proto.http.trace_exporter: Purpose: provides a trace exporter that uses OTLP (OpenTelemetry Protocol) over HTTP. Usage: the module's OTLPSpanExporter class sends trace data to an OTLP-compatible backend and is configured with the endpoint.
- opentelemetry.sdk.trace: Purpose: contains the SDK implementations for tracing, including TracerProvider. Usage: TracerProvider manages Tracer instances and is responsible for exporting the spans (units of work) collected in a trace; SimpleSpanProcessor exports spans synchronously, processing them and sending the data to the exporter.
- opentelemetry.sdk.trace.export: Purpose: provides classes for exporting trace data. Usage: SimpleSpanProcessor processes spans and exports them using the specified exporter, ensuring the data is sent to the backend for analysis.
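To confirm the wiring before instrumenting anything, you can emit a manual test span through the global provider. This is a small sketch of my own, not part of the original guide; the span name and attribute are arbitrary:
# Create a tracer from the global provider configured above and emit one span.
tracer = trace_api.get_tracer(__name__)
with tracer.start_as_current_span("setup-sanity-check") as span:
    span.set_attribute("example.note", "tracer provider wired to Phoenix")
# The span should now be visible in the Phoenix UI at localhost:6006.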
2.4 Instrumenting OpenAI with OpenInference
In order to integrate OpenTelemetry into OpenAI and enable tracking for OpenAI model interactions, use the tracking from the openinference
libraries OpenAIInstrumentor
The
from openinference.instrumentation.openai import OpenAIInstrumentor
# instantiates and applies instrumentation for OpenAI
OpenAIInstrumentor().instrument()
- OpenAIInstrumentor: A class from the openinference library that instruments OpenAI API calls to enable tracing and observability.
- instrument(): This method configures the OpenAI API client to automatically generate and send trace data to the OpenTelemetry backend. It applies the configured trace settings and lets you monitor and analyze interactions with OpenAI models.
By running this code, you ensure that all OpenAI API calls are traced, capturing detailed insights about model usage and performance.
2.5 Making a Request to the OpenAI API
To interact with OpenAI's API and get a response, use the following code. This example shows how to create a chat-completion request and print the results via the OpenAI API:
import openai
# Create an instance of the OpenAI client
client = openai.OpenAI()
# makes a chat completions request to the OpenAI API
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku."}],
)
# Print the content of the response
print(response.choices[0].message.content)
- openai.OpenAI(): Initializes an OpenAI client instance used to interact with the OpenAI API.
- client.chat.completions.create(): Sends a request to the OpenAI API to create a chat completion.
- model="gpt-4o": Specifies the model used to generate the completion. Make sure the model name is correct and enabled for your OpenAI API account.
- messages: A list of messages making up the conversation history. This example contains a single user message: "Write a haiku."
- response.choices[0].message.content: Extracts and prints the model-generated completion.
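Beyond the Phoenix UI, the captured spans can also be pulled into a DataFrame for offline inspection. A brief sketch, assuming the Phoenix client's get_spans_dataframe helper:
import phoenix as px
# Fetch all spans collected so far (including the chat completion above) as a DataFrame.
spans_df = px.Client().get_spans_dataframe()
print(spans_df.head())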
3. Tracing Llama Index
3.1 Installing and Importing Required Libraries
!pip install llama-index
!pip install llama-index-core
!pip install llama-index-llms-openai
!pip install openinference-instrumentation-llama-index==2.2.4
!pip install -U llama-index-callbacks-arize-phoenix
!pip install "arize-phoenix[llama-index]"
- llama-index: Core package for Llama Index functionality.
- llama-index-core: Provides the core features and tools of Llama Index.
- llama-index-llms-openai: A package for integrating Llama Index with OpenAI models.
- openinference-instrumentation-llama-index==2.2.4: Provides tools for instrumenting Llama Index interactions.
- llama-index-callbacks-arize-phoenix: Provides integration with Arize Phoenix callbacks.
- arize-phoenix[llama-index]: Extends Arize Phoenix to support Llama Index tracing.
3.2 Getting the URL of the currently active Phoenix session
# Get the URL of the currently active Phoenix session
px.active_session().url
Access the currently active Phoenix session and retrieve its URL to view or share the Phoenix interface for monitoring and analyzing trace data.
3.3 Setting up Llama Index Tracing
To set up OpenTelemetry tracing for Llama Index, configure the tracer provider and integrate the Llama Index instrumentor.
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
# Setting up the Tracer Provider
tracer_provider = trace_sdk.TracerProvider()
# Add Span Processor to Tracer Provider
tracer_provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter(endpoint)))
# Instrumentation of the Llama Index using the Tracer provider.
LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider)
- LlamaIndexInstrumentor: A class from openinference.instrumentation.llama_index that instruments Llama Index for tracing and observability.
- trace_sdk.TracerProvider(): Initializes a new tracer provider for creating and managing trace data.
- SimpleSpanProcessor: Exports spans synchronously and sends the data to the backend.
- LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider): Applies instrumentation to Llama Index so it is traced using the provided tracer provider.
3.4 Using OpenAI to Interact with the Llama Index
To perform a completion request through Llama Index using an OpenAI model, use the following code:
from llama_index.llms.openai import OpenAI
# Initialize the OpenAI model
llm = OpenAI(model="gpt-4o-mini")
# initiates a completion request
resp = llm.complete("Paul Graham is ")
# Print the response
print(resp)
- from llama_index.llms.openai import OpenAI: Imports the OpenAI class from the llama_index package for interacting with OpenAI models.
- OpenAI(model="gpt-4o-mini"): Initializes an OpenAI class instance with the specified model (here, gpt-4o-mini).
- llm.complete(...): Sends the prompt text to the model to generate a response.
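If you want tokens as they are generated rather than a single blocking response, Llama Index also exposes a streaming variant. A short sketch (not in the original guide) using stream_complete:
from llama_index.llms.openai import OpenAI
llm = OpenAI(model="gpt-4o-mini")
# Each chunk carries the newly generated text in its delta field.
for chunk in llm.stream_complete("Paul Graham is "):
    print(chunk.delta, end="", flush=True)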
3.5 Chat Interaction with Llama Index Using OpenAI
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
# Initialize the OpenAI model
llm = OpenAI()
# Define Chat Message
messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
# Getting the results of the model's response
resp = llm.chat(messages)
- OpenAI: Class for interacting with OpenAI models.
- ChatMessage: Class for formatting chat messages.
- OpenAI(): Initializes an instance of the OpenAI class.
- ChatMessage(...): Creates a chat message object, specifying the role (e.g., "system", "user") and the content of the message.
- role="system": Defines a system message that sets the context or personality of the model.
- role="user": Represents a message sent by the user.
- llm.chat(messages): Sends the defined messages to the model and receives the response.
This code holds a chat interaction with the OpenAI model, using a system message to set its persona and a user message as the prompt.
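To see the pirate's reply, print the response; if you prefer incremental output, the chat interface also has a streaming form. A small follow-up sketch (assumed, not in the original):
# Print the full chat response returned above.
print(resp)
# Alternatively, stream the reply token by token with the same message list.
for chunk in llm.stream_chat(messages):
    print(chunk.delta, end="", flush=True)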
4. Observing the RAG Pipeline
4.1 Setting up the environment for observing the RAG pipeline
!pip install llama-index
!pip install llama-index-vector-stores-qdrant
!pip install llama-index-readers-file
!pip install llama-index-embeddings-fastembed
!pip install llama-index-llms-openai
!pip install -U qdrant_client fastembed
- llama-index: Core package for Llama Index functionality.
- llama-index-vector-stores-qdrant: Integrates Qdrant as a vector store for Llama Index.
- llama-index-readers-file: Provides file reading capabilities for Llama Index.
- llama-index-embeddings-fastembed: FastEmbed support for Llama Index, used to generate vector embeddings.
- llama-index-llms-openai: Integrates OpenAI models with Llama Index.
- qdrant_client: Client library for interacting with Qdrant, a vector search engine.
- fastembed: Library for fast generation of embedding vectors.
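Although the Qdrant packages are installed here, the index built in the next section defaults to an in-memory vector store. Below is a hedged sketch of how the Qdrant integration could be wired in; the in-memory client and collection name are illustrative choices, not from the original notebook, and the resulting storage_context can be passed to VectorStoreIndex.from_documents in section 4.2.
import qdrant_client
from llama_index.core import StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore
# An in-memory Qdrant instance for experimentation; point this at a real server in production.
client = qdrant_client.QdrantClient(location=":memory:")
vector_store = QdrantVectorStore(client=client, collection_name="rag_observability")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# Later: VectorStoreIndex.from_documents(documents, storage_context=storage_context)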
4.2 Preparing the RAG Pipeline with Embedding and Document Indexing
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.fastembed import FastEmbedEmbedding
# from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.settings import Settings
Settings.embed_model = FastEmbedEmbedding(model_name="BAAI/bge-base-en-v1.5")
# Settings.embed_model = OpenAIEmbedding(embed_batch_size=10)
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
- VectorStoreIndex: Class for creating and managing a vector store index. The index enables efficient similarity search and retrieval based on document vectors.
- SimpleDirectoryReader: Class that reads documents from a specified directory and preprocesses the loaded files for indexing.
- FastEmbedEmbedding: Embedding model class that generates text embeddings using the FastEmbed library; here the model name is "BAAI/bge-base-en-v1.5".
- OpenAIEmbedding (commented out): Embedding model class that generates vectors via the OpenAI embedding service. It is commented out by default; if you prefer an OpenAI model over FastEmbed, uncomment it and configure its parameters, such as embed_batch_size.
- Settings: Class for setting the global embedding model configuration. Assigning to its embed_model attribute specifies which embedding model to use.
- Settings.embed_model = FastEmbedEmbedding(model_name="BAAI/bge-base-en-v1.5"): Sets FastEmbedEmbedding as the global embedding model.
- documents = SimpleDirectoryReader("data").load_data(): Loads and preprocesses document data from the "data" directory. Adjust the directory name to the actual path in your project.
- index = VectorStoreIndex.from_documents(documents): Builds a vector store index from the preprocessed documents, creating a vectorized representation of each document and enabling vector-based queries.
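To avoid re-embedding the documents on every run, the index can be persisted to disk and reloaded later. A minimal sketch; the ./storage directory is an arbitrary example, not from the original guide:
from llama_index.core import StorageContext, load_index_from_storage
# Save the index (including embeddings) to a local directory.
index.storage_context.persist(persist_dir="./storage")
# In a later session, rebuild the index from the persisted files instead of re-indexing.
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)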
4.3 Querying the Vector Store Index
Once the vector store index is set up, you can use it to perform queries and retrieve relevant information.
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
- as_query_engine(): Converts the VectorStoreIndex into a query engine. This engine lets you run searches and retrieve information based on the vector representations of the documents stored in the index.
- query(): Executes a query against the vector store index. The query string "What did the author do growing up?" is used to search for relevant documents and retrieve information based on the context provided by the vector embeddings.
Finally, response contains the information retrieved from the vector store index, generated from the query and the indexed documents.
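As a quick follow-up (assumed, not part of the original guide), you can print the synthesized answer and inspect which source chunks were retrieved, along with their similarity scores:
# Print the synthesized answer.
print(response)
# Inspect the retrieved source chunks and their similarity scores.
for node_with_score in response.source_nodes:
    print(f"score={node_with_score.score:.3f}")
    print(node_with_score.node.get_content()[:200])
    print("---")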
Conclusion
In this guide, we set up a Retrieval-Augmented Generation (RAG) pipeline using Llama Index and integrated it with various components to observe its functionality. We first configured and installed the required libraries, including Llama Index, OpenTelemetry, and various embedding models.
We then did the following:
- Initialized and configured the embedding model, using FastEmbed (or OpenAI models where preferred).
- Loaded and indexed documents from the data directory to prepare the data for querying.
- Set up the query engine to perform searches and retrieve relevant information from the indexed documents.
By following these steps, you have successfully prepared a RAG pipeline that enables efficient document retrieval and query processing. The setup uses vector-based embedding and indexing to provide advanced search and information retrieval capabilities.
Feel free to experiment with different configurations and queries to further explore the capabilities of the RAG pipeline. If you have any questions or need further customization, please consult the documentation of the libraries you are using for further guidance.