Introduction
This document details how to build a localized RAG (Retrieval-Augmented Generation) application with DeepSeek R1 and Ollama. It also serves as a supplement to the Building a Local RAG Application with LangChain tutorial.
We will demonstrate the complete implementation process through examples, covering document processing, vector storage, model invocation, and other key steps. This tutorial uses DeepSeek-R1 1.5B as the base language model. Since different models have their own characteristics and performance profiles, readers can choose other suitable models to implement their RAG system according to actual needs.
Note: This document contains core code snippets and detailed explanations. The full code can be found in the accompanying notebook.
Preliminaries
First, we need to download Ollama and configure the environment.
Ollama's GitHub repository provides detailed instructions, which are briefly summarized as follows.
Step 1: Download Ollama.
Download and double-click to run the Ollama application.
Step 2: Verify the installation.
At the command line, type ollama.
If the following message appears, Ollama has been successfully installed.
Step 3: Pull the models.
- From the command line, refer to the Ollama Model List and the List of text embedding models to pull models. In this tutorial, we take deepseek-r1:1.5b and nomic-embed-text as examples.
- Command line input:
ollama pull deepseek-r1:1.5b
This pulls the general-purpose open-source large language model deepseek-r1:1.5b. (Pulling the model may be slow; if a pull error occurs, re-enter the command to retry.)
- Command line input:
ollama pull nomic-embed-text
This pulls the text embedding model nomic-embed-text.
- When the application is running, all models are automatically served on localhost:11434 (a quick connectivity check is sketched after this list).
- Note that your model selection needs to take into account your local hardware capabilities; the reference for this tutorial is CPU memory > 8 GB.
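Optionally, you can confirm from Python that the local Ollama service is reachable before continuing. The sketch below assumes the default localhost:11434 endpoint and uses Ollama's /api/tags route, which lists locally available models:
import requests
# Query the local Ollama server for its installed models (default port 11434).
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
# Print the names of the locally available models, e.g. deepseek-r1:1.5b and nomic-embed-text.
for m in resp.json().get("models", []):
    print(m["name"])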
Step 4: Deploy the model.
Run the following command in a command-line window to deploy the model.
ollama run deepseek-r1:1.5b
Running this command also lets you chat with the deployed model interactively from the command line.
Note that the following steps are not necessary if you only want to deploy DeepSeek R1 models using Ollama.
Step 5: Install the dependencies.
# langchain_community
pip install langchain langchain_community
# Chroma
pip install langchain_chroma
# Ollama
pip install langchain_ollama
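As a quick optional sanity check that the dependencies installed correctly, the following minimal snippet simply imports the packages used in this tutorial:
# Verify that the installed LangChain packages import cleanly.
import langchain
import langchain_community
import langchain_chroma
import langchain_ollama
print("LangChain version:", langchain.__version__)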
With the preparation done, let's build a step-by-step solution based on LangChain, Ollama, and DeepSeek R1. The implementation steps are described in detail below.
1. Document loading
Load PDF documents and split them into appropriately sized text chunks.
from langchain_community.document_loaders import PDFPlumberLoader
file = "DeepSeek_R1.pdf"
# Load the PDF
loader = PDFPlumberLoader(file)
docs = loader.load()
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(docs)
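As an optional check (not part of the original code), you can inspect how many chunks were produced and preview the first one:
# Inspect the split results: number of chunks and a preview of the first chunk.
print(f"Number of chunks: {len(all_splits)}")
print(all_splits[0].page_content[:200])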
2. Initialize vector storage
Use the Chroma database to store the document vectors and configure the embedding model provided by Ollama.
from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings
local_embeddings = OllamaEmbeddings(model="nomic-embed-text")
vectorstore = Chroma.from_documents(documents=all_splits, embedding=local_embeddings)
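Before building the chain, you can optionally verify that similarity search returns relevant chunks; the query string here is just an example:
# Quick sanity check: retrieve the top 3 chunks most similar to a test query.
results = vectorstore.similarity_search("What is DeepSeek-R1?", k=3)
for doc in results:
    print(doc.page_content[:100], "...")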
3. Construct the chain expression
Set up the model and prompt template to build the processing chain.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import ChatOllama
model = ChatOllama(
model="deepseek-r1:1.5b",
)
prompt = ChatPromptTemplate.from_template(
"Summarize the main themes in these retrieved docs: {docs}"
)
# Convert incoming docs to string form
def format_docs(docs):
return "\n\n".join(doc.page_content for doc in docs)
chain = {"docs": format_docs} | prompt | model | StrOutputParser()
question = "What is the purpose of the DeepSeek project?"
docs = vectorstore.similarity_search(question)
chain.invoke(docs)
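When you invoke the chain, the returned string is the model's summary. DeepSeek-R1 builds served through Ollama typically wrap their reasoning in <think>...</think> tags before the final answer; if you only want the final text, a small optional helper like the one below can strip them (adjust it to the output format you actually observe):
import re
# Invoke the summarization chain and capture the raw response.
response = chain.invoke(docs)
# Strip the model's <think>...</think> reasoning block, if present, keeping only the final answer.
final_answer = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()
print(final_answer)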
4. QA with retrieval
Integrate the retrieval and question-answering functions.
from langchain_core.runnables import RunnablePassthrough
RAG_TEMPLATE = """
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
<context>
{context}
</context>
Answer the following question:
{question}"""
rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)
retriever = vectorstore.as_retriever()
qa_chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| rag_prompt
| model
| StrOutputParser()
)
question = "What is the purpose of the DeepSeek project?"
# Run
qa_chain.invoke(question)
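Since the chain is a LangChain runnable, you can also stream the answer token by token instead of waiting for the full response, for example:
# Stream the generated answer chunk by chunk.
for chunk in qa_chain.stream(question):
    print(chunk, end="", flush=True)
print()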
Summary
This tutorial details how to build a localized RAG application using DeepSeek R1 and Ollama. We achieve full functionality in four main steps:
- Document processing: use PDFPlumberLoader to load PDF documents and RecursiveCharacterTextSplitter to split the text into appropriately sized chunks.
- Vector storage: build a vector store using the Chroma database and Ollama's embedding model, providing the basis for subsequent similarity retrieval.
- Chain construction: design and implement a processing chain that integrates document formatting, prompt templates, and model responses into a streamlined flow.
- RAG implementation: by integrating retrieval and question-answering functionality, a complete retrieval-augmented generation system is realized, capable of answering user queries based on document content.
With this tutorial, you can quickly build your own local RAG system and customize and improve it according to your actual needs. We recommend trying different models and parameter configurations in practice to get the best results.
Note: Using tools such as Streamlit or FastAPI, you can deploy the local RAG application as a web service, enabling a wider range of application scenarios.
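As an illustrative sketch only (not the repository's app.py), a minimal FastAPI wrapper around the qa_chain from section 4 might look like the following; the /ask route and Query model are hypothetical names chosen for this example:
from fastapi import FastAPI
from pydantic import BaseModel
# Assumes `qa_chain` has been constructed as in section 4 and the Ollama service is running.
app = FastAPI()
class Query(BaseModel):
    question: str
@app.post("/ask")
def ask(query: Query):
    # Run the RAG chain on the incoming question and return the answer.
    return {"answer": qa_chain.invoke(query.question)}
# Start the service with: uvicorn app:app --reload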
The repository also provides an app.py file; you can run it directly to start the web service. Refer to the documentation Build a RAG System with DeepSeek R1 & Ollama. Note: start the Ollama service before running this code.
The dialog page is shown below: