Introduction
This document details how to build a localized RAG (Retrieval-Augmented Generation) application with DeepSeek R1 and Ollama. It also serves as a supplement to the Building a Local RAG Application with LangChain tutorial.
We will demonstrate the complete implementation process through examples, covering document processing, vector storage, model invocation, and other key steps. This tutorial uses DeepSeek-R1 1.5B as the base language model. Since different models have their own characteristics and performance profiles, readers can choose other suitable models to implement their RAG system according to actual needs.
Note: This document contains core code snippets and detailed explanations. The full code can be found in the accompanying notebook.
Preliminaries
First, we need to download Ollama and configure the environment.
Ollama's GitHub repository provides detailed instructions, which are briefly summarized as follows.
Step 1: Download Ollama.
Download and double-click to run the Ollama application.
Step 2: Verify the installation.
At the command line, type ollama.
If the following message appears, Ollama has been successfully installed.
Step 3: Pull the models.
- From the command line, refer to the Ollama Model List and the List of text embedding models to pull models. In this tutorial, we take deepseek-r1:1.5b and nomic-embed-text as examples.
- Command line input:
ollama pull deepseek-r1:1.5b
This pulls the general-purpose open-source large language model deepseek-r1:1.5b. (Pulling the model may be slow; if a pull error occurs, re-enter the command to retry.)
- Command line input:
ollama pull nomic-embed-text
This pulls the text embedding model nomic-embed-text.
- When the application is running, all models are automatically served on localhost:11434 (a quick connectivity check is sketched after this list).
- Note that your model selection needs to take into account your local hardware capabilities; the reference for this tutorial is CPU memory > 8 GB.
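Optionally, you can confirm from Python that the local Ollama service is reachable before continuing. The sketch below assumes the default localhost:11434 endpoint and uses Ollama's /api/tags route, which lists locally available models:
import requests
# Query the local Ollama server for its installed models (default port 11434).
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
# Print the names of the locally available models, e.g. deepseek-r1:1.5b and nomic-embed-text.
for m in resp.json().get("models", []):
    print(m["name"])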
Step 4: Deploy the model.
Run the following command in a command-line window to deploy the model.
ollama run deepseek-r1:1.5b
Running this command also lets you chat with the deployed model interactively from the command line.
Note that the following steps are not necessary if you only want to deploy DeepSeek R1 models using Ollama.
Step 5: Install the dependencies.
# langchain_community
pip install langchain langchain_community
# Chroma
pip install langchain_chroma
# Ollama
pip install langchain_ollama
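As a quick optional sanity check that the dependencies installed correctly, the following minimal snippet simply imports the packages used in this tutorial:
# Verify that the installed LangChain packages import cleanly.
import langchain
import langchain_community
import langchain_chroma
import langchain_ollama
print("LangChain version:", langchain.__version__)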
With the preparation done, let's build a step-by-step solution based on LangChain, Ollama, and DeepSeek R1. The implementation steps are described in detail below.
1. Document loading
Load PDF documents and split them into appropriately sized text chunks.
from langchain_community.document_loaders import PDFPlumberLoader
file = "DeepSeek_R1.pdf"
# Load the PDF
loader = PDFPlumberLoader(file)
docs = loader.load()
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(docs)
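As an optional check (not part of the original code), you can inspect how many chunks were produced and preview the first one:
# Inspect the split results: number of chunks and a preview of the first chunk.
print(f"Number of chunks: {len(all_splits)}")
print(all_splits[0].page_content[:200])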
2. Initialize vector storage
Use the Chroma database to store the document vectors and configure the embedding model provided by Ollama.
from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings
local_embeddings = OllamaEmbeddings(model="nomic-embed-text")
vectorstore = Chroma.from_documents(documents=all_splits, embedding=local_embeddings)
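Before building the chain, you can optionally verify that similarity search returns relevant chunks; the query string here is just an example:
# Quick sanity check: retrieve the top 3 chunks most similar to a test query.
results = vectorstore.similarity_search("What is DeepSeek-R1?", k=3)
for doc in results:
    print(doc.page_content[:100], "...")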
3. Construct the chain expression
Set up the model and prompt template to build the processing chain.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import ChatOllama
model = ChatOllama(
model="deepseek-r1:1.5b",
)
prompt = ChatPromptTemplate.from_template(
"Summarize the main themes in these retrieved docs: {docs}"
)
# Convert incoming docs to string form
def format_docs(docs):
return "\n\n".join(doc.page_content for doc in docs)
chain = {"docs": format_docs} | prompt | model | StrOutputParser()
question = "What is the purpose of the DeepSeek project?"
docs = vectorstore.similarity_search(question)
chain.invoke(docs)
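When you invoke the chain, the returned string is the model's summary. DeepSeek-R1 builds served through Ollama typically wrap their reasoning in <think>...</think> tags before the final answer; if you only want the final text, a small optional helper like the one below can strip them (adjust it to the output format you actually observe):
import re
# Invoke the summarization chain and capture the raw response.
response = chain.invoke(docs)
# Strip the model's <think>...</think> reasoning block, if present, keeping only the final answer.
final_answer = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()
print(final_answer)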
4. QA with retrieval
Integrate the retrieval and question-answering functions.
from langchain_core.runnables import RunnablePassthrough
RAG_TEMPLATE = """
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
<context>
{context}
</context>
Answer the following question:
{question}"""
rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)
retriever = vectorstore.as_retriever()
qa_chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| rag_prompt
| model
| StrOutputParser()
)
question = "What is the purpose of the DeepSeek project?"
# Run
qa_chain.invoke(question)
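Since the chain is a LangChain runnable, you can also stream the answer token by token instead of waiting for the full response, for example:
# Stream the generated answer chunk by chunk.
for chunk in qa_chain.stream(question):
    print(chunk, end="", flush=True)
print()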
Summary
This tutorial details how to build a localized RAG application using DeepSeek R1 and Ollama. We achieve full functionality in four main steps:
- Document processing: use PDFPlumberLoader to load PDF documents and RecursiveCharacterTextSplitter to split the text into appropriately sized chunks.
- Vector storage: build a vector store using the Chroma database and Ollama's embedding model, providing the basis for subsequent similarity retrieval.
- Chain construction: design and implement a processing chain that integrates document formatting, prompt templates, and model responses into a streamlined flow.
- RAG implementation: by integrating retrieval and question-answering functionality, a complete retrieval-augmented generation system is realized, capable of answering user queries based on document content.
With this tutorial, you can quickly build your own local RAG system and customize and improve it according to your actual needs. We recommend trying different models and parameter configurations in practice to get the best results.
Note: Using tools such as Streamlit or FastAPI, you can deploy the local RAG application as a web service, enabling a wider range of application scenarios.
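As an illustrative sketch only (not the repository's app.py), a minimal FastAPI wrapper around the qa_chain from section 4 might look like the following; the /ask route and Query model are hypothetical names chosen for this example:
from fastapi import FastAPI
from pydantic import BaseModel
# Assumes `qa_chain` has been constructed as in section 4 and the Ollama service is running.
app = FastAPI()
class Query(BaseModel):
    question: str
@app.post("/ask")
def ask(query: Query):
    # Run the RAG chain on the incoming question and return the answer.
    return {"answer": qa_chain.invoke(query.question)}
# Start the service with: uvicorn app:app --reload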
The repository also provides an app.py file; you can run it directly to start the web service. Refer to the documentation Build a RAG System with DeepSeek R1 & Ollama. Note: start the Ollama service before running this code.
The dialog page is shown below: