Summary
This document describes how to integrate Ollama with LangChain in a Python environment to build powerful AI applications. Ollama is an open-source tool for deploying large language models locally, while LangChain is a framework for building applications on top of language models. Combining the two lets us quickly deploy and use advanced AI models in a local environment.
Note: this document contains the core code snippets with detailed explanations. The complete code is available in the accompanying Jupyter notebook.
1. Environment Setup
Configuring a Conda environment
First, we need to make a Conda environment available to Jupyter. Run the following commands in a terminal:
conda create -n handlm python=3.10 -y
conda activate handlm
pip install jupyter
python -m ipykernel install --user --name=handlm
After the commands finish, restart Jupyter and select the kernel for this environment.
⚠️ Note
You can also skip the Conda virtual environment and use your global Python environment directly.
Installing dependencies
Before starting, we need to install the following packages:
- langchain-ollama: integrates Ollama models into the LangChain framework
- langchain: the LangChain core library, providing the tools and abstractions for building AI applications
- langchain-community: community-contributed integrations and tools
- Pillow: image processing, used in the multimodal example
- faiss-cpu: used to build the simple RAG retriever
These can be installed with the following command:
pip install langchain-ollama langchain langchain-community Pillow faiss-cpu
2. Downloading the Required Model and Initializing OllamaLLM
Download the llama3.1 model
- Go to the official site https://ollama.com/download to download and install Ollama for your platform.
- Browse https://ollama.ai/library for the full list of available models.
- Run the following command to fetch an available LLM (for example: ollama pull llama3.1):
ollama pull <name-of-model>
Once the pull completes, the model is available locally.
Model storage locations:
- Mac: ~/.ollama/models/
- Linux (or WSL): /usr/share/ollama/.ollama/models
- Windows: C:\Users\Administrator\.ollama\models
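You can confirm which models are available locally with the standard Ollama CLI:
ollama list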
3. Basic Usage Examples
Conversation with ChatPromptTemplate
ChatPromptTemplate lets us create a reusable template with one or more parameters. The parameters are substituted dynamically at runtime to produce different prompts.
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import ChatOllama

# Initialize the local llama3.1 model
model = ChatOllama(model="llama3.1")

template = """
You are a helpful AI assistant, skilled at answering all kinds of questions.
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
chain = prompt | model
chain.invoke({"question": "Are you better than GPT-4?"})
When creating the chain, the pipe operator | connects the prompt and the model into a single processing pipeline. This chaining style makes it easy to compose and reuse different components.
The invoke method triggers the whole chain: the question is filled into the template, and the formatted prompt is sent to the model for processing.
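With a chat model, invoke returns an AIMessage object. If you only want the text, you can append an output parser to the chain; here is a minimal sketch using StrOutputParser (the question string is just an example):
from langchain_core.output_parsers import StrOutputParser

# Appending a parser turns the chain's output into a plain string
str_chain = prompt | model | StrOutputParser()
print(str_chain.invoke({"question": "Are you better than GPT-4?"}))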
Streaming output
Streaming output returns results incrementally while a long text is being generated. It has several important advantages:
- Better user experience: users see partial results immediately instead of waiting for the full response.
- Less perceived waiting: for long answers, users can start reading before generation finishes.
- Real-time interaction: generation can be steered or stopped while it is still in progress.
In practice, especially in chatbots and real-time dialog systems, streaming output is almost indispensable.
from langchain_ollama import ChatOllama

model = ChatOllama(model="llama3.1", temperature=0.7)

messages = [
    ("human", "Hi there!"),
]

for chunk in model.stream(messages):
    print(chunk.content, end='', flush=True)
The model.stream() method wraps the Ollama API's streaming interface and returns a generator. When model.stream(messages) is called, the following happens:
- A request is sent to the Ollama API to start generating a response.
- The API generates text and, instead of waiting until everything is finished, returns it in small chunks.
- Each chunk received is yielded by stream(). Passing flush=True to print ensures each chunk is displayed immediately rather than waiting for the output buffer to fill.
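The same pattern also works asynchronously. A minimal sketch using astream inside an asyncio program (using the same model name as above):
import asyncio
from langchain_ollama import ChatOllama

async def main():
    model = ChatOllama(model="llama3.1", temperature=0.7)
    # astream yields chunks like stream, but without blocking the event loop
    async for chunk in model.astream([("human", "Hi there!")]):
        print(chunk.content, end='', flush=True)

asyncio.run(main())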
Tool calling
Tool calling is the ability of an AI model to interact with external functions or APIs. It allows the model to perform tasks such as mathematical computation, data queries, or calls to external services.
from langchain_ollama import ChatOllama

def simple_calculator(operation: str, x: float, y: float) -> float:
    '''The actual calculation logic (omitted here)'''
    ...

llm = ChatOllama(
    model="llama3.1",
    temperature=0,
).bind_tools([simple_calculator])

result = llm.invoke("Do you know what 10 million times two is?")
The bind_tools method registers a custom function with the model. When the model encounters a question that requires computation, it can call this function to get an exact result instead of relying on its pre-trained knowledge.
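Note that the model does not run the tool itself; it returns a structured tool call that your own code is expected to execute. A minimal sketch of inspecting the result object from the invoke above:
# Each entry in tool_calls records the tool name and the arguments the model chose
for tool_call in result.tool_calls:
    print(tool_call["name"], tool_call["args"])
    # Here you would typically dispatch to simple_calculator(**tool_call["args"])
    # and feed the result back to the model if a final answer is needed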
This capability is useful when building complex AI applications, for example:
- Creating chatbots that can access real-time data
- Building intelligent assistants that can perform specific tasks (e.g., booking, lookups)
- Developing AI systems that can perform precise calculations or complex operations
Multimodal models
Ollama supports multimodal models such as bakllava and llava. Multimodal models are AI models that can handle multiple types of input (e.g., text, images, audio). They excel at understanding and generating cross-modal content, enabling more complex and natural human-computer interaction.
First, download the multimodal model by running the following at the command line:
ollama pull llava
We can then use the following code to process the image and text input:
from langchain_ollama import ChatOllama
from langchain_core.messages import HumanMessage
from langchain_core.output_parsers import StrOutputParser

llm = ChatOllama(model="llava", temperature=0)

def prompt_func(data):
    '''Construct a multimodal message containing text and a base64-encoded image'''
    return [HumanMessage(content=[
        {"type": "text", "text": data["text"]},
        {"type": "image_url", "image_url": f"data:image/jpeg;base64,{data['image']}"},
    ])]

chain = prompt_func | llm | StrOutputParser()

# image_b64 is the base64-encoded image string prepared beforehand
query_chain = chain.invoke(
    {"text": "What animal is in this picture?", "image": image_b64}
)
The key points here are:
- Image preprocessing: the image must be converted to a base64-encoded string (see the sketch after this list).
- Prompt function: prompt_func builds a multimodal input containing both the text and the image.
- Chaining: the | operator connects the prompt function, the model, and the output parser.
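The original snippet does not show how image_b64 is produced. A minimal sketch of the preprocessing step using Pillow and the standard library (the file name is hypothetical):
import base64
from io import BytesIO
from PIL import Image

def image_to_base64(path: str) -> str:
    # Load the image, re-encode it as JPEG, and return a base64 string
    with Image.open(path) as img:
        buffer = BytesIO()
        img.convert("RGB").save(buffer, format="JPEG")
        return base64.b64encode(buffer.getvalue()).decode("utf-8")

image_b64 = image_to_base64("example.jpg")  # hypothetical file path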
Multimodal models are useful in many scenarios, for example:
- Image caption generation
- Visual question answering systems
- Image-based content analysis and recommendation
4. Advanced Usage
Conversation using ConversationChain
ConversationChain is a tool provided by LangChain for managing multi-turn conversations. It combines a language model, a prompt template, and a memory component, making it easy to create context-aware dialog systems.
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
conversation = ConversationChain(
    llm=model,
    memory=memory,
    verbose=True,
)

# Conduct the conversation
response = conversation.predict(input="Hello, I'd like to learn about artificial intelligence.")
print("AI:", response)
response = conversation.predict(input="Can you give me an example of AI in everyday life?")
print("AI:", response)
response = conversation.predict(input="That sounds interesting. What are the applications of AI in healthcare?")
print("AI:", response)
The key components here are:
- ConversationBufferMemory: a simple memory component that stores the history of all previous turns.
- ConversationChain: combines the language model, the memory, and a default conversation prompt template.
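You can check what the memory component has accumulated after these turns; a small sketch using the memory object created above:
# load_memory_variables returns the stored conversation history under the "history" key
print(memory.load_memory_variables({}))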
Maintaining conversation history is important because it allows the model to:
- Understand the context and previously mentioned information
- Generate more coherent and relevant responses
- Handle complex multi-turn dialog scenarios
In practice, you may want to use a more advanced memory component, such as ConversationSummaryMemory, to handle long conversations and avoid exceeding the model's context length limit.
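As a rough sketch of that swap (assuming the same model object as above), ConversationSummaryMemory uses the LLM itself to keep a running summary instead of the full transcript:
from langchain.memory import ConversationSummaryMemory

# The memory component needs an LLM to produce the running summary
summary_memory = ConversationSummaryMemory(llm=model)
conversation = ConversationChain(llm=model, memory=summary_memory, verbose=True)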
Customized prompt templates
Well-designed prompt templates are key to creating effective AI applications. In this example, we create a fairly elaborate prompt for generating product descriptions:
from langchain_core.messages import SystemMessage

system_message = SystemMessage(content="""
You are an experienced e-commerce copywriting expert. Your task is to create engaging item descriptions based on the given product information.
Make sure your description is concise, powerful, and highlights the core benefits of the product.
""")

human_message_template = """
Please create an engaging product description for the following product:
Product type: {product_type}
Core features: {key_feature}
Target audience: {target_audience}
Price range: {price_range}
Brand positioning: {brand_positioning}

Please provide one description in each of the following three styles, approximately 50 words each:
1. Rational analysis
2. Emotional appeal
3. Storytelling
"""

# Example usage
product_info = {
    "product_type": "Smartwatch",
    "key_feature": "Heart rate monitoring and sleep analytics",
    "target_audience": "Health-conscious young professionals",
    "price_range": "Mid-to-high end",  # hypothetical value; the template above expects this field
    "brand_positioning": "The perfect combination of technology and health",
}
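The original code stops at defining the messages and the example input. One possible way to assemble them into a runnable chain (a sketch, assuming the model object initialized earlier):
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate

# Combine the fixed system message with the parameterized human message
prompt = ChatPromptTemplate.from_messages([
    system_message,
    HumanMessagePromptTemplate.from_template(human_message_template),
])
chain = prompt | model
print(chain.invoke(product_info).content)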
There are several important design considerations in this structure:
- system_message: defines the AI's role and overall task, setting the tone for the whole conversation.
- human_message_template: provides the structure for the specific instructions and required information.
- Multi-parameter design: allows the template to adapt flexibly to different products and needs.
- Diverse output requirement: asking for several styles of description encourages the model to show its versatility.
When designing an effective prompt template, consider the following:
- Clearly define the AI's role and task
- Provide a clear, structured input format
- Include specific output requirements and formatting guidance
- Think about how to draw out the model's capabilities and creativity
Building a simple RAG Q&A system
RAG (Retrieval-Augmented Generation) is an AI technique that combines retrieval with generation: it augments a language model's answers by retrieving relevant information. A RAG system typically works as follows:
- Split the knowledge-base documents into chunks and build a vector index
- Vectorize the user's question and retrieve the relevant chunks from the index
- Pass the retrieved chunks to the language model as context, together with the original question
- The language model generates a response based on the retrieved information
The advantage of RAG is that it gives the language model access to up-to-date and specialized information, reduces hallucinations, and improves the accuracy and relevance of responses.
LangChain provides a variety of components that integrate seamlessly with Ollama models. Here we show how to combine an Ollama model with a vector store and a retriever to create a simple RAG question-answering system.
First, make sure the embedding model is downloaded by running the following at the command line:
ollama pull nomic-embed-text
We can then build the RAG system:
from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# Initialize Ollama model and embeddings
llm = ChatOllama(model="llama3.1")
embeddings = OllamaEmbeddings(model="nomic-embed-text")

# Prepare the document
text = """
Datawhale is an open source organization focusing on the field of data science and AI, bringing together outstanding learners from many institutions and well-known companies in the field, and aggregating a group of team members with the spirit of open source and exploration.
With the vision of "for the learner, growing with the learner", Datawhale encourages true self-expression, openness and tolerance, mutual trust and mutual support, the courage to trial and error, and the courage to take responsibility.
At the same time, Datawhale uses the concept of open source to explore open source content, open source learning and open source programs, empowering talent development, helping talent grow, and establishing a connection between people, knowledge, business and the future.
If you want to launch an open source project in the Datawhale open source community, please read the Datawhale open source project guidelines in detail [https://github.com/datawhalechina/DOPMC/blob/main/GUIDE.md]
"""

# Split the text into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
chunks = text_splitter.split_text(text)

# Create a vector store and retriever
vectorstore = FAISS.from_texts(chunks, embeddings)
retriever = vectorstore.as_retriever()

# Create the prompt template
template = """Answer the question based only on the following context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

# Create the retrieval question-answering chain
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
)

# Answer a question using the chain
question = "What should I do if I want to contribute to datawhale?"
response = chain.invoke(question)
This RAG system works as follows:
- Text splitting: RecursiveCharacterTextSplitter splits the long text into smaller chunks.
- Vectorization and indexing: OllamaEmbeddings converts each chunk into a vector, and FAISS builds a vector index over them.
- Retrieval: when a question arrives, the system vectorizes it and retrieves the most relevant chunks from the FAISS index.
- Answer generation: the retrieved chunks are passed to the language model together with the original question to generate the final answer.
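One common refinement (not part of the original code) is to join the retrieved Document objects into plain text before they reach the prompt, so the context contains only the page content; a sketch:
def format_docs(docs):
    # Keep only the text of each retrieved chunk
    return "\n\n".join(doc.page_content for doc in docs)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
)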
RAG systems are useful in many real-world scenarios, for example:
- Customer service: answer customer questions quickly based on the company's knowledge base.
- Research assistance: help researchers quickly find relevant literature and summarize key information.
- Personal assistants: combine personal notes with web information to provide personalized retrieval and suggestions.
Conclusion
With these examples, we have shown how to use Ollama and LangChain to build a variety of AI applications, from simple dialog systems to complex RAG Q&A systems. These tools and techniques provide a solid foundation for developing powerful AI applications.
The combination of Ollama and LangChain provides developers with great flexibility and possibilities. You can choose the right models and components according to your specific needs and build an AI system that suits your application scenario.
As the technology continues to evolve, we expect to see more innovative applications emerge. Hopefully, this guide will help you get started on your AI development journey and inspire your creativity to explore the endless possibilities of AI technology.