Building a Local RAG Application with Ollama and LangChain

This tutorial assumes you are already familiar with the following concepts:

  • Chat models
  • Chaining runnables
  • Embeddings
  • Vector stores
  • Retrieval-augmented generation (RAG)

Many popular projects such as llama.cpp, Ollama, and llamafile demonstrate the importance of running large language models locally.

LangChain integrates with a number of open-source LLM providers that can run locally, and Ollama is one of them.

 

Environment setup

First, we need to set up the environment.

Ollama's GitHub repository provides detailed instructions, which can be summarized as follows:

  • Download and run the Ollama application
  • Pull the models from the command line. This tutorial uses the chat model llama3.1:8b and the text embedding model nomic-embed-text as examples.
    • Run ollama pull llama3.1:8b to pull the general-purpose open-source large language model llama3.1:8b
    • Run ollama pull nomic-embed-text to pull the text embedding model nomic-embed-text
  • While the application is running, all models are served automatically at localhost:11434
  • Choose models according to your local hardware; this tutorial assumes roughly GPU memory > 8GB

Next, install the packages required for local embeddings, vector storage, and model inference:

# langchain_community
%pip install -qU langchain langchain_community
# Chroma
%pip install -qU langchain_chroma
# Ollama
%pip install -qU langchain_ollama
Note: you may need to restart the kernel to use updated packages.

You can also refer to this page for the full list of available embedding models.

 

Loading documents

Now let's load and split a sample document.

We'll use Lilian Weng's blog post on agents as an example.

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
data = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)
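Conceptually, the splitter cuts the page text into chunks of at most chunk_size characters, with chunk_overlap characters shared between neighboring chunks. A minimal sketch of that behavior (a simplification for illustration only, not the actual RecursiveCharacterTextSplitter, which also prefers to break on paragraph and sentence boundaries):

```python
# Toy character-level chunker illustrating chunk_size / chunk_overlap.
# Simplified sketch only — the real RecursiveCharacterTextSplitter tries a
# hierarchy of separators ("\n\n", "\n", " ", "") before cutting mid-word.
def split_text(text: str, chunk_size: int = 500, chunk_overlap: int = 0) -> list[str]:
    chunks = []
    step = chunk_size - chunk_overlap  # how far the window advances each time
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

chunks = split_text("a" * 1200, chunk_size=500, chunk_overlap=0)
print([len(c) for c in chunks])  # [500, 500, 200]
```

With chunk_overlap > 0, consecutive chunks share their boundary characters, which helps keep a sentence that straddles a cut retrievable from either side.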

Next, initialize the vector store, using nomic-embed-text as the text embedding model.

from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings
local_embeddings = OllamaEmbeddings(model="nomic-embed-text")
vectorstore = Chroma.from_documents(documents=all_splits, embedding=local_embeddings)

We now have a local vector database! Let's run a quick similarity-search test:

question = "What are the approaches to Task Decomposition?"
docs = vectorstore.similarity_search(question)
len(docs)
4
docs[0]
Document(metadata={'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': "LLM Powered Autonomous Agents | Lil'Log"}, page_content='Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.')
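Under the hood, similarity_search embeds the query with the same embedding model and ranks the stored chunks by vector similarity. A pure-Python sketch of cosine-similarity ranking, using invented 3-dimensional vectors in place of real embeddings (actual nomic-embed-text vectors have hundreds of dimensions):

```python
import math

# Toy "embeddings" — made up for illustration; real embedding vectors come
# from the embedding model and are much higher-dimensional.
store = {
    "Task decomposition can be done by LLM prompting.": [0.9, 0.1, 0.0],
    "The judges are announced as Tina Fey and others.": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of the user's question

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Rank stored chunks by similarity to the query, most similar first.
ranked = sorted(store, key=lambda text: cosine(query_vec, store[text]), reverse=True)
print(ranked[0])  # the task-decomposition chunk wins
```

Vector stores like Chroma do essentially this ranking, but with indexing structures that avoid comparing the query against every stored vector.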

Next, instantiate the large language model llama3.1:8b and test that model inference works:

from langchain_ollama import ChatOllama
model = ChatOllama(
    model="llama3.1:8b",
)
response_message = model.invoke(
    "Simulate a rap battle between Stephen Colbert and John Oliver"
)
print(response_message.content)
**The scene is set: a packed arena, the crowd on their feet. In the blue corner, we have Stephen Colbert, aka "The O'Reilly Factor" himself. In the red corner, the challenger, John Oliver. The judges are announced as Tina Fey, Larry Wilmore, and Patton Oswalt. The crowd roars as the two opponents face off.**
**Stephen Colbert (aka "The Truth with a Twist"):**
Yo, I'm the king of satire, the one they all fear
My show's on late, but my jokes are clear
I skewer the politicians, with precision and might
They tremble at my wit, day and night
**John Oliver:**
Hold up, Stevie boy, you may have had your time
But I'm the new kid on the block, with a different prime
Time to wake up from that 90s coma, son
My show's got bite, and my facts are never done
**Stephen Colbert:**
Oh, so you think you're the one, with the "Last Week" crown
But your jokes are stale, like the ones I wore down
I'm the master of absurdity, the lord of the spin
You're just a British import, trying to fit in
**John Oliver:**
Stevie, my friend, you may have been the first
But I've got the skill and the wit, that's never blurred
My show's not afraid, to take on the fray
I'm the one who'll make you think, come what may
**Stephen Colbert:**
Well, it's time for a showdown, like two old friends
Let's see whose satire reigns supreme, till the very end
But I've got a secret, that might just seal your fate
My humor's contagious, and it's already too late!
**John Oliver:**
Bring it on, Stevie! I'm ready for you
I'll take on your jokes, and show them what to do
My sarcasm's sharp, like a scalpel in the night
You're just a relic of the past, without a fight
**The judges deliberate, weighing the rhymes and the flow. Finally, they announce their decision:**
Tina Fey: I've got to go with John Oliver. His jokes were sharper, and his delivery was smoother.
Larry Wilmore: Agreed! But Stephen Colbert's still got that old-school charm.
Patton Oswalt: You know what? It's a tie. Both of them brought the heat!
**The crowd goes wild as both opponents take a bow. The rap battle may be over, but the satire war is just beginning...

 

Composing a chain expression

By passing in the retrieved documents together with a simple prompt, we can build a summarization chain.

It formats the prompt template with the provided input key values and passes the formatted string to the specified model:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_template(
    "Summarize the main themes in these retrieved docs: {docs}"
)

# Convert the incoming documents into a single string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
chain = {"docs": format_docs} | prompt | model | StrOutputParser()
question = "What are the approaches to Task Decomposition?"
docs = vectorstore.similarity_search(question)
chain.invoke(docs)
'The main themes in these documents are:\n\n1. **Task Decomposition**: The process of breaking down complex tasks into smaller, manageable subgoals is crucial for efficient task handling.\n2. **Autonomous Agent System**: A system powered by Large Language Models (LLMs) that can perform planning, reflection, and refinement to improve the quality of final results.\n3. **Challenges in Planning and Decomposition**:\n\t* Long-term planning and task decomposition are challenging for LLMs.\n\t* Adjusting plans when faced with unexpected errors is difficult for LLMs.\n\t* Humans learn from trial and error, making them more robust than LLMs in certain situations.\n\nOverall, the documents highlight the importance of task decomposition and planning in autonomous agent systems powered by LLMs, as well as the challenges that still need to be addressed.'
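The pipe syntax above works because LangChain coerces the leading dict into a runnable that maps the input through each value, then threads the result through prompt, model, and parser. A minimal pure-Python stand-in for that composition (not the real LangChain Runnable API; fake_model is a placeholder for ChatOllama):

```python
# Minimal stand-in for LCEL-style piping — plain functions, no LangChain.
def pipe(*stages):
    def run(value):
        for stage in stages:
            value = stage(value)  # output of one stage feeds the next
        return value
    return run

def format_docs(docs):
    return "\n\n".join(docs)

# {"docs": format_docs} in LCEL maps the input through each dict value,
# producing a dict the prompt template can fill in:
make_inputs = lambda docs: {"docs": format_docs(docs)}
fill_prompt = lambda d: f"Summarize the main themes in these retrieved docs: {d['docs']}"
fake_model = lambda prompt: f"[model output for: {prompt[:40]}...]"  # placeholder LLM

chain = pipe(make_inputs, fill_prompt, fake_model)
print(chain(["chunk one", "chunk two"]))
```

Each stage only needs to accept the previous stage's output, which is why swapping the model or the parser requires no changes elsewhere in the chain.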

 

Simple QA

from langchain_core.runnables import RunnablePassthrough
RAG_TEMPLATE = """
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
<context>
{context}
</context>
Answer the following question:
{question}"""
rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)
chain = (
    RunnablePassthrough.assign(context=lambda input: format_docs(input["context"]))
    | rag_prompt
    | model
    | StrOutputParser()
)
question = "What are the approaches to Task Decomposition?"
docs = vectorstore.similarity_search(question)
# Run
chain.invoke({"context": docs, "question": question})
'Task decomposition can be done through (1) simple prompting using LLM, (2) task-specific instructions, or (3) human inputs. This approach helps break down large tasks into smaller, manageable subgoals for efficient handling of complex tasks. It enables agents to plan ahead and improve the quality of final results through reflection and refinement.'
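RunnablePassthrough.assign(context=...) passes the input dict through unchanged while replacing the context key with the lambda's result, so the raw document list becomes a formatted string before it reaches the prompt. A pure-Python sketch of that semantics (not the real LangChain class):

```python
# Sketch of RunnablePassthrough.assign semantics: the input dict flows
# through untouched, except that each named key is (re)computed.
def assign(**computed):
    def run(inputs: dict) -> dict:
        out = dict(inputs)
        for key, fn in computed.items():
            out[key] = fn(inputs)  # each fn sees the whole input dict
        return out
    return run

def format_docs(docs):
    return "\n\n".join(docs)

step = assign(context=lambda inp: format_docs(inp["context"]))
result = step({"context": ["chunk one", "chunk two"], "question": "What is X?"})
print(result)
# {'context': 'chunk one\n\nchunk two', 'question': 'What is X?'}
```

Note that the question key reaches the prompt untouched; only context is transformed.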

 

QA with retrieval

Finally, our QA application with semantic retrieval (the local RAG application) can automatically retrieve, from the vector database, the document chunks most semantically similar to the user's question:

retriever = vectorstore.as_retriever()
qa_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | model
    | StrOutputParser()
)
question = "What are the approaches to Task Decomposition?"
qa_chain.invoke(question)
'Task decomposition can be done through (1) simple prompting in Large Language Models (LLM), (2) using task-specific instructions, or (3) with human inputs. This process involves breaking down large tasks into smaller, manageable subgoals for efficient handling of complex tasks.'
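The input mapping in qa_chain fans a raw question string out into two keys: context comes from retrieval plus formatting, while RunnablePassthrough() forwards the question unchanged. A sketch of that fan-out with a canned stand-in for the retriever (fake_retriever is hypothetical; the real one is vectorstore.as_retriever()):

```python
# Sketch of qa_chain's input mapping — not the real LangChain API.
def fake_retriever(question: str) -> list[str]:
    # Placeholder for vectorstore.as_retriever(): returns canned chunks
    # instead of querying a real vector database.
    return ["Task decomposition can be done by LLM prompting.",
            "It breaks large tasks into smaller subgoals."]

def format_docs(docs):
    return "\n\n".join(docs)

def map_inputs(question: str) -> dict:
    return {
        "context": format_docs(fake_retriever(question)),  # retriever | format_docs
        "question": question,  # RunnablePassthrough() leaves the input as-is
    }

inputs = map_inputs("What are the approaches to Task Decomposition?")
print(inputs["question"])
```

This is what lets qa_chain.invoke(question) accept a bare string instead of the {"context": ..., "question": ...} dict the earlier chain required.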

 

Summary

At this point, you have implemented a complete RAG application built on the LangChain framework and local models. Starting from this tutorial, you can swap in different local models to compare their quality and capabilities, extend the application further, or add more useful and interesting features.
