LLM Applications: Thoughts on Agent Chat (with Tool Calling)

AI hands-on tutorials · published 2 years ago · AI分享圈 (AI Sharing Circle)


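Q&A products such as ChatGPT and Kimi all use agent chat, that is, conversations in which the model can call different tools while interacting with the user. Kimi's tools, for example, include plain LLM chat, chat about a link, chat about a file, and chat with web search. ChatGPT, ERNIE Bot (文心一言), iFlytek Spark (讯飞星火) and others further add tools for text-to-image generation, code writing, math calculation, and more.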

Figure: agent chat in ChatGPT-4

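The mainstream framework for implementing agent chat today is ReAct, proposed in 2022 by Princeton University and Google. ReAct [1] is a prompting method that interleaves reasoning and acting. Its evolution is shown in the figure below: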

The three approaches in the figure are:

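Reason Only: uses Chain-of-Thought (CoT) for multi-step reasoning. Adding the phrase "Let's think step by step" to the prompt steers the model to reason step by step instead of answering directly. The drawback is obvious: Reason Only thinks behind closed doors and never goes out to look at the world before reasoning, so it hallucinates; the dynasty could change and it would not notice.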

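Act Only: takes actions one step at a time and collects their observations, without reasoning in between. Drawback: it acts before thinking things through, so there is no guarantee that the final answer matches what the user actually wanted.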

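ReAct: combines reasoning and acting. The model thinks first, then acts; the result of the action is fed back so the model can think about what to do next and act again, repeating this cycle until it produces a final answer.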

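Further reading: in 2023 the Reflexion framework added self-reflection on top of this, as shown in the figure below; it is not covered in depth here. If you are interested, see the article "Agent Survey: A Comparison of 19 Types of Agent Frameworks" on the 大淘宝技术 (Taobao Technology) WeChat account. In this area, the race to publish papers keeps producing ever more elaborate variants.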

The example from the ReAct paper illustrates the logic and the trade-offs above well:

The ReAct prompt is:

"""
Agent Prompt的输入变量解释：
- tools: 工具集说明，格式为"{tool.name}: {tool.description}"
- toool_names：工具名称列表
- history: 用户和Agent的对话历史（注意agent chat中间多轮ReAct过程不计入对话历史）
- input: 用户提问
- agent_scratchpad: 中间action和observation的过程，
会被格式成"\nObservation: {observation}\nThought:{action} "，
然后传入agent_scratchpad（Agent的思维记录）
"""
agent_prompt = """
Answer the following questions as best you can. If it is in order, you can use some tools appropriately. You have access to the following tools:
{tools}
Use the following format:
Question: the input question you must answer1
Thought: you should always think about what to do and what tools to use.
Action: the action to take, should be one of {toool_names}
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can be repeated zero or more times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin!
history: {history}
Question: {input}
Thought: {agent_scratchpad}"""

The ReAct application flow is shown in the diagram below:

There is a good example online, which I borrow here for illustration [2]:

Suppose we have:

User question (Question): "What is the current average market price of roses? If I add a 15% markup to that price, how should I price them?"

Tool library (Tools): {'bing-web-search': a tool that uses Bing Search to look up open information on the web; 'llm-math': a tool that uses an LLM and Python to do math}

The input for the first round is then:

Answer the following questions as best you can. If necessary, you can use some tools appropriately. You have access to the following tools:
bing-web-search: a tool that uses Bing Search to look up open information on the web
llm-math: a tool that uses an LLM and Python to do math
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do and what tools to use.
Action: the action to take, should be one of [bing-web-search, llm-math]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can be repeated zero or more times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin!
history:
Question: What is the current average market price of roses? If I add a 15% markup to that price, how should I price them?
Thought:

Parse the model's output to extract the Thought, Action and Action Input:

Thought: I should use the search tool to find the answer, so I can quickly get the information I need.
Action: bing-web-search
Action Input: average price of roses
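The parsing step can be sketched with a regular expression, assuming the model follows the prescribed format. `parse_react_output`, `ToolCall` and `FinalAnswer` are hypothetical names for this illustration; LangChain ships its own output parsers for this job, and their exact behavior may differ.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass
class ToolCall:
    tool: str        # the Action
    tool_input: str  # the Action Input
    log: str         # raw LLM text, reused later in agent_scratchpad


@dataclass
class FinalAnswer:
    answer: str
    log: str


# "Action: <name>" followed (possibly after whitespace/newlines) by "Action Input: <input>"
ACTION_RE = re.compile(r"Action\s*:\s*(.*?)\s*Action\s*Input\s*:\s*(.*)", re.DOTALL)


def parse_react_output(text: str) -> Union[ToolCall, FinalAnswer]:
    """Parse one completion generated after the trailing 'Thought:' of the prompt."""
    if "Final Answer:" in text:
        return FinalAnswer(answer=text.split("Final Answer:", 1)[1].strip(), log=text)
    match = ACTION_RE.search(text)
    if match is None:
        raise ValueError(f"Could not parse LLM output: {text!r}")
    # Cut off anything the model invented after the input, e.g. its own "Observation:".
    tool_input = match.group(2).split("\nObservation:", 1)[0].strip().strip('"')
    return ToolCall(tool=match.group(1).strip(), tool_input=tool_input, log=text)


step = parse_react_output(
    "I should use the search tool to find the answer, so I can quickly get the information I need.\n"
    "Action: bing-web-search\n"
    "Action Input: average price of roses"
)
print(step.tool, "|", step.tool_input)  # bing-web-search | average price of roses
```

The raw text is kept in `log` because it is exactly what gets written back into the next round's prompt.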

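Call the bing-web-search tool with the input "average price of roses" and get back the Observation: "According to online sources, a bouquet of roses in the US costs 80.16 US dollars." Then append all of this to the ReAct prompt template to form the input for the second round: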

Answer the following questions as best you can. If necessary, you can use some tools appropriately. You have access to the following tools:
bing-web-search: a tool that uses Bing Search to look up open information on the web
llm-math: a tool that uses an LLM and Python to do math
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do and what tools to use.
Action: the action to take, should be one of [bing-web-search, llm-math]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can be repeated zero or more times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin!
history:
Question: What is the current average market price of roses? If I add a 15% markup to that price, how should I price them?
Thought: I should use the search tool to find the answer, so I can quickly get the information I need.
Action: bing-web-search
Action Input: average price of roses
Observation: According to online sources, a bouquet of roses in the US costs 80.16 US dollars.
Thought:
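The lines appended after the question (the model's previous output, its Observation, and a new "Thought:") are the agent_scratchpad. A minimal sketch of how it could be assembled from (LLM output, observation) pairs, following the format stated in the prompt's docstring; `format_scratchpad` is a hypothetical helper.

```python
from typing import List, Tuple


def format_scratchpad(steps: List[Tuple[str, str]]) -> str:
    """Render (llm_output, observation) pairs into the agent_scratchpad string.

    llm_output is the raw text the model generated after "Thought:"
    (its thought plus the Action and Action Input lines); observation is
    what the tool returned. Each step ends with a fresh "Thought: " so the
    model continues with its next thought.
    """
    return "".join(
        f"{llm_output}\nObservation: {observation}\nThought: "
        for llm_output, observation in steps
    )


steps = [(
    "I should use the search tool to find the answer, so I can quickly get the information I need.\n"
    "Action: bing-web-search\n"
    "Action Input: average price of roses",
    "According to online sources, a bouquet of roses in the US costs 80.16 US dollars.",
)]
# The template supplies the leading "Thought: ", so this prints the tail of the second-round input.
print("Thought: " + format_scratchpad(steps))
```

Because the scratchpad only ever grows, every extra round lengthens the prompt, which is one reason to cap the number of rounds.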

Parse the model's output to extract the Thought, Action and Action Input:

Thought: I need to calculate the price after a 15% markup on this basis.
Action: llm-math
Action Input: 80.16*1.15
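The llm-math tool here combines an LLM with Python. As a simplified stand-in for the Python half only (an assumption, not how llm-math itself is implemented), the expression in Action Input can be evaluated with a restricted AST walker instead of `eval`, so arbitrary code in the model's output cannot run. `safe_calc` is a hypothetical name.

```python
import ast
import operator

# Only these operators are allowed; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
    ast.UAdd: operator.pos,
}


def safe_calc(expression: str) -> str:
    """Evaluate a plain arithmetic expression such as '80.16*1.15'."""

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")

    result = _eval(ast.parse(expression, mode="eval"))
    # Round to hide binary floating-point noise before returning the Observation.
    return str(round(result, 6))


print(safe_calc("80.16*1.15"))  # 92.184
```

Rounding to 6 decimals is a design choice, since the raw float product may carry tiny binary rounding errors. A production version would also cap exponent sizes, because an input like `9**9**9` would take a very long time to compute.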

Call the llm-math tool with the input "80.16*1.15" and get back the Observation: "92.184". Then append all of this to the ReAct prompt template to form the input for the third round:

Answer the following questions as best you can. If necessary, you can use some tools appropriately. You have access to the following tools:
bing-web-search: a tool that uses Bing Search to look up open information on the web
llm-math: a tool that uses an LLM and Python to do math
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do and what tools to use.
Action: the action to take, should be one of [bing-web-search, llm-math]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can be repeated zero or more times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin!
history:
Question: What is the current average market price of roses? If I add a 15% markup to that price, how should I price them?
Thought: I should use the search tool to find the answer, so I can quickly get the information I need.
Action: bing-web-search
Action Input: average price of roses
Observation: According to online sources, a bouquet of roses in the US costs 80.16 US dollars.
Thought: I need to calculate the price after a 15% markup on this basis.
Action: llm-math
Action Input: 80.16*1.15
Observation: 92.184
Thought:

Parse the model's output; this time it contains a Final Answer instead of an Action:

Thought: I now know the final answer.
Final Answer: To sell at a 15% markup, the price should be 92.184 US dollars.
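Putting the three rounds together, a minimal driver loop might look like the sketch below. A scripted fake LLM replays the three completions from the walkthrough, so no model or search API is needed. It also guards against the edge cases discussed in the next paragraph (a tool that does not exist, a failing tool, a loop that never ends) by feeding errors back as Observations and capping the number of rounds. `run_react`, `multiply` and the abbreviated `PROMPT_HEADER` are hypothetical; LangChain's AgentExecutor exposes its own settings for such limits (for example `max_iterations`).

```python
import re
from typing import Callable, Dict

# Abbreviated stand-in for the full ReAct template shown earlier (assumption for brevity).
PROMPT_HEADER = "Answer the question; you may use these tools:\n{tools}\nBegin!\n"
ACTION_RE = re.compile(r"Action\s*:\s*(.*?)\s*Action\s*Input\s*:\s*(.*)", re.DOTALL)


def run_react(question: str, llm: Callable[[str], str],
              tools: Dict[str, Callable[[str], str]], max_steps: int = 5) -> str:
    """Run Thought -> Action -> Observation rounds until a Final Answer appears."""
    scratchpad = ""
    for _ in range(max_steps):  # cap rounds: stops endless loops and bounds context growth
        prompt = (PROMPT_HEADER.format(tools="\n".join(tools))
                  + f"Question: {question}\nThought: {scratchpad}")
        output = llm(prompt)
        if "Final Answer:" in output:
            return output.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(output)
        if match is None:  # malformed output: tell the model instead of crashing
            observation = "Invalid format: give an Action and Action Input, or a Final Answer."
        else:
            name = match.group(1).strip()
            arg = match.group(2).split("\nObservation:", 1)[0].strip()
            if name not in tools:  # the model chose a tool that does not exist
                observation = f"Unknown tool {name!r}; choose one of {list(tools)}."
            else:
                try:
                    observation = tools[name](arg)
                except Exception as exc:  # tool call failed: surface the error to the model
                    observation = f"Tool error: {exc}"
        scratchpad += f"{output}\nObservation: {observation}\nThought: "
    return "Agent stopped: max_steps reached without a Final Answer."


def multiply(expr: str) -> str:
    """Toy calculator for this demo: only understands 'a*b'."""
    a, b = (float(x) for x in expr.split("*"))
    return str(round(a * b, 6))


fake_tools = {
    "bing-web-search": lambda q: "According to online sources, a bouquet of roses in the US costs 80.16 US dollars.",
    "llm-math": multiply,
}

# Scripted "LLM" that replays the three completions from the walkthrough above.
script = iter([
    "I should use the search tool to find the answer.\n"
    "Action: bing-web-search\nAction Input: average price of roses",
    "I need to calculate the price after a 15% markup on this basis.\n"
    "Action: llm-math\nAction Input: 80.16*1.15",
    "I now know the final answer.\n"
    "Final Answer: To sell at a 15% markup, the price should be 92.184 US dollars.",
])

answer = run_react("What is the current average market price of roses? "
                   "If I add a 15% markup to that price, how should I price them?",
                   lambda prompt: next(script), fake_tools)
print(answer)
```

Returning tool errors as Observations, rather than raising, lets the model recover by choosing another tool or fixing its input on the next round.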

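LangChain already supports ReAct [3]: you create a ReAct agent, an AgentExecutor and the tools. With a ReAct agent we get what was described above: the agent connects different tools and, based on the user's request and the LLM's reasoning, calls the right ones to improve answer quality. If you write your own ReAct-based agent chat, pay special attention to edge cases such as the model choosing a tool that does not exist, a tool call failing, or tool calls getting stuck in an infinite loop, because these ultimately determine how well the interaction works. From my own experiments with ReAct agent chat, I took away a few lessons: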

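Choose an LLM that has been aligned for agent (tool-calling) dialogue, and ideally adapt the ReAct prompt to that specific LLM;
Describe each tool clearly and keep tool descriptions from overlapping, so the model does not pick the wrong tool;
Watch out for context growth: after several thought-action-observation rounds the input becomes long, and many LLMs understand long inputs noticeably worse.
Even a single thought-action-observation round costs 2 LLM calls and 1 tool call, which increases response time. To simplify the ReAct flow, you can instead use intent recognition plus a direct tool call, letting the tool output the final result without feeding it back to the LLM for a summarized answer; the downside is that the answers are less polished.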
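The intent-recognition shortcut in the last point can be sketched as follows: classify the query once, call one tool directly, and return the tool's output as the reply, skipping the summarizing LLM call. The keyword classifier is a toy placeholder (a real system might use a small classifier model or a single LLM call), and `route_and_call`, `routes` and `fallback` are hypothetical names.

```python
from typing import Callable, Dict, Optional, Tuple

# For each intent label: (extract the tool input from the query, the tool itself)
Route = Tuple[Callable[[str], str], Callable[[str], str]]


def route_and_call(query: str, classify: Callable[[str], Optional[str]],
                   routes: Dict[str, Route], fallback: Callable[[str], str]) -> str:
    """Intent recognition + one direct tool call; the tool output is the reply."""
    intent = classify(query)
    if intent not in routes:
        return fallback(query)  # e.g. plain LLM chat, or the full ReAct agent
    extract_input, tool = routes[intent]
    return tool(extract_input(query))  # no second LLM call to summarize


def keyword_classifier(query: str) -> Optional[str]:
    """Toy rule-based intent recognizer (placeholder for a real intent model)."""
    if any(c.isdigit() for c in query) and any(op in query for op in "+-*/"):
        return "math"
    if "price" in query.lower():
        return "search"
    return None


def multiply(expr: str) -> str:
    """Toy calculator: only understands 'a*b'."""
    a, b = (float(x) for x in expr.split("*"))
    return str(round(a * b, 6))


def fallback(query: str) -> str:
    return f"LLM chat: {query}"


routes: Dict[str, Route] = {
    "math": (lambda q: q.rstrip("?").split()[-1], multiply),
    "search": (lambda q: q, lambda q: f"[search results for: {q}]"),
}

print(route_and_call("What is 80.16*1.15?", keyword_classifier, routes, fallback))  # 92.184
```

The trade-off matches the text: latency drops to at most one classification plus one tool call, but the user sees raw tool output rather than a composed answer.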

LangChain provides some starter toolkits; see [4].

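From the above we can see that Action and Action Input are the name of the tool to call and that tool's input. So if we want to chat about a temporarily uploaded file, can we simply add it to the tools? In practice it is not that easy: Action Input is the tool input that the LLM extracts from the query, whereas chatting about an uploaded file needs the file's location or content, which the query does not contain. For this reason, as in the diagram I drew below, the "chat about a temporarily uploaded file" feature can be split out on its own.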

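The front end and back end cooperate as follows:

1. After the user uploads a file, the front end first creates a temporary knowledge base, uploads the file into it, initializes the file-chat round counter to 0, and stores the temporary knowledge base's name in a variable;

2. In every subsequent round, the knowledge base name is passed into the agent chat, which follows the flow in the diagram above; after each round, the file-chat round counter is updated;

3. When the file-chat round count exceeds the file retention limit (measured in rounds), or the user manually clears the context, the front end deletes the temporary knowledge base and clears the variable holding its name.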
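The three steps amount to a small state machine around the temporary knowledge base. A sketch in Python, assuming a hypothetical knowledge-base client with `create`/`upload`/`delete` operations (faked in memory here). In the described design this state lives in the front end, so treat this as a model of the logic rather than front-end code; `FileChatSession`, `next_turn_kb` and `max_file_rounds` are illustrative names.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional


class InMemoryKBClient:
    """Hypothetical stand-in for the back-end knowledge-base API."""

    def __init__(self) -> None:
        self.kbs: Dict[str, List[str]] = {}

    def create(self) -> str:
        name = f"tmp_kb_{uuid.uuid4().hex[:8]}"
        self.kbs[name] = []
        return name

    def upload(self, kb_name: str, file_content: str) -> None:
        self.kbs[kb_name].append(file_content)

    def delete(self, kb_name: str) -> None:
        self.kbs.pop(kb_name, None)


@dataclass
class FileChatSession:
    """Client-side state for steps 1-3 of the flow above."""
    kb: InMemoryKBClient
    max_file_rounds: int = 5            # the "file retention limit", in rounds
    temp_kb_name: Optional[str] = None  # variable holding the temporary KB name
    file_rounds: int = 0                # file-chat round counter

    def on_upload(self, file_content: str) -> None:
        """Step 1: create a temporary KB (if none yet), upload the file, reset the counter."""
        if self.temp_kb_name is None:
            self.temp_kb_name = self.kb.create()
        self.kb.upload(self.temp_kb_name, file_content)
        self.file_rounds = 0

    def next_turn_kb(self) -> Optional[str]:
        """Step 2: called once per chat round; returns the KB name to pass to the agent, or None."""
        if self.temp_kb_name is None:
            return None
        if self.file_rounds >= self.max_file_rounds:
            # Step 3: this round would push the count above the limit, so drop the temp KB.
            self.clear()
            return None
        self.file_rounds += 1
        return self.temp_kb_name

    def clear(self) -> None:
        """Step 3, also used when the user clears the context manually."""
        if self.temp_kb_name is not None:
            self.kb.delete(self.temp_kb_name)
        self.temp_kb_name = None
        self.file_rounds = 0
```

Here the knowledge base is dropped at the start of the round that would exceed the limit, so a file stays available for exactly `max_file_rounds` rounds; checking after each round instead is an equally valid reading of step 3.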

References

[1] ReAct: Synergizing Reasoning and Acting in Language Models, project page: https://react-lm.github.io/

[2] LangChain essentials (1): How exactly does AgentExecutor drive the model and tools to complete a task? Article by 黄佳 on Zhihu: https://zhuanlan.zhihu.com/p/661244337

[3] ReAct, LangChain documentation: https://python.langchain.com/docs/modules/agents/agent_types/react/

[4] Agent toolkits, LangChain documentation: https://python.langchain.com/docs/integrations/toolkits/

[5] An overview of 10 papers on LLM + search query rewriting, article by 情迷搜广推 on Zhihu: https://zhuanlan.zhihu.com/p/672357196

[6] MultiQueryRetriever, LangChain documentation: https://python.langchain.com/docs/modules/data_connection/retrievers/MultiQueryRetriever/

[7] HypotheticalDocumentEmbedder, LangChain cookbook on GitHub: https://github.com/langchain-ai/langchain/blob/master/cookbook/hypothetical_document_embeddings.ipynb