Breaking the Tool Calling Bottleneck: The CoTools Framework Enables Large Language Models to Efficiently Utilize a Massive Number of Tools

AI Knowledge Base · updated 4 months ago · AI Sharing Circle

Introduction

In recent years, Large Language Models (LLMs) have made impressive progress, and their language comprehension and generation capabilities have led to applications across many domains. However, LLMs still struggle with complex tasks that require invoking external tools. For example, to answer "What is the weather at my destination tomorrow?", an LLM must be able to call a weather API to get accurate information.

To address this problem, researchers have proposed a variety of tool learning methods aimed at enabling LLMs to utilize external tools more effectively. However, existing methods often suffer from the following limitations:

Fine-tuning-based approach: The model is fine-tuned on tool-use data, so it can only use tools that appeared in the training data. While this improves the accuracy of tool calls, it limits the model's ability to generalize to unseen tools.
In-context learning (ICL) approach: The model does not need to be fine-tuned; tools are invoked by adding tool demonstrations to the prompt. However, inference efficiency drops significantly when the number of tools is large.

CoTools: a more efficient framework for tool learning

To overcome these challenges, a team of researchers at Soochow University has developed a new tool learning framework called CoTools. The main goal of the framework is to enable efficient invocation of a large number of unseen tools without sacrificing the model's ability to generalize.

How CoTools works

The core idea of CoTools is to use the semantic representations of a frozen language model to decide, while the answer is being generated, whether a tool needs to be invoked and which tool is most appropriate. Its main process is as follows:

Tool judgment:
- Input: The user's question and the current answer fragment generated by the model.
- Process: CoTools analyzes the semantic information of the current answer fragment and determines whether a tool call is needed to supply missing information. For example, when a user asks about the weather, the model recognizes the need to call the weather API.
- Output: A judgment result that determines whether to trigger a tool call.
Tool retrieval:
- Input: The user's question and the judgment result.
- Process: If a tool needs to be invoked, CoTools retrieves the most appropriate tool from the tool pool based on the question. The tool pool can contain a large number of unseen tools, and CoTools evaluates each tool's relevance to the question by analyzing its description.
- Output: The selected tool.
Tool call:
- Input: The selected tool and the user's question.
- Process: CoTools generates the tool's parameters using an in-context learning (ICL) prompt, executes the tool, and obtains the result.
- Output: The result returned by the tool, which is added to the answer.
Figure 1: Flowchart of the CoTools method.
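
The three stages above can be sketched as a single control-flow function. This is only an illustration of the judge, retrieve, call order described in the text, not the authors' implementation: the `Tool` class, `tool_step`, and the three toy stand-in functions are hypothetical names, and the real components operate on the hidden states of a frozen LLM rather than on keywords.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]  # argument string in, result string out


def tool_step(question: str, partial_answer: str, tools: List[Tool],
              judge: Callable[[str, str], bool],
              retrieve: Callable[[str, List[Tool]], Tool],
              make_args: Callable[[str, str, Tool], str]) -> str:
    """Run one judge -> retrieve -> call step and return the extended answer."""
    # 1. Tool judgment: decide whether a tool call is needed at this point.
    if not judge(question, partial_answer):
        return partial_answer
    # 2. Tool retrieval: pick one tool from the pool.
    tool = retrieve(question, tools)
    # 3. Tool call: build the arguments, run the tool, append its result.
    args = make_args(question, partial_answer, tool)
    return partial_answer + tool.run(args)


# Toy stand-ins (assumptions for illustration only).
def keyword_judge(question: str, partial_answer: str) -> bool:
    return "weather" in question.lower()


def overlap_retrieve(question: str, tools: List[Tool]) -> Tool:
    words = set(question.lower().replace("?", "").split())
    return max(tools, key=lambda t: len(words & set(t.description.lower().split())))


def fixed_args(question: str, partial_answer: str, tool: Tool) -> str:
    return "Beijing"  # a real system would generate this with an ICL prompt


TOOLS = [
    Tool("weather_api", "weather information for a given location",
         lambda location: f"sunny, 25°C in {location}"),
    Tool("translator", "text translation service", lambda text: text),
]

result = tool_step("What is the weather at my destination tomorrow?",
                   "Tomorrow: ", TOOLS, keyword_judge, overlap_retrieve, fixed_args)
print(result)
```

Passing the three components in as functions keeps the control flow separate from how each stage is implemented, which mirrors the way the article describes them as independent steps.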

The CoTools Advantage

Efficient use of unseen tools: CoTools does not depend on having been trained on a given tool; it selects tools dynamically by analyzing their descriptions. This gives it the flexibility to invoke a large number of unseen tools.
Preserving the model's original capabilities: Because the language model is frozen, CoTools does not affect the model's ability to generalize and reason.
Improving interpretability: By analyzing the key dimensions of the model's output, CoTools can help researchers better understand how tools are selected.

Results

To verify the effectiveness of CoTools, the research team conducted several experiments, including:

Numerical reasoning tasks: On the GSM8K-XL and FuncQA datasets, CoTools performs well on both single-hop and multi-hop problems.
Knowledge-based question answering tasks: On the KAMEL and SimpleToolQuestions (STQuestions) datasets, CoTools performs well in scenarios with a large number of tools and generalizes well to unseen tools.

Below are the results of CoTools compared to other methods on the KAMEL and STQuestions datasets:

Method	KAMEL SUP	KAMEL SYN	STQuestions Seen	STQuestions Unseen
ToolkenGPT (LLaMA)	93.4	20.6	23.8	0.0
CoTools (ours, LLaMA)	93.8	43.6	35.1	10.4

Conclusion

CoTools provides a flexible tool learning framework that can invoke a large number of unseen tools efficiently without sacrificing the model's ability to generalize. The approach opens up new possibilities for applying large language models in more complex real-world scenarios.

Future outlook

Despite the encouraging results of CoTools, the researchers also noted that current research on LLM tool learning is still in its early stages. In the future, the CoTools team plans to:

Explore how to handle tools that contain multiple return values.
Test the performance of CoTools on larger, more complex real-world tool sets.

Prompt Example

As described above, CoTools uses the semantic understanding of a frozen language model to decide during generation whether a tool needs to be invoked and to select the most appropriate tool for the task. The following walks through the CoTools workflow in detail, with prompt examples that show the input and output of each step.

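#### 1. Initial input and preprocessing
- **Input**:
  - **User question**: for example, "What is the weather at my destination tomorrow?"
  - **Context** (if any): for example, earlier turns of the conversation or the user's location.
- **Processing**:
  - **Question parsing**: preprocess the user's question, including tokenization and stop-word removal.
  - **Context integration**: combine the context with the user's question to form the complete input sequence.
- **Output**:
  - **Preprocessed input sequence**: for example, "[CLS] What is the weather at my destination tomorrow? [SEP] Beijing"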
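#### 2. Tool judgment
- **Input**:
  - The **preprocessed input sequence**.
- **Processing**:
  - **Semantic analysis**: CoTools uses the frozen language model to produce hidden states for the input sequence, and the Tool Judge analyzes these hidden states to determine whether a tool call is needed.
  - **Tool Judge**:
    - **Formula** (simplified):
      ```
      Score_I = ToolJudge(hidden_states)
      ```
    - **Decision logic**:
      - If $Score_I$ exceeds a preset threshold (typically 0.5), a tool call is triggered.
      - Otherwise, the model continues generating answer text.
- **Output**:
  - **Judgment result**:
    - **Call a tool**: for example, "the weather API needs to be called".
    - **No tool call needed**: continue generating the answer.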
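
The threshold decision in this step can be illustrated with a small numeric sketch. The article only gives the simplified form `Score_I = ToolJudge(hidden_states)`, so the logistic unit over a weighted sum used below is an assumption chosen because it yields a score in (0, 1) that can be compared with 0.5; `judge_score`, `should_call_tool`, and the example vectors are hypothetical.

```python
import math
from typing import Sequence


def judge_score(hidden_state: Sequence[float], weights: Sequence[float],
                bias: float = 0.0) -> float:
    """Map a hidden-state vector to a score in (0, 1) with a logistic unit.

    Assumption: a single linear layer plus sigmoid stands in for the Tool Judge.
    """
    logit = sum(h * w for h, w in zip(hidden_state, weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def should_call_tool(score: float, threshold: float = 0.5) -> bool:
    """Trigger a tool call only when the score strictly exceeds the threshold."""
    return score > threshold


# Made-up three-dimensional hidden state and judge weights.
hidden = [0.8, -0.2, 0.5]
weights = [1.5, 0.5, 1.0]

score = judge_score(hidden, weights, bias=-0.5)
print(f"Score_I = {score:.3f}, call tool: {should_call_tool(score)}")
```

Because the score is a probability-like value, the threshold can be raised to make tool calls rarer or lowered to make them more frequent.
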
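#### 3. Tool retrieval
- **Input**:
  - **Judgment result**: for example, "the weather API needs to be called".
  - **User question**: for example, "What is the weather at my destination tomorrow?"
- **Processing**:
  - **Tool pool retrieval**: CoTools uses the Tool Retriever to retrieve the most suitable tool from the tool pool.
  - **Tool Retriever**:
    - **Query vector**: convert the user's question into a vector representation.
    - **Tool vectors**: convert each tool description in the pool into a vector representation.
    - **Similarity**: compute a similarity score between the query vector and each tool vector; the tool with the highest score is selected.
  - **Example tool pool**:
    ```
    1. Weather API: provides weather information for a specified location.
    2. Map API: provides map information for a specified location.
    3. Translation tool: provides text translation services.
    ```
- **Output**:
  - **Selected tool**: for example, "Weather API".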
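
The retrieval step (query vector, tool vectors, similarity, pick the best) can be sketched as follows. The bag-of-words `embed` function is a deliberately simple stand-in and an assumption of this sketch; in CoTools the vectors come from the frozen language model's representations. The function names and the English tool descriptions are illustrative.

```python
import math
from collections import Counter
from typing import Dict


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for LLM vectors)."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return Counter(cleaned.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)  # Counter returns 0 for missing keys
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, tool_pool: Dict[str, str]) -> str:
    """Return the name of the tool whose description is most similar to the question."""
    query = embed(question)
    return max(tool_pool, key=lambda name: cosine(query, embed(tool_pool[name])))


TOOL_POOL = {
    "Weather API": "Provides weather information for a specified location.",
    "Map API": "Provides map information for a specified location.",
    "Translation tool": "Provides text translation services.",
}

best = retrieve("What is the weather at my destination tomorrow?", TOOL_POOL)
print(best)
```

Since tools are represented only by their descriptions, adding a new tool to the pool means adding one more entry to `TOOL_POOL`, with no retraining in this sketch.
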
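#### 4. Tool call
- **Input**:
  - **Selected tool**: for example, "Weather API".
  - **User question**: for example, "What is the weather at my destination tomorrow?"
- **Processing**:
  - **Parameter generation**: use an in-context learning prompt (ICL prompt) to generate the parameters required for the tool call.
  - **Example prompt**:
    ```
    [User question]
    [Current answer fragment (if any)]
    Call the following tool and generate its call parameters:
    Tool name: Weather API
    Tool description: provides weather information for a specified location.
    ```
  - **Tool execution**: pass the generated parameters to the tool, execute it, and collect the returned result.
- **Output**:
  - **Tool result**: for example, "Weather in Beijing: sunny, 25°C".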
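
A prompt of the shape shown in this step can be assembled with a small helper. This is a sketch under assumptions: `build_tool_call_prompt` is a hypothetical function, the wording is an English rendering of the example template, and a full ICL prompt would normally also include worked demonstrations, which the article does not show.

```python
def build_tool_call_prompt(question: str, tool_name: str, tool_description: str,
                           partial_answer: str = "") -> str:
    """Fill the parameter-generation template for one selected tool."""
    lines = [question]
    if partial_answer:
        # The answer fragment is optional, matching "(if any)" in the template.
        lines.append(partial_answer)
    lines += [
        "Call the following tool and generate its call parameters:",
        f"Tool name: {tool_name}",
        f"Tool description: {tool_description}",
    ]
    return "\n".join(lines)


prompt = build_tool_call_prompt(
    question="What is the weather at my destination tomorrow?",
    tool_name="Weather API",
    tool_description="provides weather information for a specified location.",
)
print(prompt)
```

The resulting string would be sent to the language model, whose completion is then parsed into the tool's arguments; that parsing step depends on the tool's interface and is left out here.
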
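#### 5. Answer generation
- **Input**:
  - **Tool result**: for example, "Weather in Beijing: sunny, 25°C".
  - **User question**: for example, "What is the weather at my destination tomorrow?"
- **Processing**:
  - **Answer integration**: merge the tool result into the answer to produce the final response.
- **Output**:
  - **Final answer**:
    ```
    Tomorrow's weather in Beijing will be sunny, with a temperature of 25°C.
    ```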
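### Detailed prompt example
The following more detailed example shows what CoTools does at each node of the workflow:

- User question: What is the weather at my destination tomorrow?
- Initial input sequence: [CLS] What is the weather at my destination tomorrow? [SEP] Beijing
- Tool judgment:
  - Input: the preprocessed input sequence
  - Processing:
    - Semantic analysis: produce hidden states and compute Score_I
    - Decision: Score_I > 0.5, so a tool call is triggered
  - Output: the weather API needs to be called
- Tool retrieval:
  - Input: the user's question and the judgment result
  - Processing:
    - Query vector: produce the vector representation of the user's question
    - Tool vectors: produce the vector representation of each tool in the pool
    - Similarity: compute the similarity scores
  - Output: Weather API selected
- Tool call:
  - Input: the selected tool and the user's question
  - Processing:
    - Parameter generation, using the prompt:
      - [User question]
      - [Current answer fragment (if any)]
      - Call the following tool and generate its call parameters:
      - Tool name: Weather API
      - Tool description: provides weather information for a specified location.
    - Tool execution: call the Weather API and collect the returned result
  - Output: Weather in Beijing: sunny, 25°C
- Answer generation:
  - Input: the tool result and the user's question
  - Processing: merge the result into the final answer
  - Output: Tomorrow's weather in Beijing will be sunny, with a temperature of 25°C.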
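### Summary
Through a fine-grained workflow and strong semantic understanding, CoTools achieves efficient invocation of external tools in complex tasks. The framework handles not only common tools but also tools it has never seen, opening up new possibilities for applying large language models in real-world scenarios.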

With such prompts, CoTools can dynamically determine whether a tool needs to be invoked based on the user's question and the current answer snippet, and select the most appropriate tool to get the required information.

Original: https://arxiv.org/pdf/2503.16779