
SQLite-Utils-Ask: Ask natural-language questions of SQLite databases and CSV/JSON files

General Introduction

SQLite-Utils-Ask is a tool that lets users ask natural-language questions of SQLite databases and CSV/JSON files with the help of an LLM (large language model). It generates a SQL query from the user's question and executes it to return results, which simplifies data analysis and processing.

 

Function List

  • Natural language questions: Ask questions about your data in natural language, and the tool generates the corresponding SQL query.
  • Database compatibility: Works with SQLite database files.
  • CSV/JSON file handling: Queries CSV, TSV or JSON files directly.
  • Multi-file queries: Accepts several files at once, so a query can join across them.
  • Command-line tool: Provides a command-line interface for running queries quickly.
  • Plugin support: Installs as a plugin for sqlite-utils and can use any model supported by LLM or its plugins.

Asking questions of SQLite databases and CSV/JSON files in the terminal

I built a new plugin for my sqlite-utils CLI tool that lets you ask human-language questions directly of SQLite databases and CSV/JSON files on your computer.


It's called sqlite-utils-ask. Install it like this:

sqlite-utils install sqlite-utils-ask

It picks up your API key from the OPENAI_API_KEY environment variable, or you can install LLM and use llm keys set openai to store the key in a configuration file.

Then you can use it like this:

curl -O https://datasette.io/content.db
sqlite-utils ask content.db "how many sqlite-utils pypi downloads in 2024?"

This command extracts the SQL schema of the supplied database file, sends it to the LLM along with your question, gets back a SQL query and attempts to run it to produce a result.

If all goes well, it produces output like this:

SELECT SUM(downloads)
FROM stats
WHERE package = 'sqlite-utils' AND date >= '2024-01-01' AND date < '2025-01-01';
[
{
"SUM(downloads)": 4300221
}
]

If the SQL query fails to execute (due to a syntax error of some kind), the tool passes that error back to the model for correction and retries up to three times before giving up.

Add -v/--verbose to see the exact prompt it's using:

System prompt:
You will be given a SQLite schema followed by a question. Generate a single SQL
query to answer that question. Return that query in a ```sql ... ```
fenced code block.
Example: How many repos are there?
Answer:
```sql
select count(*) from repos
```

Prompt:
...
CREATE TABLE [stats] (
[package] TEXT,
[date] TEXT,
[downloads] INTEGER,
PRIMARY KEY ([package], [date])
);
...
how many sqlite-utils pypi downloads in 2024?

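I've truncated the above to show only the relevant table; the real prompt included the full schema for every table in that database.

By default, the tool sends just that database schema and your question to the LLM. If you add the `-e/--examples` option, it will also include five common values for each of the text columns in that schema with an average length of less than 32 characters. This can sometimes help get a better result: for example, sending the values "CA", "FL" and "TX" for a `state` column can tip the model off that it should use state abbreviations rather than full names in its queries.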
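
One way to picture what the examples option collects is the sketch below. It is an interpretation, not the plugin's code: it assumes "common" means most frequent and that the 32-character limit applies to the column's average value length, and the `example_values` helper and `customers` table are hypothetical.

```python
import sqlite3


def example_values(conn, table, max_avg_length=32, limit=5):
    """Map each TEXT column of `table` to its most common short values."""
    examples = {}
    # table_info rows are (cid, name, type, notnull, dflt_value, pk)
    columns = conn.execute(f"pragma table_info([{table}])").fetchall()
    for _cid, name, col_type, *_rest in columns:
        if col_type.upper() != "TEXT":
            continue
        avg_length = conn.execute(
            f"select avg(length([{name}])) from [{table}]"
        ).fetchone()[0]
        # Skip empty columns and columns holding long free text
        if avg_length is None or avg_length >= max_avg_length:
            continue
        rows = conn.execute(
            f"select [{name}] from [{table}] where [{name}] is not null "
            f"group by [{name}] order by count(*) desc, [{name}] limit ?",
            (limit,),
        ).fetchall()
        examples[name] = [row[0] for row in rows]
    return examples


conn = sqlite3.connect(":memory:")
conn.execute("create table customers (name text, state text, notes text)")
conn.executemany(
    "insert into customers values (?, ?, ?)",
    [("Ann", "CA", "x" * 50), ("Bob", "CA", "y" * 50), ("Cy", "TX", "z" * 50)],
)
result = example_values(conn, "customers")
print(result)
```

Here the `notes` column is skipped because its values average 50 characters, while `state` contributes its abbreviations.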
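
Asking questions of CSV and JSON data

The core `sqlite-utils` CLI usually works against SQLite files directly, but three years ago I added the ability to run SQL queries against CSV and JSON files directly with the [sqlite-utils memory](https://simonwillison.net/2021/Jun/19/sqlite-utils-memory/) command. This works by loading that data into an in-memory SQLite database before executing a SQL query.

I decided to reuse that mechanism to enable LLM prompts against CSV and JSON data directly as well.

The `sqlite-utils ask-files` command looks like this:
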
```shell
sqlite-utils ask-files transaction.csv "total sales by year"
```

This command accepts one or more files, which you can supply in a mix of CSV, TSV and JSON formats. Each supplied file will be imported into a different table, allowing the model to construct join queries if necessary.
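
The one-table-per-file idea can be illustrated with the standard library alone. This is a simplified sketch rather than how sqlite-utils does it: the `load_csv` helper is hypothetical, every column is stored as TEXT (no type detection), and `io.StringIO` objects stand in for real files.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, fileobj):
    """Create `table` from a CSV file object, one TEXT column per header."""
    reader = csv.reader(fileobj)
    headers = next(reader)
    columns = ", ".join(f"[{h}] TEXT" for h in headers)
    conn.execute(f"create table [{table}] ({columns})")
    placeholders = ", ".join("?" for _ in headers)
    # The reader yields the remaining rows as lists of strings
    conn.executemany(f"insert into [{table}] values ({placeholders})", reader)


conn = sqlite3.connect(":memory:")
load_csv(conn, "transactions", io.StringIO("id,customer_id,amount\n1,7,20\n2,7,5\n"))
load_csv(conn, "customers", io.StringIO("id,name\n7,Ann\n"))

# With each file in its own table, a generated query can join across them
rows = conn.execute(
    """
    select customers.name, sum(cast(transactions.amount as integer))
    from transactions
    join customers on customers.id = transactions.customer_id
    group by customers.name
    """
).fetchall()
print(rows)
```

Because both files land in the same in-memory database, the join in the final query works without any extra setup.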

Implementation notes

The core implementation of the plugin is about 250 lines of Python code, using the sqlite-utils register_commands() plugin hook to add the ask and ask-files commands.

It adds LLM as a dependency and uses LLM's Python API to abstract over the details of talking to the models. This means sqlite-utils-ask can use any model supported by LLM or its plugins. If you want to run your prompt through Claude 3.5 Sonnet, you can do this:

sqlite-utils install llm-claude-3
sqlite-utils ask content.db "count rows in news table" -m claude-3.5-sonnet

The plugin defaults to gpt-4o-mini initially to take advantage of that model's automatic prompt caching: if you run multiple questions against the same schema, you'll end up sending the same long prompt prefix multiple times, and OpenAI's prompt caching should automatically kick in and provide a 50% discount on those input tokens.

Then I crunched the actual numbers and found that gpt-4o-mini is cheap enough that even without caching, a 4,000-token prompt (which is a pretty big SQL schema) should cost less than a tenth of a cent. So those caching savings aren't even worth mentioning!
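
The arithmetic behind that claim is short enough to check directly. The price used here is an assumption: $0.15 per million input tokens was the gpt-4o-mini list price at launch, and current pricing may differ.

```python
# Assumption: gpt-4o-mini input price of $0.15 per million tokens
# (launch list price; check current pricing before relying on it).
PRICE_PER_MILLION_INPUT_TOKENS_USD = 0.15
prompt_tokens = 4_000

cost_usd = prompt_tokens / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS_USD
cost_cents = cost_usd * 100
cached_cost_cents = cost_cents * 0.5  # 50% discount on cached input tokens

print(f"uncached: {cost_cents:.3f} cents, cached: {cached_cost_cents:.3f} cents")
```

Under that assumed price, an uncached 4,000-token prompt costs about 0.06 cents, so caching would save roughly 0.03 cents per question.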
