AI Personal Learning
and practical guidance
CyberKnife Drawing Mirror

OpenSearch-SQL: An Open Source Tool for Turning Natural Language into SQL Queries

General Introduction

OpenSearch-SQL is an open source project that is a powerful Text-to-SQL tool that transforms users' natural language descriptions into SQL query statements, helping people unfamiliar with databases to access data easily. The project is developed by the OpenSearch-AI team and is free and open based on the Apache 2.0 license.In August 2024, it won the first place in the BIRD benchmarks, with an accuracy of 69.3% for the validation set and 72.28% for the test set.OpenSearch-SQL can run without additional training and supports models such as GPT OpenSearch-SQL runs without additional training and supports models such as GPT, DeepSeek, etc. It is suitable for data analysis and database querying.

OpenSearch-SQL: An Open Source Tool for Converting Natural Language to SQL Queries-1


 

Function List

  • Convert natural language questions to SQL queries, such as "What is the tallest building" to SELECT building_name FROM buildings ORDER BY height DESC LIMIT 1The
  • CoT (Chain of Thought) enhancement methodology that supports self-learning to improve query generation accuracy.
  • Provides a SQL-Like intermediate language to optimize complex SQL generation.
  • Includes input-output alignment features to reduce errors in model generation (the illusion problem).
  • It supports five modules: preprocessing, extraction, generation, optimization and alignment, covering the whole query process.
  • Open source and free of charge, users can modify the code as needed or integrate it into their own projects.

 

Using Help

OpenSearch-SQL is a command line tool that needs to be installed and configured to work. Below are detailed steps to help you get started from scratch.

Installation process

  1. Preparing the Python Environment
    Make sure you have Python 3.8 or above installed on your computer. You can do this in the terminal by typing python --version Check. If not, download and install it from the official Python website (https://www.python.org/).
  2. Download project files
    Open your browser and go to https://github.com/OpenSearch-AI/OpenSearch-SQL. Click the "Code" button in the upper right corner and select "Download ZIP " and select "Download ZIP" to download the source code, or use the Git command to clone it:
git clone https://github.com/OpenSearch-AI/OpenSearch-SQL.git

Download and unzip it locally, for example C:\OpenSearch-SQL maybe /home/user/OpenSearch-SQLThe

  1. Installation of dependencies
    Go to the project folder, open a terminal and run the following command to install the required libraries:
pip install -r requirements.txt

This will install the Python packages needed to run the project, such as libraries for processing data and calling models.

Data preprocessing

OpenSearch-SQL requires few-shot examples to improve query accuracy. Data can be generated using the official script provided.

  1. Prepare data
    The project provides a sample file bird_dev.jsonLocated in Bird/bird_dev.jsonIt is based on the DAIL-SQL methodology. It is generated based on the DAIL-SQL methodology and contains few-shot query examples. If you have your own data, you can replace this file.
  2. Running preprocessing scripts
    Run it in the project root directory:
sh run/run_preprocess.sh

This script processes the few-shot data, table structures, and other information. When it's done, the terminal displays the output for each directory. If you're a Windows user, you can run it with Git Bash or WSL, or execute the commands in the script manually.

Run the main program

  1. triggering program
    Run it in the project root directory:
sh run/run_main.sh

This will call the src/runner/database_manager.py file to start processing the query. The program path is set in the script.

  1. Check Output
    After the main program runs, it generates SQL query results according to the configuration. The output file path is in the src/runner/database_manager.py (used form a nominal expression) _set_paths defined in the function, which can be adjusted as needed.

Featured Function Operation

  • Natural Language to SQL
    Enter a question like "Which city has the highest sales?" The program will generate it:
SELECT city FROM sales ORDER BY amount DESC LIMIT 1

You can find more information on the questions.json Add your own questions to the file and run it to see the results.

  • CoT Enhancement
    The project provides examples in Query-CoT-SQL format. For example:
  • User input: "What is the average age?"
  • CoT process: first find the age column, then calculate the mean.
  • Output:SELECT AVG(age) FROM usersThe
  • alignment function
    If the results are generated incorrectly, the program will automatically correct them. For example, if you enter "List all student grades" but the database has multiple tables, it will ensure that the JOIN operation is correct:
SELECT s.name, sc.score FROM students s JOIN scores sc ON s.id = sc.student_id

caveat

  • If you need to test the BIRD dataset, you can do so directly with the Bird/fewshot/questions.json Documentation.
  • The program supports multiple models and may require an API key for the default configuration. If using GPT or DeepSeek, the key needs to be set in the code.

With these steps, you can easily turn natural language into SQL queries to handle a variety of data analysis tasks.

 

application scenario

  1. data analysis
    Data analysts can use it to turn questions into SQL and quickly count sales data or user behavior.
  2. Education and training
    Students can use it to learn SQL by entering questions and comparing the query statements generated.
  3. Automated Reports
    Organizations can use it to automatically generate report SQL and reduce manual writing time.

 

QA

  1. Does OpenSearch-SQL require an internet connection?
    If using a local model, no networking is required. However, networking and API keys are required when using an online model such as GPT.
  2. What databases are supported?
    It targets any database that supports SQL, as long as the table structure and data are provided.
  3. BIRD What does it mean to be number one on the list?
    It shows that it has the highest accuracy in Text-to-SQL tasks and can handle complex queries, with an August 2024 score of 72.28% for the test set.
May not be reproduced without permission:Chief AI Sharing Circle " OpenSearch-SQL: An Open Source Tool for Turning Natural Language into SQL Queries

Chief AI Sharing Circle

Chief AI Sharing Circle specializes in AI learning, providing comprehensive AI learning content, AI tools and hands-on guidance. Our goal is to help users master AI technology and explore the unlimited potential of AI together through high-quality content and practical experience sharing. Whether you are an AI beginner or a senior expert, this is the ideal place for you to gain knowledge, improve your skills and realize innovation.

Contact Us
en_USEnglish