General Introduction
PandasAI is an open-source Python platform that simplifies data analysis through natural language. It lets users query data sources (e.g. SQL databases, CSV files, pandas and polars DataFrames, MongoDB and other NoSQL stores) conversationally. The platform uses large language models (e.g. GPT-3.5/4, Anthropic, VertexAI) and Retrieval-Augmented Generation (RAG) to make data analysis more intuitive and efficient for both technical and non-technical users.
Feature List
- Natural language query: Get analysis results by asking questions in natural language.
- Data visualization: Generate charts and graphs to visualize data.
- Data cleaning: Handle missing values to improve data quality.
- Feature generation: Enhance the dataset by generating new features.
- Multiple data source support: Connect to CSV, XLSX, PostgreSQL, MySQL, BigQuery and many other data sources.
- Multi-model support: Integrates with GPT-3.5/4, Anthropic, VertexAI and other large language models.
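To make the data cleaning and feature generation items concrete, here is a minimal sketch of what such operations look like in plain pandas. It does not show PandasAI's own output or internals. The sample data, the median fill strategy, and the `revenue_share` column are all assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data with one missing revenue value.
sales = pd.DataFrame({
    "country": ["France", "Germany", "Italy"],
    "revenue": [2900.0, None, 2300.0],
})

# Data cleaning: fill the missing value with the column median
# (one possible strategy, chosen here only for illustration).
cleaned = sales.copy()
cleaned["revenue"] = cleaned["revenue"].fillna(cleaned["revenue"].median())

# Feature generation: add each country's share of total revenue.
cleaned["revenue_share"] = cleaned["revenue"] / cleaned["revenue"].sum()

print(cleaned)
```

With a natural language interface, the same steps would be requested in words rather than written by hand.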
Usage Guide
Installation process
- Install Docker: Make sure Docker is installed on your machine.
- Clone the repository: Run
git clone https://github.com/Sinaptik-AI/pandas-ai
- Build the platform: Go to the project directory and run
docker-compose build
- Launch the platform: Run
docker-compose up
and then visit http://localhost:3000
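After `docker-compose up`, it can be useful to confirm that something is actually answering on port 3000 before opening the browser. The following is a small sketch using only the Python standard library. The helper name `platform_is_up` and the 3-second timeout are assumptions for illustration and are not part of PandasAI.

```python
import urllib.error
import urllib.request


def platform_is_up(url: str = "http://localhost:3000", timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at `url` (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status code.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or name resolution failure.
        return False


if __name__ == "__main__":
    print("platform reachable:", platform_is_up())
```

If the check prints `False`, the containers may still be starting; the output of `docker-compose up` is the first place to look.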
Using the PandasAI library
- Install:
- Use pip:
pip install pandasai
- Use poetry:
poetry add pandasai
- Import the library:
import os
import pandas as pd
from pandasai import Agent
- Create a DataFrame:
sales_by_country = pd.DataFrame({
"country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
"revenue": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000]
})
- Configure the API key:
os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"
- Create an Agent and query:
agent = Agent(sales_by_country)
response = agent.chat('Which are the top 5 countries by sales?')
print(response)
- Generate charts:
agent.chat("Plot the histogram of countries showing for each one the revenue. Use different colors for each bar")
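Because the agent's answers come from a language model, it can help to cross-check a simple query by hand. The sketch below computes the "top 5 countries by sales" question directly in pandas, using the same DataFrame as above. It is an independent check, not a description of how PandasAI answers the question internally.

```python
import pandas as pd

sales_by_country = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "revenue": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000],
})

# Keep the five rows with the largest revenue, highest first.
top5 = sales_by_country.nlargest(5, "revenue")

print(top5["country"].tolist())
# → ['China', 'United States', 'Japan', 'Germany', 'United Kingdom']
```

If the agent's response lists different countries, the prompt or the data passed to the agent is worth re-checking.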
Using the PandasAI platform
- Access the platform: After startup, open
http://localhost:3000
- Upload data: Upload CSV or Excel files through the interface.
- Natural language query: Enter a question in the query box, e.g. "Which are the top 5 countries by sales?".
- View results: The platform returns the query results and optionally generates matching charts.
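To try the upload step with the sample data from the library section, the DataFrame can first be written to a CSV file. This is a minimal sketch in plain pandas. The file name `sales_by_country.csv` and the use of the system temporary directory are assumptions for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

sales_by_country = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "revenue": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000],
})

# Hypothetical output location; any writable path works.
csv_path = Path(tempfile.gettempdir()) / "sales_by_country.csv"

# index=False keeps the row index out of the file, so only the
# two data columns are written.
sales_by_country.to_csv(csv_path, index=False)

# Read the file back to confirm it contains what we expect.
round_trip = pd.read_csv(csv_path)
print(csv_path)
print(round_trip.head())
```

The resulting file can then be selected in the platform's upload dialog.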
PandasAI is suitable for a variety of data analysis scenarios, whether for business analysis, academic research or personal projects. With natural language processing technology, users can easily get valuable information from data without writing complex code.