General Introduction
Denser Chat is a chatbot project developed and maintained by denser.ai. It extracts text and tables from PDF files and web pages and highlights the source passages behind each answer. The project supports building chatbots on top of denser-retriever and provides an interactive Streamlit chatbot application. After a short installation and configuration, users can deploy the chatbot and ask it questions about their PDF and web content.
Function List
- Extract text and tables from PDF files and web pages
- Build chatbots based on denser-retriever
- Support for interactive Streamlit chatbot applications
- Provides source highlighting in answers
- Supports multiple file formats and URLs as data sources
- Starts the Elasticsearch and Milvus services with Docker Compose
- Provides chat functionality through the OpenAI or Claude API
Usage Guide
Installation Process
- Clone the repository:
git clone https://github.com/denser-org/denser-chat.git
- Go to the project directory, then create and activate a virtual environment (make sure the Python version is 3.11):
cd denser-chat
python -m venv .venv
source .venv/bin/activate
- Install the required packages:
pip install -e .
Or use Poetry:
poetry install
Quick Start
- Before building the index, run Docker Compose to start the Elasticsearch and Milvus services:
cd denser_chat
docker compose up -d
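Before moving on to the index build, it can help to confirm that both services are accepting connections. The following is a minimal sketch, not part of Denser Chat: it assumes the services listen on their usual default ports on localhost (9200 for Elasticsearch, 19530 for Milvus), which may differ from what the project's Compose file maps. The names `port_is_open` and `check_services` are hypothetical.

```python
import socket

# Assumed default ports; adjust if the Compose file maps them differently.
SERVICES = {"Elasticsearch": 9200, "Milvus": 19530}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> dict:
    """Map each service name to whether its port currently accepts connections."""
    return {name: port_is_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'reachable' if ok else 'not reachable yet'}")
```

An open port only shows that the container is listening; the service inside may still need a few more seconds to finish starting.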
- Build the chatbot index:
python build.py sources.txt output test_index
The first parameter is the file that lists the sources used to build the chatbot; each source can be a local PDF file, the URL of a PDF, or the URL of a web page. The second parameter is the output directory, and the third parameter is the index name.
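The three kinds of source can be told apart mechanically. The sketch below is illustrative only and is not how build.py works internally: it assumes the sources file holds one entry per line, and the helper names `classify_source` and `read_sources` as well as the example entries are hypothetical.

```python
from urllib.parse import urlparse


def classify_source(entry: str) -> str:
    """Classify one entry as 'local_pdf', 'pdf_url', 'web_url', or 'unknown'."""
    entry = entry.strip()
    parsed = urlparse(entry)
    if parsed.scheme in ("http", "https"):
        # A URL whose path ends in .pdf is treated as a remote PDF.
        return "pdf_url" if parsed.path.lower().endswith(".pdf") else "web_url"
    if entry.lower().endswith(".pdf"):
        return "local_pdf"
    return "unknown"


def read_sources(text: str) -> list:
    """Parse the text of a sources file (assumed one entry per line), skipping blanks."""
    return [
        (line.strip(), classify_source(line))
        for line in text.splitlines()
        if line.strip()
    ]


# Hypothetical file contents, for illustration only.
example = """\
docs/paper.pdf
https://example.com/files/report.pdf
https://example.com/blog/post
"""

for entry, kind in read_sources(example):
    print(f"{kind:10} {entry}")
```

Running a check like this before the build makes it easy to spot a mistyped path or URL early.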
- Start a local HTTP server to serve the PDF files:
python -m http.server 8000
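The command above uses Python's built-in http.server module. For reference, a roughly equivalent server can be started from Python code, which also makes it easy to verify that a given PDF is actually being served. This is a sketch under assumptions: the directory to serve and the helper names `start_file_server` and `is_served` are hypothetical and not part of Denser Chat.

```python
import threading
import urllib.request
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def start_file_server(directory: str, port: int = 8000) -> ThreadingHTTPServer:
    """Serve files from `directory` on 127.0.0.1:port in a background thread."""
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def is_served(url: str) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    # An empty ProxyHandler bypasses any configured proxy, since the server is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:  # covers URLError and HTTPError (for example, 404)
        return False
```

For example, `server = start_file_server("path/to/pdfs")` followed by `is_served("http://127.0.0.1:8000/some.pdf")` checks one file, and `server.shutdown()` stops the server again.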
- Launch the Streamlit application:
cd denser_chat
streamlit run demo.py -- --index_name test_index
Feature Usage
- Extract text and tables: Upload a PDF file or enter a web page URL, and Denser Chat will automatically extract the text and table content from it.
- Source highlighting: During the chat, Denser Chat highlights the relevant source passages in the PDF file so that answers are easy to check.
- Interactive chat: After configuring an OpenAI or Claude API key, users can chat with the bot and get answers based on the indexed content.
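A quick way to see which provider is ready to use is to check for its API key. The sketch below assumes the keys are supplied through the environment variables conventionally read by the official OpenAI and Anthropic SDKs (`OPENAI_API_KEY` and `ANTHROPIC_API_KEY`); how Denser Chat itself expects the keys to be configured is not stated here, and the function name `available_providers` is hypothetical.

```python
import os

# Conventional variable names from the official SDKs.
# Whether Denser Chat reads these exact variables is an assumption.
KEY_VARS = {"OpenAI": "OPENAI_API_KEY", "Claude": "ANTHROPIC_API_KEY"}


def available_providers(env=None) -> list:
    """Return the providers whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in KEY_VARS.items() if env.get(var, "").strip()]


if __name__ == "__main__":
    found = available_providers()
    print("Configured providers:", ", ".join(found) if found else "none")
```

Passing a plain dictionary as `env` allows the check to be exercised without touching the real environment.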
Detailed Operation Procedure
- Upload files: Select and upload a PDF file in the application interface, or enter a web page URL.
- Ask questions: Type a question into the chat window, such as "What is negative sampling within a batch?" or "What parts of the batch have stopping pins?".
- View Results: Denser Chat will return answers with highlighting, making it easy for users to quickly locate relevant content.