General Introduction
WikiChat is an experimental chatbot developed at Stanford University that aims to improve the factuality of large language models by retrieving information from Wikipedia. Large language models (such as ChatGPT and GPT-4) tend to make errors when dealing with recent information or less popular topics. WikiChat grounds its responses in Wikipedia through a seven-stage pipeline. The project supports multiple languages and can retrieve information from structured data such as tables, infoboxes, and lists. WikiChat also provides high-quality Wikipedia preprocessing scripts, and it uses the state-of-the-art multilingual retrieval model BGE-M3 together with Qdrant for scalable vector search.
Feature List
- Multi-language support: Retrieves information from Wikipedia in 10 languages by default.
- Improved information retrieval: Support for retrieving information from structured data such as tables, infoboxes and lists.
- High-quality Wikipedia preprocessing scripts: Provided alongside retrieval based on the state-of-the-art multilingual retrieval model BGE-M3.
- Free Multilingual Wikipedia Search API: Provides a high-quality, free (but rate-limited) search API.
- Extended LLM Compatibility: Over 100 LLMs are supported through a unified interface.
- Optimized pipeline: Provides faster, more cost-effective pipeline options.
- LangChain Compatibility: Fully compatible with LangChain.
- Multi-user access deployment: Provides code to deploy a simple front end and back end, connected to an Azure Cosmos DB database that stores conversations.
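To make the retrieval features above more concrete, here is a minimal sketch of what a dense vector search does: every passage is stored as an embedding vector, and a query is answered by ranking passages by cosine similarity. This is a toy illustration only. It does not call BGE-M3 or Qdrant, the three-dimensional vectors and passage labels are invented for the example, and real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the k (score, text) pairs most similar to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical embeddings, invented for illustration (not model output).
INDEX = [
    ("infobox: population of Paris", [0.9, 0.1, 0.0]),
    ("table: Olympic medal counts", [0.0, 0.8, 0.6]),
    ("paragraph: history of Paris", [0.7, 0.3, 0.1]),
]

results = top_k([1.0, 0.0, 0.0], INDEX, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

A vector database plays the role of `top_k` here, but uses approximate nearest-neighbor indexes so that the search stays fast over very large collections instead of scanning every vector.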
Usage Guide
Installation process
- Installing dependencies:
git clone https://github.com/stanford-oval/WikiChat.git
cd WikiChat
conda env create --file conda_env.yaml
conda activate wikichat
python -m spacy download en_core_web_sm
- Installing Docker: Follow the official Docker documentation for installation.
- Configuring the LLM:
- Fill in the relevant fields in llm_config.yaml.
- Create a file named API_KEYS and set the required API keys in it.
- Configuring information retrieval:
- Use the default Wikipedia search API.
- Or download and host the Wikipedia index.
- Or build your own index.
- Running WikiChat:
inv demo --retriever-endpoint "http://0.0.0.0:<port>/search"
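Before starting the demo, it can help to check that the retriever endpoint responds. The sketch below builds and sends a search request using only the Python standard library. The request body shape (a `query` list plus a `num_blocks` count) is an assumption and should be checked against the documentation of the retriever you are using. The port 5100 is purely a placeholder, since the command above does not give one, and the helper names `build_search_request` and `search` are hypothetical.

```python
import json
import urllib.request


def build_search_request(endpoint, queries, num_blocks=3):
    """Build (but do not send) a POST request for a retriever /search endpoint."""
    # Assumed body shape: a list of query strings plus a result count.
    payload = {"query": list(queries), "num_blocks": num_blocks}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(endpoint, queries, num_blocks=3, timeout=30):
    """Send the request and return the decoded JSON response."""
    request = build_search_request(endpoint, queries, num_blocks)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder port; substitute the one your retriever actually listens on.
request = build_search_request("http://0.0.0.0:5100/search", ["What is GPT-4?"])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Calling `search(...)` with the same arguments performs the actual HTTP round trip. Building the request separately makes it easy to inspect the payload without a running server.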
Feature Details
- Multi-language support: WikiChat retrieves information from Wikipedia in 10 different languages by default, including English, Chinese, Spanish, Portuguese, Russian, German, French, Italian, Japanese and Farsi.
- Information retrieval: Retrieves information from structured data such as tables, infoboxes, and lists, using the state-of-the-art multilingual retrieval model BGE-M3.
- Free Search API: Provides a high-quality, free (but rate-limited) multilingual Wikipedia search API backed by over 180M vector embeddings.
- Extended LLM Compatibility: Supports over 100 LLMs through a unified interface, including models from OpenAI, Azure, Anthropic, Mistral, HuggingFace, Together.ai, and Groq.
- Optimized pipeline: Provides a faster, more cost-effective pipeline option that merges WikiChat's "generate" and "extract claim" stages.
- LangChain Compatibility: Fully compatible with LangChain and supports seamless integration of multiple LLMs.
- Multi-user access deployment: Provides code to deploy a simple front end and back end, connected to an Azure Cosmos DB database that stores conversations.
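The pipeline optimization mentioned above, merging the generation step with the step that extracts factual statements from the draft, can be pictured with a small sketch. This is not WikiChat's code. `CountingLLM`, its `complete(task, text)` interface, the canned replies, and the `CLAIMS:` separator are all hypothetical stand-ins chosen so the example runs without a real model. The point is only that one combined call can return the same draft and claims as two separate calls.

```python
class CountingLLM:
    """Stand-in for a real LLM client; it returns canned text and counts calls."""

    REPLIES = {
        "generate": "Paris is the capital of France.",
        "extract_claims": "Paris is the capital of France.",
        "generate_and_extract": (
            "Paris is the capital of France.\n"
            "CLAIMS:\n"
            "Paris is the capital of France."
        ),
    }

    def __init__(self):
        self.calls = 0

    def complete(self, task, text):
        self.calls += 1  # each call stands for one paid, slow model request
        return self.REPLIES[task]


def two_stage(llm, question):
    """Generate a draft, then extract its claims with a second call."""
    draft = llm.complete("generate", question)
    claims = llm.complete("extract_claims", draft).splitlines()
    return draft, claims


def merged_stage(llm, question):
    """Ask for the draft and its claims in a single call, then split the reply."""
    reply = llm.complete("generate_and_extract", question)
    draft, _, claim_block = reply.partition("\nCLAIMS:\n")
    return draft, claim_block.splitlines()


slow, fast = CountingLLM(), CountingLLM()
result_slow = two_stage(slow, "What is the capital of France?")
result_fast = merged_stage(fast, "What is the capital of France?")
print(slow.calls, fast.calls)
```

In this toy setup both variants return the same draft and claims, while the merged variant makes one model call instead of two. With a real model the trade-off is less clean, since a combined prompt may affect output quality.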