Simba: Knowledge management system for organizing documents, seamlessly integrated into any RAG system

Latest AI Resources9mos agorelease AI Sharing Circle

27.5K 00

General Introduction

Simba is a portable Knowledge Management System (KMS) designed to integrate seamlessly with any Retrieval Augmented Generation (RAG) system. Created by GitHub user GitHamza0206, the project provides an efficient knowledge management solution for a variety of application scenarios.Simba is designed with the goal of simplifying the knowledge management process and improving the accuracy and efficiency of information retrieval and generation. By integrating with the RAG system, Simba is able to provide powerful support in handling complex data and generating content.

Function List

knowledge management: Provide comprehensive knowledge management features that support storage, categorization and retrieval of knowledge.
RAG Systems Integration: Seamless integration with the retrieval enhancement generation system to improve the accuracy of information generation.
portability: Designed as a portable system that is easy to deploy and use.
open source project: As an open source project, users can freely access the source code and customize it.
Efficient retrieval: Optimized information retrieval to quickly find the knowledge you need.
user-friendly interface: Provides an intuitive user interface to simplify the operation process.

Using Help

Installation process

clone warehouse: First, clone the Simba project's GitHub repository using the Git command.

   git clone https://github.com/GitHamza0206/simba.git

Installation of dependencies: Go to the project directory and install the required dependency packages.

   cd simba

local development

back-end configuration::

Go to the back-end catalog:

 cd backend

Ensure that Redis is installed in your operating system:

 redis-server

Setting environment variables:

 cp .env.example .env

Then edit the .env file and fill in your values:

 OPENAI_API_KEY=""
LANGCHAIN_TRACING_V2= #(optional - for langsmith tracing)
LANGCHAIN_API_KEY="" #(optional - for langsmith tracing)
REDIS_HOST=redis
CELERY_BROKER_URL=redis://redis:6379/0
CELERY_RESULT_BACKEND=redis://redis:6379/1

Install the dependencies:

 poetry install
poetry shell

Or on Mac/Linux:

 source .venv/bin/activate

On Windows:

 .venv\Scripts\activate

Run the back-end service:

 python main.py

Or use auto-reloading:

 uvicorn main:app --reload

Then navigate tohttp://0.0.0.0:8000/docsAccess to the Swagger UI (optional).

Run the parser using Celery:

 celery -A tasks.parsing_tasks worker --loglevel=info

Modify as necessaryconfig.yamlDocumentation:

 project:
name: "Simba"
version: "1.0.0"
api_version: "/api/v1"
paths:
base_dir: null  # Will be set programmatically
markdown_dir: "markdown"
faiss_index_dir: "vector_stores/faiss_index"
vector_store_dir: "vector_stores"
llm:
provider: "openai" #or ollama (vllm coming soon)
model_name: "gpt-4o" #or ollama model name
temperature: 0.0
max_tokens: null
streaming: true
additional_params: {}
embedding:
provider: "huggingface" #or openai
model_name: "BAAI/bge-base-en-v1.5" #or any HF model name
device: "cpu"  # mps,cuda,cpu
additional_params: {}
vector_store:
provider: "faiss"
collection_name: "migi_collection"
additional_params: {}
chunking:
chunk_size: 512
chunk_overlap: 200
retrieval:
k: 5 #number of chunks to retrieve
features:
enable_parsers: true  # Set to false to disable parsing
celery:
broker_url: ${CELERY_BROKER_URL:-redis://redis:6379/0}
result_backend: ${CELERY_RESULT_BACKEND:-redis://redis:6379/1}

Front-end settings::
- Make sure it's in the Simba root directory: bash cd frontend
- Install the dependencies: bash npm install
- Run the front-end service: bash npm run dev Then navigate tohttp://localhost:5173Access the front-end interface.

Booting with Docker (recommended)

Navigate to the Simba root directory:

   export OPENAI_API_KEY="" #(optional)
docker-compose up --build

Project structure

simba/
├── backend/              # 核心处理引擎
│   ├── api/              # FastAPI端点
│   ├── services/         # 文档处理逻辑
│   ├── tasks/            # Celery任务定义
│   └── models/           # Pydantic数据模型
├── frontend/             # 基于React的UI
│   ├── public/           # 静态资源
│   └── src/              # React组件
├── docker-compose.yml    # 开发环境
└── docker-compose.prod.yml # 生产环境设置