
Simba: Knowledge management system for organizing documents, seamlessly integrated into any RAG system

General Introduction

Simba is a portable Knowledge Management System (KMS) designed to integrate seamlessly with any Retrieval Augmented Generation (RAG) system. Created by GitHub user GitHamza0206, the project provides an efficient knowledge management solution for a variety of application scenarios. Simba aims to simplify the knowledge management process and to improve the accuracy and efficiency of information retrieval and generation. By integrating with a RAG system, Simba can provide strong support for handling complex data and generating content.


Feature List

  • Knowledge management: comprehensive support for storing, categorizing, and retrieving knowledge.
  • RAG system integration: integrates seamlessly with Retrieval Augmented Generation systems to improve the accuracy of generated content (see the sketch after this list).
  • Portability: designed as a portable system that is easy to deploy and use.
  • Open source: the source code is freely available and can be customized.
  • Efficient retrieval: optimized information retrieval that quickly finds the knowledge you need.
  • User-friendly interface: an intuitive interface that simplifies day-to-day operation.
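
The "seamless RAG integration" claim is easiest to picture with a concrete pipeline. The sketch below is illustrative only: it assumes a FAISS index and embedding model matching the defaults in Simba's config.yaml (vector_stores/faiss_index, BAAI/bge-base-en-v1.5) and wires them into a generic LangChain retrieval chain. The glue code, paths, and chain choice are assumptions made here for illustration, not an official Simba API.

   # Illustrative sketch: plugging a Simba-managed FAISS index into a RAG chain.
   # Paths and model names mirror the config.yaml defaults shown later; the chain
   # itself is an assumption, not part of Simba.
   from langchain_community.embeddings import HuggingFaceEmbeddings
   from langchain_community.vectorstores import FAISS
   from langchain_openai import ChatOpenAI
   from langchain.chains import RetrievalQA

   embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")

   # Load the on-disk index that Simba maintains (faiss_index_dir in config.yaml).
   store = FAISS.load_local(
       "vector_stores/faiss_index",
       embeddings,
       allow_dangerous_deserialization=True,  # FAISS metadata is pickled on disk
   )
   retriever = store.as_retriever(search_kwargs={"k": 5})  # retrieval.k in config.yaml

   # Any RAG chain can now consume the retriever.
   qa = RetrievalQA.from_chain_type(llm=ChatOpenAI(model="gpt-4o"), retriever=retriever)
   print(qa.invoke({"query": "What do my documents say about quarterly revenue?"}))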

 

Usage Guide

Installation process

  1. Clone the repository: first, clone the Simba GitHub repository with Git.
   git clone https://github.com/GitHamza0206/simba.git
  2. Enter the project directory: the back-end and front-end dependencies are installed with Poetry and npm in the steps below.
   cd simba

Local development

  1. Back-end configuration:
    • Go to the back-end directory:
     cd backend

    • Make sure Redis is installed and running on your system:
     redis-server

    • Set the environment variables:
     cp .env.example .env
    

    Then edit the .env file and fill in your values:

     OPENAI_API_KEY=""
    LANGCHAIN_TRACING_V2= #(optional - for langsmith tracing)
    LANGCHAIN_API_KEY="" #(optional - for langsmith tracing)
    REDIS_HOST=redis
    CELERY_BROKER_URL=redis://redis:6379/0
    CELERY_RESULT_BACKEND=redis://redis:6379/1
    
    • Install the dependencies:
     poetry install
     poetry shell
    

    Or activate the virtual environment manually on Mac/Linux:

     source .venv/bin/activate
    

    On Windows:

     .venv\Scripts\activate
    
    • Run the back-end service:
     python main.py
    

    Or use auto-reloading:

     uvicorn main:app --reload
    

    Then navigate to http://0.0.0.0:8000/docs to access the Swagger UI (optional).

    • Run the parser using Celery:
     celery -A tasks.parsing_tasks worker --loglevel=info
    
    • Modify the config.yaml file as needed (see the configuration-loading sketch after these local-development steps):
     project:
       name: "Simba"
       version: "1.0.0"
       api_version: "/api/v1"

     paths:
       base_dir: null  # Will be set programmatically
       markdown_dir: "markdown"
       faiss_index_dir: "vector_stores/faiss_index"
       vector_store_dir: "vector_stores"

     llm:
       provider: "openai"  # or ollama (vllm coming soon)
       model_name: "gpt-4o"  # or ollama model name
       temperature: 0.0
       max_tokens: null
       streaming: true
       additional_params: {}

     embedding:
       provider: "huggingface"  # or openai
       model_name: "BAAI/bge-base-en-v1.5"  # or any HF model name
       device: "cpu"  # mps, cuda, cpu
       additional_params: {}

     vector_store:
       provider: "faiss"
       collection_name: "migi_collection"
       additional_params: {}

     chunking:
       chunk_size: 512
       chunk_overlap: 200

     retrieval:
       k: 5  # number of chunks to retrieve

     features:
       enable_parsers: true  # Set to false to disable parsing

     celery:
       broker_url: ${CELERY_BROKER_URL:-redis://redis:6379/0}
       result_backend: ${CELERY_RESULT_BACKEND:-redis://redis:6379/1}
    
  2. Front-end setup:
    • From the Simba root directory, go to the front-end directory:
      cd frontend
    • Install the dependencies:
      npm install
    • Run the front-end service:
      npm run dev
      Then navigate to http://localhost:5173 to access the front-end interface.
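
To make the relationship between config.yaml and the running back end concrete, here is a minimal sketch of how such a file can be read and turned into components. The load_settings and build_embeddings helpers are hypothetical names introduced purely for illustration; the actual loading code lives in the backend/ directory and may differ.

   # Minimal sketch with hypothetical helper names: parse config.yaml and pick an
   # embedding backend from the "embedding.provider" field.
   import yaml

   def load_settings(path: str = "config.yaml") -> dict:
       """Parse the YAML settings file into a plain dict."""
       with open(path, "r", encoding="utf-8") as fh:
           return yaml.safe_load(fh)

   def build_embeddings(settings: dict):
       """Choose an embedding client based on the configured provider."""
       emb = settings["embedding"]
       if emb["provider"] == "huggingface":
           from langchain_community.embeddings import HuggingFaceEmbeddings
           return HuggingFaceEmbeddings(model_name=emb["model_name"],
                                        model_kwargs={"device": emb["device"]})
       if emb["provider"] == "openai":
           from langchain_openai import OpenAIEmbeddings
           return OpenAIEmbeddings(model=emb["model_name"])
       raise ValueError(f"Unsupported embedding provider: {emb['provider']}")

   settings = load_settings()
   print(settings["retrieval"]["k"])   # -> 5 with the defaults shown above
   embeddings = build_embeddings(settings)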

Launching with Docker (recommended)

  1. Navigate to the Simba root directory, then build and start the containers:
   export OPENAI_API_KEY="" #(optional)
   docker-compose up --build

Project structure

simba/
├── backend/                  # Core processing engine
│   ├── api/                  # FastAPI endpoints
│   ├── services/             # Document processing logic
│   ├── tasks/                # Celery task definitions
│   └── models/               # Pydantic data models
├── frontend/                 # React-based UI
│   ├── public/               # Static assets
│   └── src/                  # React components
├── docker-compose.yml        # Development environment
└── docker-compose.prod.yml   # Production environment settings
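
The tasks/ directory above holds the Celery task definitions picked up by the celery -A tasks.parsing_tasks worker command from the back-end steps. The following is a hypothetical outline of such a module: the task name and parsing logic are assumptions made for illustration, and only the broker/backend URLs mirror the .env values shown earlier.

   # Hypothetical outline of backend/tasks/parsing_tasks.py; task names and the
   # parsing logic are illustrative assumptions, not Simba's actual implementation.
   import os
   from celery import Celery

   celery_app = Celery(
       "parsing_tasks",
       broker=os.getenv("CELERY_BROKER_URL", "redis://redis:6379/0"),
       backend=os.getenv("CELERY_RESULT_BACKEND", "redis://redis:6379/1"),
   )

   @celery_app.task(name="parse_document")
   def parse_document(document_path: str) -> dict:
       """Parse one uploaded document asynchronously and report basic stats."""
       with open(document_path, "r", encoding="utf-8", errors="ignore") as fh:
           text = fh.read()
       # Real parsing would chunk the text and push embeddings into the vector store.
       return {"path": document_path, "characters": len(text)}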

Configuration

The config.yaml file is used to configure the back-end application. You can change the following:

  • Embedding model
  • Vector store
  • Chunking
  • Retrieval
  • Features
  • Parsers

For more information, see backend/README.md.
