OmniParse: extract any unstructured data from documents/multimedia and parse it into structured data

General Introduction

OmniParse is a powerful data parsing and optimization platform designed to transform any unstructured data into structured, actionable data optimized for GenAI (Generative Artificial Intelligence) frameworks. Whether you are working with documents, tables, images, videos, audio files or web content, OmniParse makes your data clean, structured and ready for AI applications such as RAG (Retrieval Augmented Generation) and fine-tuning.

Open-source demo (Google Colab notebook): https://colab.research.google.com/github/adithya-s-k/omniparse/blob/main/examples/OmniParse_GoogleColab.ipynb

 

Features

  • Runs entirely locally; no external API required
  • Fits on a T4 GPU
  • Supports about 20 file types
  • Converts documents, multimedia and web pages into high-quality structured Markdown
  • Table extraction, image extraction/captioning, audio/video transcription, web crawling
  • Easy deployment with Docker and SkyPilot
  • Colab-friendly
  • Interactive UI powered by Gradio

Usage Guide

Installation

  1. Clone the repository::
    git clone https://github.com/adithya-s-k/omniparse
    cd omniparse
    
  2. Create a virtual environment::
    conda create -n omniparse-venv python=3.10
    conda activate omniparse-venv
    
  3. Install the dependencies::
    poetry install
    # or
    pip install -e .
    # or
    pip install -r pyproject.toml
    

Using Docker

  1. Pull the OmniParse API image from Docker Hub::
    docker pull savatar101/omniparse:0.1
    
  2. Run the Docker container, exposing port 8000::
    # If using a GPU
    docker run --gpus all -p 8000:8000 savatar101/omniparse:0.1
    # Otherwise
    docker run -p 8000:8000 savatar101/omniparse:0.1
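
Once the container is running, it can be handy to confirm that something is actually listening on the published port before sending requests. The following is a minimal sketch using only the Python standard library; the helper name `is_port_open` is hypothetical and not part of OmniParse, and the check only verifies that a TCP connection succeeds, not that the models have finished loading.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds.

    Hypothetical helper for illustration; it is not provided by OmniParse.
    """
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `is_port_open("127.0.0.1", 8000)` checks the port mapped by `-p 8000:8000` in the commands above.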
    

Running the Server

  1. Start the server::
    python server.py --host 0.0.0.0 --port 8000 --documents --media --web
    
    • --documents: Loads all the models used to parse and ingest documents (e.g., the Surya OCR family of models and Florence-2).
    • --media: Loads the Whisper model to transcribe audio and video files.
    • --web: Sets up the Selenium crawler.
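
Because each flag loads a separate group of models, a launcher script may want to enable only the groups it needs. The sketch below assembles the argument list for the `python server.py` command shown above. The function name `build_server_command` is hypothetical; only the flags `--host`, `--port`, `--documents`, `--media` and `--web` come from the command above.

```python
from typing import List


def build_server_command(
    host: str = "0.0.0.0",
    port: int = 8000,
    documents: bool = False,
    media: bool = False,
    web: bool = False,
) -> List[str]:
    """Assemble the argv list for `python server.py` with the chosen model groups.

    Hypothetical helper for illustration; it only builds the list and runs nothing.
    """
    cmd = ["python", "server.py", "--host", host, "--port", str(port)]
    if documents:
        cmd.append("--documents")  # document-parsing models
    if media:
        cmd.append("--media")      # Whisper, for audio and video
    if web:
        cmd.append("--web")        # Selenium crawler
    return cmd

# Possible usage from the repository root (assumes `python` is on PATH):
#   import subprocess
#   subprocess.run(build_server_command(documents=True), check=True)
```

Returning a list rather than a single string avoids shell quoting issues when the result is passed to `subprocess.run`.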

Supported Data Types

  • Documents: .doc, .docx, .pdf, .ppt, .pptx
  • Images: .png, .jpg, .jpeg, .tiff, .bmp, .heic
  • Video: .mp4, .mkv, .avi, .mov
  • Audio: .mp3, .wav, .aac
  • Web pages: dynamic web pages (http:// URLs)
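
A client that handles mixed inputs may want to sort them into these categories before deciding what to send to the server. The following is a minimal sketch, assuming the extension lists above are complete. The names `SUPPORTED_EXTENSIONS` and `classify_input` are hypothetical, and treating `https://` URLs as web input is an assumption not stated in the list.

```python
from pathlib import PurePath
from typing import Optional

# Extensions copied from the "Supported Data Types" list above.
SUPPORTED_EXTENSIONS = {
    "document": {".doc", ".docx", ".pdf", ".ppt", ".pptx"},
    "image": {".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".heic"},
    "video": {".mp4", ".mkv", ".avi", ".mov"},
    "audio": {".mp3", ".wav", ".aac"},
}


def classify_input(source: str) -> Optional[str]:
    """Return the category for a path or URL, or None if it is not in the list.

    Hypothetical helper for illustration; it is not part of OmniParse.
    """
    # Assumption: https:// URLs are handled like the listed http:// ones.
    if source.lower().startswith(("http://", "https://")):
        return "web"
    suffix = PurePath(source).suffix.lower()  # case-insensitive extension match
    for category, extensions in SUPPORTED_EXTENSIONS.items():
        if suffix in extensions:
            return category
    return None
```

Returning `None` for unlisted extensions lets the caller skip or report unsupported files instead of sending them to the server.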

Usage Examples

  1. Document parsing::
    python server.py --host 0.0.0.0 --port 8000 --documents
    

    This loads all document-parsing models, ready to process document files.

  2. Media parsing::
    python server.py --host 0.0.0.0 --port 8000 --media
    

    This loads the Whisper model, ready to process audio and video files.

  3. Web crawling::
    python server.py --host 0.0.0.0 --port 8000 --web
    

    This will set up the Selenium crawler, ready to process web content.
