General Introduction
OmniParse is a powerful data parsing and optimization platform designed to transform any unstructured data into structured, actionable data optimized for GenAI (Generative Artificial Intelligence) frameworks. Whether you are working with documents, tables, images, videos, audio files or web content, OmniParse makes your data clean, structured and ready for AI applications such as RAG (Retrieval Augmented Generation) and fine-tuning.
Features
- Runs fully locally, no external APIs required
- Fits on a T4 GPU
- Supports about 20 file types
- Convert documents, multimedia and web pages into high-quality structured Markdown
- Table extraction, image extraction/captioning, audio/video transcription, web crawling
- Easy deployment with Docker and SkyPilot
- Colab friendly
- Interactive UI powered by Gradio
Usage
Installation
- Clone the repository::

      git clone https://github.com/adithya-s-k/omniparse
      cd omniparse

- Create a virtual environment::

      conda create -n omniparse-venv python=3.10
      conda activate omniparse-venv

- Install the dependencies::

      poetry install
      # or
      pip install -e .
      # or
      pip install -r pyproject.toml

Using Docker
- Pull the OmniParse API image from Docker Hub::

      docker pull savatar101/omniparse:0.1

- Run the Docker container, exposing port 8000::

      # If using a GPU
      docker run --gpus all -p 8000:8000 savatar101/omniparse:0.1
      # Otherwise
      docker run -p 8000:8000 savatar101/omniparse:0.1

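Once the container is running, it can be useful to confirm that the mapped port accepts connections before sending any requests. The following is a minimal sketch using only the Python standard library. The helper name ``is_port_open`` is hypothetical and not part of OmniParse, and the check only confirms that something is listening on the port, not that the models have finished loading.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError (refused, timeout, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Port 8000 matches the -p 8000:8000 mapping used in the docker run command.
    print("Port 8000 reachable:", is_port_open("127.0.0.1", 8000))
```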
Running the Server
- Start the server::

      python server.py --host 0.0.0.0 --port 8000 --documents --media --web

- ``--documents``: Load all the models that help parse and ingest documents (e.g., the Surya OCR family of models and Florence-2).
- ``--media``: Load the Whisper model to transcribe audio and video files.
- ``--web``: Set up the Selenium crawler.
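To illustrate how opt-in flags of this kind behave, here is a small ``argparse`` sketch. It is not the actual ``server.py`` from the repository. The function names and the default values for ``--host`` and ``--port`` are assumptions made for the example. Only the flag names come from the command above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the flags shown above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Hypothetical OmniParse-style launcher")
    parser.add_argument("--host", default="0.0.0.0", help="Interface to bind to (assumed default)")
    parser.add_argument("--port", type=int, default=8000, help="Port to listen on (assumed default)")
    # Each component is opt-in: nothing is loaded unless its flag is passed.
    parser.add_argument("--documents", action="store_true", help="Load document parsing models")
    parser.add_argument("--media", action="store_true", help="Load the Whisper model")
    parser.add_argument("--web", action="store_true", help="Set up the Selenium crawler")
    return parser


def enabled_components(args: argparse.Namespace) -> list[str]:
    """Return the names of the components whose flags were set."""
    return [name for name in ("documents", "media", "web") if getattr(args, name)]


args = build_parser().parse_args(["--port", "8000", "--documents", "--web"])
print(enabled_components(args))  # ['documents', 'web']
```

Because every component flag uses ``store_true``, omitting a flag simply leaves that component disabled, which is why the usage examples can start the server with only one of the three.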
Supported Data Types
- Documents: .doc, .docx, .pdf, .ppt, .pptx
- Images: .png, .jpg, .jpeg, .tiff, .bmp, .heic
- Video: .mp4, .mkv, .avi, .mov
- Audio: .mp3, .wav, .aac
- Web: dynamic web pages, http://<anything>.com
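When preparing a batch of files, it can help to check each one against the supported extensions before sending it to the server. The sketch below copies the extension lists above into a lookup table. The names ``SUPPORTED_EXTENSIONS`` and ``categorize`` are hypothetical helpers for this example and are not part of OmniParse.

```python
from pathlib import Path
from typing import Optional

# Extension lists copied from the supported data types above.
SUPPORTED_EXTENSIONS = {
    "documents": {".doc", ".docx", ".pdf", ".ppt", ".pptx"},
    "images": {".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".heic"},
    "video": {".mp4", ".mkv", ".avi", ".mov"},
    "audio": {".mp3", ".wav", ".aac"},
}


def categorize(filename: str) -> Optional[str]:
    """Return the data-type category for a file name, or None if unsupported."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match on the extension
    for category, extensions in SUPPORTED_EXTENSIONS.items():
        if suffix in extensions:
            return category
    return None


print(categorize("report.PDF"))  # documents
print(categorize("clip.mkv"))    # video
print(categorize("notes.txt"))   # None
```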
Usage Examples
- Document parsing::

      python server.py --host 0.0.0.0 --port 8000 --documents

  This loads all document parsing models, ready to process document files.

- Media parsing::

      python server.py --host 0.0.0.0 --port 8000 --media

  This loads the Whisper model, ready to process audio and video files.

- Web crawling::

      python server.py --host 0.0.0.0 --port 8000 --web

  This sets up the Selenium crawler, ready to process web content.
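If the server is launched from a script rather than typed by hand, the commands above can be assembled programmatically. The following is a minimal sketch. ``build_server_command`` is a hypothetical helper, and it assumes ``python`` and ``server.py`` resolve the same way they do in the commands shown above.

```python
def build_server_command(documents=False, media=False, web=False,
                         host="0.0.0.0", port=8000):
    """Assemble the server.py command line as an argument list."""
    command = ["python", "server.py", "--host", host, "--port", str(port)]
    # Append only the component flags that were requested.
    for enabled, flag in ((documents, "--documents"), (media, "--media"), (web, "--web")):
        if enabled:
            command.append(flag)
    return command


command = build_server_command(documents=True, media=True)
print(" ".join(command))
# python server.py --host 0.0.0.0 --port 8000 --documents --media

# To actually start the server from the repository root, one could run:
#   import subprocess
#   subprocess.run(command, check=True)
```

Returning a list rather than a single string lets the result be passed to ``subprocess.run`` without shell quoting concerns.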