General Introduction
Ollama OCR is an Optical Character Recognition (OCR) toolkit that uses vision language models served by the Ollama platform to extract text from images. The project is available both as a Python package and as a Streamlit web application. It supports several vision models, including LLaVA 7B for real-time processing and the higher-precision Llama 3.2 Vision model for complex documents. Ollama OCR also supports several output formats (Markdown, plain text, JSON, structured data, and key-value pairs) and batch processing. The tool is particularly suitable for developers and researchers who need to extract and structure text data from images.
Feature List
- Supports multiple vision language models (LLaVA 7B and Llama 3.2 Vision)
- Offers several output formats (Markdown, plain text, JSON, structured data, key-value pairs)
- Processes images in batches, with multiple images handled in parallel
- Includes built-in image preprocessing (resizing, normalization, etc.)
- Reports progress and processing statistics
- Provides a user-friendly Streamlit web interface
- Supports drag-and-drop image upload and real-time processing
- Lets you download the extracted text
- Shows an image preview with detailed image information
Usage Guide
1. Installation steps
- Install the Ollama platform first:
- Download the installer for your system from the official Ollama website.
- Complete the basic Ollama installation.
- Pull the required vision model:
ollama pull llama3.2-vision:11b
- Install the Ollama OCR package:
pip install ollama-ocr
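Before using the package, it can help to confirm that both prerequisites from the steps above are in place. The following is a minimal sketch and not part of Ollama OCR: the helper name `check_prerequisites` is hypothetical, and it only checks that an `ollama` executable is on the PATH and that a module named `ollama_ocr` can be found. It does not verify that the vision model has been pulled.

```python
import importlib.util
import shutil


def check_prerequisites(cli_name="ollama", module_name="ollama_ocr"):
    """Report whether the CLI executable and the Python module can be found.

    Hypothetical helper for illustration; it performs lookups only and
    does not start Ollama or import the package.
    """
    return {
        # shutil.which returns None when the executable is not on the PATH
        "cli_found": shutil.which(cli_name) is not None,
        # find_spec returns None when a top-level module is not installed
        "module_found": importlib.util.find_spec(module_name) is not None,
    }


if __name__ == "__main__":
    print(check_prerequisites())
```

If either value is False, repeat the corresponding installation step before continuing.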
2. Python package usage
2.1 Single Image Processing
from ollama_ocr import OCRProcessor
# Initialize the OCR processor with the model pulled during installation
ocr = OCRProcessor(model_name='llama3.2-vision:11b')
# Process a single image
result = ocr.process_image(
    image_path="path/to/image.png",
    format_type="markdown"  # Optional formats: markdown, text, json, structured, key_value
)
print(result)
2.2 Batch Processing Images
from ollama_ocr import OCRProcessor
# Initialize the OCR processor and set the number of parallel workers
ocr = OCRProcessor(model_name='llama3.2-vision:11b', max_workers=4)
# Batch process images
batch_results = ocr.process_batch(
    input_path="path/to/image/folder",
    format_type="markdown",
    recursive=True,  # search subdirectories
    preprocess=True  # enable image preprocessing
)
# View the processing results
for file_path, text in batch_results['results'].items():
    print(f"\nFile: {file_path}")
    print(f"Extracted text: {text}")
# View the processing statistics
print(f"Total images: {batch_results['statistics']['total']}")
print(f"Successfully processed: {batch_results['statistics']['successful']}")
print(f"Failed to process: {batch_results['statistics']['failed']}")
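The statistics can also be condensed into a short summary. The sketch below assumes only the dictionary layout used in the batch example (a `results` mapping of file path to extracted text, and a `statistics` mapping with `total`, `successful`, and `failed`). The helper name `summarize_batch` is hypothetical, and the sample dictionary is stand-in data, not real OCR output.

```python
def summarize_batch(batch_results):
    """Compute a success rate and list files whose extracted text is empty.

    Assumes the layout from the batch example: 'results' maps file path
    to extracted text, and 'statistics' holds 'total', 'successful',
    and 'failed' counts.
    """
    stats = batch_results["statistics"]
    total = stats["total"]
    # Guard against division by zero when no images were found
    success_rate = stats["successful"] / total if total else 0.0
    empty_files = sorted(
        path
        for path, text in batch_results["results"].items()
        if not str(text).strip()
    )
    return {"success_rate": success_rate, "empty_files": empty_files}


# Stand-in data with the same shape as the batch output (not real results)
sample = {
    "results": {"a.png": "# Title", "b.png": "   "},
    "statistics": {"total": 3, "successful": 2, "failed": 1},
}
summary = summarize_batch(sample)
print(f"Success rate: {summary['success_rate']:.0%}")
print(f"Files with empty output: {summary['empty_files']}")
```

Flagging empty results separately is a design choice for this sketch: an image can be counted as processed and still yield no usable text.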
3. How to use the Streamlit web application
- Clone the code repository:
git clone https://github.com/imanoop7/Ollama-OCR.git
cd Ollama-OCR
- Install the dependencies:
pip install -r requirements.txt
- Launch the web application:
cd src/ollama_ocr
streamlit run app.py
4. Description of output formats
- Markdown format: preserves text formatting, including headings and lists
- Plain text format: provides clean, concise text extraction
- JSON format: outputs the extracted content as structured data
- Structured format: extracts tables and organized data
- Key-value pair format: extracts labeled information
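When the JSON format is requested, the extracted text is produced by a language model, so it may not always be strictly valid JSON. The following is a defensive parsing sketch under two assumptions that the source does not confirm: that the result is returned as a string, and that it may be wrapped in a Markdown code fence. The helper name `parse_json_output` is hypothetical.

```python
import json

# Built this way so the example contains no literal code fence
FENCE = "`" * 3


def parse_json_output(text):
    """Parse model output as JSON, returning None if it is not valid JSON.

    Assumption: the output may be wrapped in a Markdown code fence with an
    optional language tag; the fence lines are dropped before parsing.
    """
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        lines = cleaned.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        cleaned = "\n".join(lines)
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        return None


# Stand-in strings for illustration (not real model output)
wrapped = FENCE + 'json\n{"total": 12.5}\n' + FENCE
print(parse_json_output(wrapped))
print(parse_json_output("not json"))
```

Returning None instead of raising lets a batch loop record the failure and continue with the next image.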
5. Cautions
- The LLaVA model may occasionally produce incorrect output; the Llama 3.2 Vision model is recommended for important use cases.
- Image preprocessing can improve recognition accuracy.
- When batch processing, set the number of parallel workers conservatively to avoid excessive memory consumption.
- Enable progress tracking when processing a large number of images.
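One way to keep parallelism in check is to derive the worker count from the job size and the machine instead of hard-coding it. The sketch below is illustrative only: `pick_max_workers` is a hypothetical helper, and the default cap of 4 is an arbitrary assumption that mirrors the `max_workers=4` value in the batch example. A safe value in practice depends on the model size and available memory, which this sketch does not measure.

```python
import os


def pick_max_workers(num_images, cap=4):
    """Pick a worker count bounded by the image count, CPU count, and a cap.

    The cap is an assumed upper limit, not a value recommended by the
    library; always returns at least 1.
    """
    cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(num_images, cpu_count, cap))


print(pick_max_workers(num_images=2))
print(pick_max_workers(num_images=500, cap=4))
```

The returned value can then be passed as the `max_workers` argument when constructing `OCRProcessor`.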