
Ollama OCR: Extracting Text from Images Using Visual Models in Ollama

General Introduction

Ollama OCR is an Optical Character Recognition (OCR) toolkit that uses vision-language models served by the Ollama platform to extract text from images. The project is available both as a Python package and as a Streamlit web application. It supports several vision models, including LLaVA 7B for real-time processing and the higher-accuracy Llama 3.2 Vision model for complex documents. Ollama OCR also offers multiple output formats (Markdown, plain text, JSON, structured data, and key-value pairs) and batch processing. The tool is particularly suited to developers and researchers who need to extract and structure text data from images.



 

Feature List

  • Supports multiple vision-language models (LLaVA 7B and Llama 3.2 Vision)
  • Provides several output formats (Markdown, plain text, JSON, structured data, key-value pairs)
  • Supports batch processing, with multiple images processed in parallel
  • Built-in image preprocessing (resizing, normalization, etc.)
  • Provides progress tracking and processing statistics
  • Includes a user-friendly Streamlit web interface
  • Supports drag-and-drop image upload and real-time processing
  • Lets you download the extracted text
  • Integrated image preview and detailed information display

 

Usage Guide

1. Installation steps

  1. Install the Ollama platform first:
    • Visit the official Ollama website and download the installer for your system.
    • Complete the basic installation of Ollama.
  2. Install the required vision model:
ollama pull llama3.2-vision:11b
  3. Install the Ollama OCR package:
pip install ollama-ocr
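
Before running any OCR code, it can help to confirm that the Ollama server is running and that the model has actually been pulled. The following is a minimal sketch and not part of ollama-ocr: it assumes Ollama is listening on its default local address (http://localhost:11434) and uses the `/api/tags` endpoint, which lists locally installed models. The helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default local Ollama address


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON returned by GET /api/tags."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def is_model_available(tags_payload: dict, wanted: str) -> bool:
    """Return True if `wanted` (e.g. 'llama3.2-vision:11b') is listed."""
    return wanted in model_names(tags_payload)


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> dict:
    """Ask the local Ollama server which models are installed."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        tags = fetch_tags()
    except (OSError, ValueError) as exc:  # server unreachable or reply not JSON
        print(f"Could not query Ollama at {OLLAMA_URL}: {exc}")
    else:
        if is_model_available(tags, "llama3.2-vision:11b"):
            print("Model is installed.")
        else:
            print("Model missing; run: ollama pull llama3.2-vision:11b")
```

If the script reports that the model is missing, rerun the `ollama pull` command from step 2.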

2. Python package usage

2.1 Single Image Processing

from ollama_ocr import OCRProcessor
# Initialize the OCR processor
ocr = OCRProcessor(model_name='llama3.2-vision:11b')
# Process a single image
result = ocr.process_image(
    image_path="path/to/image.png",
    format_type="markdown"  # available formats: markdown, text, json, structured, key_value
)
print(result)
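
For readers curious about what a call like this may involve, the sketch below sends one image directly to Ollama's REST API. It is an illustration only, not the implementation used by ollama-ocr: the prompt text and helper names are assumptions. The `/api/generate` endpoint, its base64-encoded `images` field, and the `response` field of the reply come from Ollama's API.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default local Ollama address

# Hypothetical prompt; the prompts used by ollama-ocr are likely different.
PROMPT = "Extract all text from this image and return it as Markdown."


def build_payload(image_bytes: bytes, model: str, prompt: str = PROMPT) -> dict:
    """Build the JSON body for POST /api/generate with one attached image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # request one JSON reply instead of a token stream
    }


def ocr_image(image_path: str, model: str = "llama3.2-vision:11b") -> str:
    """Send the image to a running Ollama server and return the model's text."""
    with open(image_path, "rb") as fh:
        payload = build_payload(fh.read(), model)
    request = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)["response"]
```

Calling `ocr_image("path/to/image.png")` requires a running Ollama server with the model already pulled; `build_payload` can be inspected on its own without one.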

2.2 Batch Processing Images

from ollama_ocr import OCRProcessor
# Initialize the OCR processor and set the number of parallel workers
ocr = OCRProcessor(model_name='llama3.2-vision:11b', max_workers=4)
# Process a batch of images
batch_results = ocr.process_batch(
    input_path="path/to/image/folder",
    format_type="markdown",
    recursive=True,  # search subdirectories
    preprocess=True  # enable image preprocessing
)
# Inspect the results
for file_path, text in batch_results['results'].items():
    print(f"\nFile: {file_path}")
    print(f"Extracted text: {text}")
# Inspect the processing statistics
print(f"Total images: {batch_results['statistics']['total']}")
print(f"Successfully processed: {batch_results['statistics']['successful']}")
print(f"Failed: {batch_results['statistics']['failed']}")
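
To make the idea of parallel batch processing with statistics concrete, here is a minimal sketch using Python's standard thread pool. It is not the code inside ollama-ocr: `run_batch` and its `process_one` argument are hypothetical, and the extra `errors` key is an addition for illustration. Only the `results` and `statistics` keys mirror the shape shown in the example above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def run_batch(
    paths: list[str],
    process_one: Callable[[str], str],
    max_workers: int = 4,
) -> dict:
    """Run process_one on every path in parallel; collect results and counts."""
    results: dict[str, str] = {}
    errors: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its path so results can be labeled.
        futures = {pool.submit(process_one, p): p for p in paths}
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # one bad image should not stop the batch
                errors[path] = str(exc)
            print(f"[{done}/{len(paths)}] {path}")  # simple progress line
    return {
        "results": results,
        "errors": errors,
        "statistics": {
            "total": len(paths),
            "successful": len(results),
            "failed": len(errors),
        },
    }
```

Any function that takes an image path and returns text can be passed as `process_one`. A thread pool fits here because each worker mostly waits on the model server, though a higher `max_workers` also means more requests in flight at once.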

3. How to use the Streamlit web application

  1. Clone the code repository:
git clone https://github.com/imanoop7/Ollama-OCR.git
cd Ollama-OCR
  2. Install the dependencies:
pip install -r requirements.txt
  3. Launch the web application:
cd src/ollama_ocr
streamlit run app.py

4. Output formats

  • Markdown (format_type="markdown"): preserves text formatting, including headings and lists
  • Plain text (format_type="text"): provides clean, simple text extraction
  • JSON (format_type="json"): outputs the extracted content as structured JSON
  • Structured (format_type="structured"): tables and organized data
  • Key-value pairs (format_type="key_value"): extracts labeled information
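
Because the JSON format is still text produced by a language model, it may be worth parsing it defensively. The sketch below assumes the result arrives as a string and that the model might wrap the JSON in a Markdown code fence. Both points are assumptions, and `parse_json_output` is a hypothetical helper, not part of ollama-ocr.

```python
import json

FENCE = "`" * 3  # the three-backtick marker of a Markdown code fence


def parse_json_output(text: str):
    """Parse model output as JSON; return None if it is not valid JSON."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE) and cleaned.endswith(FENCE):
        # Drop the opening fence line (it may carry a language tag such as
        # "json") and the closing fence.
        first_newline = cleaned.find("\n")
        if first_newline == -1:
            cleaned = ""
        else:
            cleaned = cleaned[first_newline + 1 : -len(FENCE)].strip()
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        return None
```

Returning `None` on failure lets the caller decide whether to retry the image, fall back to the plain text format, or log the raw output for inspection.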

5. Notes

  • The LLaVA model may occasionally produce incorrect output; the Llama 3.2 Vision model is recommended for important use cases
  • Image preprocessing can improve recognition accuracy
  • For batch processing, set the number of parallel workers (max_workers) conservatively to avoid excessive memory use
  • Enabling progress tracking is recommended when processing a large number of images
May not be reproduced without permission: Chief AI Sharing Circle » Ollama OCR: Extracting Text from Images Using Visual Models in Ollama