Zerox: Convert PDF, DOCX, and Images to Markdown with High-Precision OCR Using Vision Models

General Introduction

Zerox is an open source project that converts PDF, DOCX, images, and other documents to Markdown using vision models. Developed by the getomni-ai team, it provides a simple and efficient OCR (Optical Character Recognition) solution. Zerox supports two programming languages, Node and Python, and uses graphicsmagick and ghostscript for PDF-to-image processing. Users convert a document to Markdown by providing a file path and an OpenAI API key. It is intended for documents with complex layouts, such as tables and charts.

Feature List

  • Converts PDF, DOCX, images, and other file formats
  • Supports both Node and Python
  • Performs OCR efficiently using vision models
  • Automatically installs graphicsmagick and ghostscript for PDF-to-image processing
  • Accepts both file paths and URLs as input
  • Provides optional parameters such as concurrency, page orientation correction, and error handling mode
  • Supports pre-processing and post-processing callback functions
  • Can save conversion results to a specified directory

 

Usage Guide

Installation

Node version

  1. Install Node.js and npm.
  2. Run the command npm install zerox
  3. Make sure that graphicsmagick and ghostscript are installed on your system. If they are not, run the following commands:
   sudo apt-get update
   sudo apt-get install -y graphicsmagick ghostscript

Python version

  1. Install Python and pip.
  2. Run the command pip install zerox
  3. Make sure that graphicsmagick and ghostscript are installed on your system. If they are not, run the following commands:
   sudo apt-get update
   sudo apt-get install -y graphicsmagick ghostscript
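
Before running either version, it can help to confirm that both system tools are actually on the PATH. The following is a minimal Python sketch, not part of Zerox. It assumes the executables are named gm (GraphicsMagick) and gs (Ghostscript), which is the usual naming on Linux and macOS but is not stated in this article. The helper name find_missing_tools is hypothetical.

```python
import shutil

# Assumption: GraphicsMagick installs an executable called `gm` and
# Ghostscript installs one called `gs`. Adjust for your platform.
REQUIRED_TOOLS = {"graphicsmagick": "gm", "ghostscript": "gs"}


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the package names whose executable is not found on PATH."""
    return [name for name, exe in tools.items() if which(exe) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("graphicsmagick and ghostscript are both available")
```

The lookup function is passed in as a parameter so the check can be exercised without touching the real system.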

Usage

Node version

  1. Import the zerox module:
   import { zerox } from "zerox";
  2. Convert a file from a local path:
   const result = await zerox({
     filePath: "path/to/file.pdf",
     openaiAPIKey: process.env.OPENAI_API_KEY,
   });
  3. Convert a file from a URL:
   const result = await zerox({
     filePath: "https://example.com/file.pdf",
     openaiAPIKey: process.env.OPENAI_API_KEY,
   });

Python version

  1. Import the zerox module:
   from zerox import zerox
  2. Convert a file from a local path:
   result = zerox(
       file_path="path/to/file.pdf",
       openai_api_key="your_openai_api_key",
   )
  3. Convert a file from a URL:
   result = zerox(
       file_path="https://example.com/file.pdf",
       openai_api_key="your_openai_api_key",
   )
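
Because the same parameter accepts either a local path or a URL, calling code sometimes needs to know which one it has, for example to check that a local file exists before starting a conversion. The sketch below shows one way to tell them apart on the caller's side. The function is_url is a hypothetical helper, and this article does not describe how Zerox itself makes the distinction.

```python
from urllib.parse import urlparse


def is_url(file_path: str) -> bool:
    """Return True when file_path looks like an http(s) URL."""
    parsed = urlparse(file_path)
    # Require both a web scheme and a host, so that Windows drive
    # letters such as "C:\\docs\\file.pdf" are not mistaken for URLs.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


if __name__ == "__main__":
    for candidate in ("https://example.com/file.pdf", "path/to/file.pdf"):
        kind = "URL" if is_url(candidate) else "local path"
        print(candidate, "->", kind)
```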

Main Features and Workflow

  1. File conversion: Provide the path or URL of a file and call the zerox function. It converts the file and returns the text in Markdown format.
  2. Concurrent processing: Set the concurrency parameter to control how many pages are processed at the same time, which improves throughput.
  3. Page orientation correction: This feature is enabled by default so that the converted text is oriented correctly.
  4. Error handling mode: Set the errorMode parameter to choose whether errors are ignored or thrown.
  5. Pre- and post-processing callbacks: Supply callback functions to perform custom actions before and after each page is processed.
  6. Save results: Set the outputDir parameter to save the conversion result to the specified directory.
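
To make the interaction between concurrency, error mode, and the callbacks concrete, here is a small Python sketch of the general pattern. It is an illustration under stated assumptions, not Zerox's implementation: process_pages and fake_convert are hypothetical names, the callbacks take simplified arguments, and the vision-model call is replaced by a stand-in that fails on one page on purpose.

```python
import asyncio


async def process_pages(pages, convert, concurrency=10, error_mode="IGNORE",
                        on_pre_process=None, on_post_process=None):
    """Convert pages with at most `concurrency` conversions running at once.

    With error_mode "IGNORE" a failed page yields None; with "THROW" the
    exception propagates to the caller.
    """
    semaphore = asyncio.Semaphore(concurrency)

    async def run_one(page_number, page):
        async with semaphore:  # limits how many pages run concurrently
            if on_pre_process:
                on_pre_process(page_number)
            try:
                result = await convert(page)
            except Exception:
                if error_mode == "THROW":
                    raise
                result = None  # IGNORE: keep going, record the gap
            if on_post_process:
                on_post_process(page_number, result)
            return result

    tasks = [run_one(i + 1, page) for i, page in enumerate(pages)]
    # gather returns results in the same order as the input pages
    return await asyncio.gather(*tasks)


async def fake_convert(page):
    # Stand-in for the model call; the page named "bad" always fails.
    await asyncio.sleep(0)
    if page == "bad":
        raise ValueError("conversion failed")
    return f"# {page}"


def demo():
    return asyncio.run(
        process_pages(["a", "bad", "c"], fake_convert, concurrency=2)
    )


if __name__ == "__main__":
    print(demo())
```

A semaphore is used here because it caps the number of in-flight pages while still letting the results come back in page order.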

Sample Code

Node version

import { zerox } from "zerox";

const result = await zerox({
  // Required
  filePath: "path/to/file.pdf",
  openaiAPIKey: process.env.OPENAI_API_KEY,

  // Optional
  cleanup: true,
  concurrency: 10, // number of pages processed at the same time
  correctOrientation: true, // enabled by default
  errorMode: "IGNORE", // ignore errors instead of throwing them
  maintainFormat: false,
  maxRetries: 1,
  maxTesseractWorkers: -1,
  model: "gpt-4o-mini",
  onPostProcess: async ({ page, progressSummary }) => {}, // runs after each page
  onPreProcess: async ({ imagePath, pageNumber }) => {}, // runs before each page
  outputDir: "output", // directory where the result is saved
  pagesToConvertAsImages: -1,
});

Python version

from zerox import zerox

result = zerox(
    file_path="path/to/file.pdf",
    openai_api_key="your_openai_api_key",
    cleanup=True,
    concurrency=10,
    correct_orientation=True,
    error_mode="IGNORE",
    maintain_format=False,
    max_retries=1,
    max_tesseract_workers=-1,
    model="gpt-4o-mini",
    on_post_process=lambda page, progress_summary: None,
    output_dir="output",
    pages_to_convert_as_images=-1,
)
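
The Python sample passes the API key as a literal string. A common alternative is to read it from the environment, as the Node sample does, and to collect the options in one place. The sketch below assumes the parameter names shown in the sample above. build_zerox_kwargs is a hypothetical helper and is not part of Zerox.

```python
import os


def build_zerox_kwargs(file_path, env=os.environ, **overrides):
    """Collect keyword arguments for a zerox call.

    Raises RuntimeError when OPENAI_API_KEY is missing or empty.
    """
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    kwargs = {
        "file_path": file_path,
        "openai_api_key": api_key,
        # Defaults below mirror the values used in the sample above.
        "concurrency": 10,
        "error_mode": "IGNORE",
        "model": "gpt-4o-mini",
    }
    kwargs.update(overrides)  # caller-supplied options win
    return kwargs
```

The resulting dictionary could then be unpacked into the call, for example zerox(**build_zerox_kwargs("path/to/file.pdf", output_dir="output")).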

 

We use a combination of libreoffice and graphicsmagick for document-to-image conversion. For non-image/non-PDF files, we use libreoffice to convert the file to PDF and then to an image. The following file extensions are supported:

[
"pdf", // Portable Document Format
"doc", // Microsoft Word 97-2003
"docx", // Microsoft Word 2007-2019
"odt", // OpenDocument Text
"ott", // OpenDocument Text Template
"rtf", // Rich Text Format
"txt", // Plain Text
"html", // HTML Document
"htm", // HTML Document (alternative extension)
"xml", // XML Document
"wps", // Microsoft Works Word Processor
"wpd", // WordPerfect Document
"xls", // Microsoft Excel 97-2003
"xlsx", // Microsoft Excel 2007-2019
"ods", // OpenDocument Spreadsheet
"ots", // OpenDocument Spreadsheet Template
"csv", // Comma-Separated Values
"tsv", // Tab-Separated Values
"ppt", // Microsoft PowerPoint 97-2003
"pptx", // Microsoft PowerPoint 2007-2019
"odp", // OpenDocument Presentation
"otp", // OpenDocument Presentation Template
];
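
The conversion rule described above (PDF goes straight to images, every other listed type goes through libreoffice first) can be sketched as a small lookup. This is an illustrative Python sketch only: conversion_steps and the step labels are hypothetical names, and the extension set is copied from the list above.

```python
from pathlib import Path

# Extensions copied from the supported-types list in this article.
SUPPORTED_EXTENSIONS = {
    "pdf", "doc", "docx", "odt", "ott", "rtf", "txt", "html", "htm", "xml",
    "wps", "wpd", "xls", "xlsx", "ods", "ots", "csv", "tsv",
    "ppt", "pptx", "odp", "otp",
}


def conversion_steps(file_path):
    """Return the ordered conversion steps for a file, based on its extension."""
    ext = Path(file_path).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    if ext == "pdf":
        return ["pdf_to_image"]
    # Non-PDF documents are first converted to PDF with libreoffice.
    return ["libreoffice_to_pdf", "pdf_to_image"]


if __name__ == "__main__":
    for name in ("report.docx", "slides.pptx", "paper.pdf"):
        print(name, "->", conversion_steps(name))
```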
May not be reproduced without permission:Chief AI Sharing Circle " Zerox: PDF, DOCX, image conversion to Markdown, visual modeling high-precision OCR
