Llama OCR: OCR library that converts images to Markdown in three lines of code using the free Llama 3.2 Vision interface

Latest AI Resources | Published 8 months ago | AI Sharing Circle


General Introduction

Llama OCR is an OCR (Optical Character Recognition) library based on Llama 3.2 Vision that converts documents to Markdown format. The library was developed by Nutlope and uses the free Llama 3.2 Vision interface provided by Together AI to parse images and return Markdown text. Llama OCR supports OCR of local and remote images, and support for PDF files is planned. The library is installed with npm and can then be called directly from your project.

Related project: Zerox


Demo: https://llamaocr.com/

Relies on the free Meta Llama Vision interface provided by Together AI: https://api.together.ai/models/meta-llama/Llama-Vision-Free

More free vision models: the Zhipu AI (Smart Spectrum) open platform has launched GLM-4V-Flash, its first free multimodal vision model, with unlimited use.

Function List

Image OCR: Supports optical character recognition of local and remote images.
Markdown output: Converts recognized text to Markdown format.
Multi-model support: Free and paid Llama 3.2 model interfaces are available to meet different performance requirements.
API integration: Image parsing via Together AI's API.
Future features: OCR of single-page and multi-page PDFs, as well as JSON output, are planned.

Usage Guide

Installation process

Ensure that the Node.js environment is installed.
Install the Llama OCR library using npm:

   npm i llama-ocr

Usage

Import the Llama OCR library:

   import { ocr } from "llama-ocr";

Call the ocr function to parse an image:

   const markdown = await ocr({
     filePath: "./trader-joes-receipt.jpg", // path to the image file
     apiKey: process.env.TOGETHER_API_KEY, // Together AI API key
   });

Process the returned Markdown text:

   console.log(markdown);
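
The usage example above passes a local path, while the library is also described as accepting remote images. As a minimal sketch, assuming a remote image is given as an http(s) URL in the same filePath option (the text does not show a remote call, so this is an assumption), a caller could classify the source before invoking ocr. The helper name classifyImageSource is hypothetical and not part of llama-ocr.

```javascript
// Hypothetical helper (not part of llama-ocr): classify an image source
// before handing it to ocr(). Only http(s) URLs are treated as remote.
function classifyImageSource(filePath) {
  if (typeof filePath !== "string" || filePath.trim() === "") {
    throw new TypeError("filePath must be a non-empty string");
  }
  return /^https?:\/\//i.test(filePath.trim()) ? "remote" : "local";
}

console.log(classifyImageSource("./trader-joes-receipt.jpg")); // local
console.log(classifyImageSource("https://example.com/receipt.jpg")); // remote
```

A check like this is mainly useful for logging or for deciding whether to verify that a local file exists before making the API call.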

Detailed Function Operation

Image OCR: Pass the image file path to the ocr function to get the text content of the image.
Markdown output: The parsed text is automatically converted to Markdown format for easy use in documents.
Multi-model support: Set the model parameter to select a different Llama 3.2 model (e.g. Llama-3.2-90B-Vision or Llama-3.2-11B-Vision) to meet different performance needs.
API integration: Together AI's API key needs to be set in an environment variable in order to call its interface for image parsing.
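
The model selection and API key points above can be sketched as a small options builder. This is an illustration under stated assumptions: the option is assumed to be named model, the two model names are the ones listed above, and buildOcrOptions is a hypothetical helper rather than anything exported by llama-ocr.

```javascript
// Model names as listed in the text; "model" as the option name is an assumption.
const KNOWN_MODELS = ["Llama-3.2-90B-Vision", "Llama-3.2-11B-Vision"];

// Hypothetical helper: build the argument object for ocr().
// env defaults to process.env but can be replaced for testing.
function buildOcrOptions({ filePath, model, env = process.env }) {
  const apiKey = env.TOGETHER_API_KEY;
  if (!apiKey) {
    throw new Error("TOGETHER_API_KEY is not set");
  }
  if (model !== undefined && !KNOWN_MODELS.includes(model)) {
    throw new Error(`Unknown model: ${model}`);
  }
  // Leave "model" out when none is requested so the library default applies.
  return model === undefined ? { filePath, apiKey } : { filePath, apiKey, model };
}

const options = buildOcrOptions({
  filePath: "./example-image.jpg",
  model: "Llama-3.2-11B-Vision",
  env: { TOGETHER_API_KEY: "demo-key" }, // stand-in for process.env
});
console.log(options.model); // Llama-3.2-11B-Vision
```

The resulting object could then be passed to the library as ocr(options). Failing early on a missing key gives a clearer message than a rejected API request.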

Sample code

   import { ocr } from "llama-ocr";

   // Parse one image and print the resulting Markdown.
   async function runOCR() {
     const markdown = await ocr({
       filePath: "./example-image.jpg",
       apiKey: "your-together-ai-api-key", // placeholder; replace with your own key
     });
     console.log(markdown);
   }

   runOCR();
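
The sample above does not handle failures such as a rejected API key or an unreadable file. One possible approach, shown here as a sketch rather than the library's own pattern, is a wrapper that turns thrown errors into a result object. The names runOcrSafely and fakeOcr are hypothetical; fakeOcr is a stub standing in for the real ocr function so the sketch runs without network access.

```javascript
// Hypothetical wrapper: ocrFn is whatever OCR function you pass in
// (for example the ocr export from llama-ocr).
async function runOcrSafely(ocrFn, options) {
  try {
    const markdown = await ocrFn(options);
    return { ok: true, markdown };
  } catch (error) {
    // Normalize anything thrown into a message string.
    const message = error instanceof Error ? error.message : String(error);
    return { ok: false, error: message };
  }
}

// Stub standing in for the real ocr function so the sketch runs offline.
const fakeOcr = async ({ filePath }) => `# Text from ${filePath}`;

runOcrSafely(fakeOcr, { filePath: "./example-image.jpg" }).then((result) => {
  console.log(result.ok, result.markdown);
});
```

In a real project the call would be runOcrSafely(ocr, { filePath, apiKey }), and the caller would check result.ok before using result.markdown.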

Future features

PDF Support: Future versions will support OCR of single and multi-page PDF files.
JSON output: In addition to the Markdown format, JSON output will be supported for easy data processing and integration.
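
Until JSON output is available, a caller who needs structured data could post-process the returned Markdown. The following is only a stopgap sketch and makes no claim about the format the library will eventually produce: the hypothetical markdownToSections helper groups non-empty lines under the nearest preceding heading.

```javascript
// Hypothetical stopgap: split Markdown into sections keyed by heading.
// Text before the first heading is kept in a section with heading null.
function markdownToSections(markdown) {
  const sections = [];
  let current = { heading: null, lines: [] };
  for (const line of markdown.split(/\r?\n/)) {
    const match = /^(#{1,6})\s+(.*)$/.exec(line);
    if (match) {
      // Close the previous section if it holds anything.
      if (current.heading !== null || current.lines.length > 0) sections.push(current);
      current = { heading: match[2].trim(), lines: [] };
    } else if (line.trim() !== "") {
      current.lines.push(line.trim());
    }
  }
  if (current.heading !== null || current.lines.length > 0) sections.push(current);
  return sections;
}

const sample = "# Receipt\nBananas 0.99\n\n## Total\n4.50";
console.log(JSON.stringify(markdownToSections(sample), null, 2));
```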

With the above steps, users can easily install and use the Llama OCR library to convert text content in images to Markdown format, improving document processing efficiency.