General Introduction
Llama OCR is an OCR (Optical Character Recognition) library built on Llama 3.2 Vision that converts documents to Markdown. The library was developed by Nutlope and uses the free Llama 3.2 endpoint provided by Together AI to parse images and return Markdown text. Llama OCR supports OCR of local and remote images, and PDF support is planned. The library is installed with npm and can then be called directly from your project.
Related project: Zerox
Function List
- Image OCR: Supports optical character recognition of local and remote images.
- Markdown output: Converts recognized text to Markdown format.
- Multi-model support: Free and paid Llama 3.2 model interfaces are available to meet different performance requirements.
- API integration: Image parsing via Together AI's API.
- Planned features: OCR of single-page and multi-page PDFs, as well as JSON output.
Usage Guide
Installation process
- Ensure that the Node.js environment is installed.
- Install the Llama OCR library using npm:
npm i llama-ocr
Usage
- Import the Llama OCR library:
import { ocr } from "llama-ocr";
- Call the ocr function to parse an image:
const markdown = await ocr({
  filePath: "./trader-joes-receipt.jpg", // image file path
  apiKey: process.env.TOGETHER_API_KEY, // Together AI API key
});
- Process the returned Markdown text:
console.log(markdown);
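The last step above only prints the result. As a minimal sketch of further processing, the helper below pulls the heading lines out of a Markdown string. `extractHeadings` is a hypothetical helper, not part of llama-ocr, and the sample string is hand-written for illustration rather than real OCR output.

```javascript
// Hypothetical helper (not part of llama-ocr): collect ATX-style Markdown
// headings ("# Title", "## Section", ...) from a Markdown string.
function extractHeadings(markdown) {
  const headings = [];
  for (const line of markdown.split(/\r?\n/)) {
    // One to six "#" characters, at least one space, then the heading text.
    const match = /^(#{1,6})\s+(.+?)\s*$/.exec(line);
    if (match) {
      headings.push({ level: match[1].length, text: match[2] });
    }
  }
  return headings;
}

// Hand-written sample input, standing in for the string returned by ocr().
const sampleMarkdown = [
  "# Store Receipt",
  "",
  "## Items",
  "- Bananas 0.99",
  "## Total",
  "0.99",
].join("\n");

console.log(extractHeadings(sampleMarkdown));
```

This simple version does not skip "#" lines inside fenced code blocks, which is usually acceptable for receipts and similar scanned documents.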
Detailed Function Operation
- Image OCR: Pass the image file path to the ocr function to get the text content of the image.
- Markdown output: The parsed text is automatically converted to Markdown format for easy use in documents.
- Multi-model support: Set the model parameter to select a different Llama 3.2 model (e.g. Llama-3.2-90B-Vision or Llama-3.2-11B-Vision) to meet different performance needs.
- API integration: Together AI's API key needs to be set in an environment variable so that the library can call its interface for image parsing.
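The model and API key points can be sketched as a small helper that reads the key from the environment and checks the model name before anything is sent. `buildOcrOptions` and `SUPPORTED_MODELS` are hypothetical names introduced here; the list holds only the two model names mentioned in this section, and passing `model` alongside `filePath` and `apiKey` is an assumption based on that description.

```javascript
// Model names taken from this section; the library may accept others.
const SUPPORTED_MODELS = ["Llama-3.2-90B-Vision", "Llama-3.2-11B-Vision"];

// Hypothetical helper (not part of llama-ocr): build the options object
// for ocr(), failing early if the API key or model name is unusable.
function buildOcrOptions(filePath, model, env = process.env) {
  const apiKey = env.TOGETHER_API_KEY;
  if (!apiKey) {
    throw new Error("TOGETHER_API_KEY is not set");
  }
  if (!SUPPORTED_MODELS.includes(model)) {
    throw new Error(`Unknown model: ${model}`);
  }
  return { filePath, apiKey, model };
}

// Example with a stand-in environment object instead of process.env.
const options = buildOcrOptions(
  "./example-image.jpg",
  "Llama-3.2-11B-Vision",
  { TOGETHER_API_KEY: "your-together-ai-api-key" }
);
console.log(options.model);
```

In real use the third argument would be omitted so the key comes from `process.env`, and the returned object would be passed on as `await ocr(options)`.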
Sample Code
import { ocr } from "llama-ocr";
// Run OCR on a local image and print the resulting Markdown.
async function runOCR() {
  const markdown = await ocr({
    filePath: "./example-image.jpg", // image file path
    apiKey: "your-together-ai-api-key", // replace with your Together AI API key
  });
  console.log(markdown);
}
runOCR();
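The sample handles one image and lets any error propagate. A minimal sketch for several images, with per-file error handling, might look like the following. `ocrAll` is a hypothetical helper; it takes the OCR function as an argument (in real use, the `ocr` import from llama-ocr) so the sketch can run with a stand-in and without network access. `fakeOcr` is that stand-in and returns placeholder text, not real OCR output.

```javascript
// Hypothetical helper (not part of llama-ocr): run an OCR function over
// several images one at a time, recording failures instead of aborting.
async function ocrAll(ocrFn, filePaths, apiKey) {
  const results = [];
  for (const filePath of filePaths) {
    try {
      const markdown = await ocrFn({ filePath, apiKey });
      results.push({ filePath, ok: true, markdown });
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      results.push({ filePath, ok: false, error: message });
    }
  }
  return results;
}

// Stand-in for the real ocr(): returns placeholder text, and rejects
// one path to exercise the error branch.
async function fakeOcr({ filePath }) {
  if (filePath.endsWith("missing.jpg")) {
    throw new Error("file not found");
  }
  return `# OCR result for ${filePath}`;
}

const pending = ocrAll(
  fakeOcr,
  ["./a.jpg", "./missing.jpg"],
  "your-together-ai-api-key"
);
pending.then((results) => {
  for (const r of results) {
    console.log(r.filePath, r.ok ? "ok" : `failed: ${r.error}`);
  }
});
```

Images are processed sequentially here to keep the sketch simple and to avoid sending many requests at once; whether parallel calls are appropriate depends on the rate limits of your Together AI account.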
Planned Features
- PDF Support: Future versions will support OCR of single and multi-page PDF files.
- JSON output: In addition to the Markdown format, JSON output will be supported for easy data processing and integration.
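Until JSON output exists in the library, one option is to wrap the Markdown string yourself. The sketch below is not the planned JSON format, which is not described here; `toJsonRecord` and its field names are hypothetical, and the sample text is hand-written.

```javascript
// Hypothetical helper (not part of llama-ocr, and not its planned JSON
// format): wrap an OCR Markdown string in a JSON record for storage.
function toJsonRecord(filePath, markdown) {
  return JSON.stringify(
    {
      source: filePath,
      lineCount: markdown.split(/\r?\n/).length,
      markdown,
    },
    null,
    2
  );
}

// Hand-written sample text standing in for a real ocr() result.
const record = toJsonRecord("./example-image.jpg", "# Title\nSome text");
console.log(record);
```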
With the above steps, users can easily install and use the Llama OCR library to convert text content in images to Markdown format, improving document processing efficiency.