olmOCR 2 - AI2 open source multimodal document parsing model



What is olmOCR 2

olmOCR 2 is an open source multimodal document parsing model from the Allen Institute for Artificial Intelligence (AI2) and an upgraded version of olmOCR. It converts digitized print documents (e.g., PDFs) into clean plain text in natural reading order. Built on the Qwen2.5-VL-7B model and optimized with reinforcement learning with verifiable rewards (RLVR), it combines synthetic data generation with a unit-test mechanism to address the accuracy problems of traditional OCR in complex scenarios such as mathematical formulas, tables, and multi-column layouts. It performs particularly well on complex formats and structured content, where its accuracy is reported to be significantly higher than that of comparable models. In tasks such as mathematical formula recognition and table data extraction, for example, it restores document content more accurately.
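As a rough illustration of how binary unit tests can serve as a verifiable reward, the Python sketch below scores a parsed page against a few pass/fail checks. The check types (`text_present`, `appears_before`), the sample strings, and the choice to aggregate results as the fraction of tests passed are assumptions made for illustration; this is not the olmOCR 2 training code.

```python
from typing import Callable, List

# A unit test is any function that inspects the parsed text and returns pass/fail.
UnitTest = Callable[[str], bool]


def text_present(snippet: str) -> UnitTest:
    """Hypothetical check: pass if `snippet` appears somewhere in the output."""
    return lambda output: snippet in output


def appears_before(first: str, second: str) -> UnitTest:
    """Hypothetical reading-order check: both snippets exist and `first` comes first."""
    def check(output: str) -> bool:
        i, j = output.find(first), output.find(second)
        return i != -1 and j != -1 and i < j
    return check


def reward(output: str, tests: List[UnitTest]) -> float:
    """Fraction of binary tests passed, in [0, 1] (assumed aggregation rule)."""
    if not tests:
        return 0.0
    return sum(1 for test in tests if test(output)) / len(tests)


tests = [
    text_present("Introduction"),
    text_present("E = mc^2"),
    appears_before("Introduction", "Conclusion"),
]

good = "Introduction\nE = mc^2\nConclusion"   # passes all three checks
bad = "Conclusion\nIntroduction"              # passes only the first check

print(reward(good, tests))
print(reward(bad, tests))
```

Because every check is a deterministic pass/fail function, the score can be verified without a human grader, which is the property a verifiable reward relies on.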

Features of olmOCR 2

Efficient text extraction: Extracts high-quality plain text from complex PDF documents, correctly handles multi-column layouts, tables, mathematical formulas, and handwritten content, and keeps the text in natural reading order.
Reinforcement learning training: Uses reinforcement learning with verifiable rewards (RLVR), with binary unit tests as the reward signal, to significantly improve the model's performance on mathematical formula conversion, table parsing, and multi-column layouts.
Synthetic data generation: A synthetic document generation pipeline creates documents with diverse and complex layouts at scale, together with the corresponding HTML source code and test cases, providing rich data for model training.
Dynamic temperature adjustment: During inference, the sampling temperature is adjusted dynamically to balance the higher accuracy of low temperatures against the need to avoid repetitive loops, improving the quality of the generated text.
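The dynamic temperature idea can be sketched in Python as a retry loop: decode at a low temperature first, and raise the temperature only if the output appears stuck in a loop. The `generate` callable, the temperature schedule, and the repetition heuristic below are all hypothetical stand-ins chosen for illustration, not the actual olmOCR 2 inference code.

```python
from typing import Callable, Sequence, Tuple


def looks_repetitive(text: str, min_period: int = 4,
                     max_period: int = 50, min_repeats: int = 4) -> bool:
    """Heuristic: True if the text ends with the same chunk repeated back to back."""
    for period in range(min_period, max_period + 1):
        if len(text) < period * min_repeats:
            break  # longer periods cannot fit either
        unit = text[-period:]
        if text.endswith(unit * min_repeats):
            return True
    return False


def generate_with_dynamic_temperature(
    generate: Callable[[float], str],
    temperatures: Sequence[float] = (0.1, 0.2, 0.4, 0.8),  # assumed schedule
) -> Tuple[str, float]:
    """Try increasing temperatures until the output stops looping."""
    text = ""
    for temperature in temperatures:
        text = generate(temperature)
        if not looks_repetitive(text):
            return text, temperature
    # Every attempt looped: return the last one and let the caller decide.
    return text, temperatures[-1]


def fake_generate(temperature: float) -> str:
    """Stand-in for a model call: loops at low temperature, clean otherwise."""
    return "Total: 42 " * 10 if temperature < 0.3 else "Total: 42"


text, used = generate_with_dynamic_temperature(fake_generate)
print(text, used)
```

Starting low keeps the common case as deterministic as possible, and the higher temperatures are only paid for on pages that actually trigger a loop.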

Core benefits of olmOCR 2

Advanced OCR technology: Based on a 7B vision-language model (VLM), it significantly improves the handling of mathematical formulas, tables, and multi-column layouts through training with reinforcement learning with verifiable rewards (RLVR).
Efficient data generation: A synthetic document generation pipeline enables large-scale creation of documents with complex layouts and corresponding test cases, providing rich and diverse training data.
Dynamic temperature adjustment: A dynamic temperature strategy during inference balances the quality and efficiency of text generation and effectively avoids repetitive loops.
Optimized prompt strategy: The order of text and image in the prompt is standardized so that training and inference stay consistent, improving model stability and performance.
Model weight averaging: Accuracy and robustness are further improved by training multiple models and averaging their weights ("souping").
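Weight averaging ("souping") can be illustrated with a minimal Python sketch. It assumes each checkpoint is a plain dictionary mapping parameter names to flat lists of floats and that all checkpoints are weighted equally; real checkpoints would hold tensors, and the parameter names here are made up for the example.

```python
from typing import Dict, List

# Simplified stand-in for a model checkpoint: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def soup(checkpoints: List[Weights]) -> Weights:
    """Return the element-wise mean of several checkpoints (uniform average)."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    names = checkpoints[0].keys()
    for ckpt in checkpoints:
        if ckpt.keys() != names:
            raise ValueError("checkpoints must share the same parameter names")
        for name in names:
            if len(ckpt[name]) != len(checkpoints[0][name]):
                raise ValueError(f"shape mismatch for parameter {name!r}")
    n = len(checkpoints)
    return {
        name: [sum(values) / n for values in zip(*(c[name] for c in checkpoints))]
        for name in names
    }


# Two hypothetical training runs of the same architecture.
run_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
run_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}

averaged = soup([run_a, run_b])
print(averaged)  # {'layer.weight': [2.0, 3.0], 'layer.bias': [0.5]}
```

Averaging only makes sense when the runs share an architecture and a common starting point, which is why the sketch rejects checkpoints whose parameter names or sizes differ.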

Official links for olmOCR 2

GitHub repository: https://github.com/allenai/olmocr
arXiv technical paper: https://arxiv.org/pdf/2510.19817
Online demo: https://olmocr.allenai.org/

Who olmOCR 2 is for

Researchers: Scholars working on optical character recognition (OCR) and related fields can use olmOCR 2's open source model and data for algorithm improvement, performance optimization, and other research.
Developers: Software developers can integrate olmOCR 2 into their applications to provide high-quality PDF text extraction for document processing, content management systems, and more.
Data scientists: Those who work with large amounts of digitized document data can use olmOCR 2 to extract text quickly and accurately for data analysis and mining.
Business users: Departments responsible for document management, information extraction, and knowledge management can use olmOCR 2 to increase efficiency and reduce the time and cost of manual document processing.
Educators: Teachers and education researchers can use olmOCR 2 to convert PDF documents such as academic literature and textbooks into editable text for teaching and research.
Students: Students who handle large numbers of documents can use olmOCR 2 to quickly extract text from PDFs to support their studies and research.