Throughout the history of human civilization, every leap in how information is captured and analyzed has driven social progress. Each step, from ancient hieroglyphics to portable papyrus and from the printing press to today's wave of digitization, has widened the reach of human knowledge and deepened its use. Each one, in turn, laid the groundwork for the next round of innovation.
Today we stand at an exciting turning point, with an unprecedented opportunity to unlock the value of vast amounts of digitized information. According to industry data, about 90% of organizational data is stored as documents, and much of that value remains untapped. To unlock these dormant data assets, Mistral AI has launched Mistral OCR, an Optical Character Recognition (OCR) API that sets a new standard for document understanding.
Core Benefits of Mistral OCR
Mistral OCR is more than a simple OCR tool. It represents a new approach to document understanding. Compared with other OCR models on the market, Mistral OCR offers stronger document recognition and higher accuracy. It understands every element of a document, including images, text, tables, and mathematical formulas. Users simply upload an image or PDF, and Mistral OCR quickly extracts its content as structured output that keeps text and images in their original order.
In summary, Mistral OCR has several key benefits:
- State-of-the-art understanding of complex documents: Accurately parses documents with interleaved text and images, complex mathematical formulas, tables, and advanced layouts such as LaTeX.
- Native multilingual and multimodal support: Handles documents in many languages and modalities out of the box, with no extra configuration.
- Top benchmark performance: Mistral OCR ranks among the top performers in several authoritative benchmarks.
- Fast processing: Mistral OCR is the fastest OCR product in its class.
- "Document as Prompt" with structured output: The entire document can serve as the prompt, and results can be returned as highly structured data.
- Flexible and optional self-hosted solutions: For organizations that demand the ultimate in data security, Mistral OCR offers optional self-hosted deployment options.
These strengths make Mistral OCR well suited to Retrieval-Augmented Generation (RAG) systems, especially those that handle information-rich multimodal documents such as slides and complex PDFs. Mistral OCR already powers document understanding in Le Chat, Mistral AI's flagship conversational AI platform, for millions of users. The API model, mistral-ocr-latest, is available now at a competitive $1 per 1000 pages, and batch inference lowers the cost further. Developers can try Mistral OCR immediately on La Plateforme, Mistral AI's developer platform. In the future, Mistral OCR will also be offered more broadly through Mistral AI's cloud services and partner network, with support for on-premises enterprise deployment.
Next, we will analyze the core technical advantages of Mistral OCR, and introduce how to quickly get started with Mistral OCR through the API.
Mistral OCR Core Benefits Explained
Deep understanding of complex documents
Mistral OCR excels in understanding complex documents thanks to its advanced model architecture and training strategy. Mistral OCR is able to accurately analyze documents that are interleaved with graphics, academic papers containing a large number of specialized mathematical formulas, sophisticated tables, and documents generated by complex typesetting systems such as LaTeX. Even in the case of information-dense scientific papers, which are interspersed with charts, graphs, formulas, and images, Mistral OCR is able to understand the underlying logic and information of the document.
To show Mistral OCR in action, the Mistral AI team prepared a demonstration. They fed a typical PDF document into Mistral OCR, which extracted all of its text and images. It then converted them into a Markdown file that preserves the structure and content of the original. Interested developers can try the process themselves in the accompanying Colab notebook.
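Each page of the OCR result carries its own Markdown. The final step of the demo, producing one Markdown file, can be approximated by joining those pages in order. This is a minimal sketch under stated assumptions. It assumes the response has been converted to a plain dict, for example with the SDK response object's `model_dump()` method. It also assumes the dict has a `pages` list whose entries have `index` and `markdown` fields. The helper name `pages_to_markdown` and the `---` page separator are our own choices, not part of the Mistral API.

```python
from typing import Any


def pages_to_markdown(ocr_result: dict[str, Any], separator: str = "\n\n---\n\n") -> str:
    """Join per-page Markdown into a single document, ordered by page index.

    Assumed (hypothetical) input shape, mirroring the OCR response's pages:
    {"pages": [{"index": 0, "markdown": "..."}, ...]}
    """
    pages = sorted(ocr_result.get("pages", []), key=lambda page: page["index"])
    return separator.join(page["markdown"].strip() for page in pages)


# Hypothetical response data standing in for a real API result.
sample = {
    "pages": [
        {"index": 1, "markdown": "## Results\n\nAccuracy improved."},
        {"index": 0, "markdown": "# Title\n\nIntro text."},
    ]
}
print(pages_to_markdown(sample))
```

To save the result, write the returned string to a `.md` file. Sorting by `index` guards against pages arriving out of order.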
To show how Mistral OCR parses documents in real applications, the Mistral AI team also published several sample PDF documents side by side with their OCR results. The samples cover:
- Tables and graphics
- Mathematical formulas
- Hindi text
- An ordinary document
- Arabic text
Benchmark performance
To evaluate Mistral OCR thoroughly, the Mistral AI team ran a series of benchmark tests. The results show Mistral OCR outperforming other leading OCR models on several key metrics. Notably, Mistral OCR also extracts embedded images from documents along with the text. None of the other large language models (LLMs) in the comparison offer this capability. For a fair comparison, all models were therefore evaluated on an internal "text-only" test set. This set covers a wide range of published papers and PDFs collected from the web.
Below is the detailed benchmark result data:
Model | Overall | Math formulas | Multilingual | Scanned documents | Tables |
---|---|---|---|---|---|
Google Document AI | 83.42 | 80.29 | 86.42 | 92.77 | 78.16 |
Azure OCR | 89.52 | 85.72 | 87.52 | 94.65 | 89.52 |
Gemini-1.5-Flash-002 | 90.23 | 89.11 | 86.76 | 94.87 | 90.48 |
Gemini-1.5-Pro-002 | 89.92 | 88.48 | 86.33 | 96.15 | 89.71 |
Gemini-2.0-Flash-001 | 88.69 | 84.18 | 85.80 | 95.11 | 91.46 |
gpt-4o-2024-11-20 | 89.77 | 87.55 | 86.00 | 94.58 | 91.70 |
Mistral OCR 2503 | 94.89 | 94.29 | 89.55 | 98.96 | 96.12 |
The table shows Mistral OCR leading in every category. Its widest margins over the next-best model are in three areas:
- Mathematical formulas: 94.29 vs. 89.11 for Gemini-1.5-Flash-002.
- Overall performance: 94.89 vs. 90.23.
- Tables: 96.12 vs. 91.70 for gpt-4o-2024-11-20.
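The margins quoted above can be checked directly against the table. The short script below uses only the table's numbers. For each category, it finds the runner-up and Mistral OCR's lead over it in points. The function name `lead_over_runner_up` is illustrative.

```python
# Scores copied from the benchmark table above (higher is better).
CATEGORIES = ["Overall", "Math formulas", "Multilingual", "Scanned documents", "Tables"]
SCORES = {
    "Google Document AI": [83.42, 80.29, 86.42, 92.77, 78.16],
    "Azure OCR": [89.52, 85.72, 87.52, 94.65, 89.52],
    "Gemini-1.5-Flash-002": [90.23, 89.11, 86.76, 94.87, 90.48],
    "Gemini-1.5-Pro-002": [89.92, 88.48, 86.33, 96.15, 89.71],
    "Gemini-2.0-Flash-001": [88.69, 84.18, 85.80, 95.11, 91.46],
    "gpt-4o-2024-11-20": [89.77, 87.55, 86.00, 94.58, 91.70],
    "Mistral OCR 2503": [94.89, 94.29, 89.55, 98.96, 96.12],
}


def lead_over_runner_up(scores, target):
    """Return {category: (runner_up_model, margin_in_points)} for `target`."""
    leads = {}
    for i, category in enumerate(CATEGORIES):
        # Best score among all models except the target, per category.
        rivals = {model: row[i] for model, row in scores.items() if model != target}
        runner_up = max(rivals, key=rivals.get)
        leads[category] = (runner_up, round(scores[target][i] - rivals[runner_up], 2))
    return leads


for category, (rival, margin) in lead_over_runner_up(SCORES, "Mistral OCR 2503").items():
    print(f"{category:<18} +{margin:.2f} vs {rival}")
```

Running it shows the largest lead in mathematical formulas (+5.18) and the smallest in multilingual text (+2.03, over Azure OCR).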
Native multilingual processing capabilities
Since its founding, Mistral AI has aimed to serve users worldwide, so strong multilingual capability has been a core strategy in its product development. Mistral OCR extends this further. It parses, understands, and transcribes thousands of scripts, fonts, and languages across all continents. This versatility matters for multinational companies that process documents from many language regions. It matters equally for localization companies that focus on specific language markets and serve local users.
The following table shows the benchmarking results of Mistral OCR in the multilingual fuzzy match generation task:
Model | Fuzzy Match in Generation |
---|---|
Google-Document-AI | 95.88% |
Gemini-2.0-Flash-001 | 96.53% |
Azure OCR | 97.31% |
Mistral OCR 2503 | 99.02% |
Test data shows that Mistral OCR also performs well in multilingual fuzzy match generation, with performance metrics that exceed those of other mainstream OCR products, reaffirming its powerful multilingual processing capabilities.
In order to evaluate the performance of Mistral OCR in different languages, the Mistral AI team also conducted more detailed language-specific benchmark tests, and the results are as follows:
Language | Azure OCR | Google Doc AI | Gemini-2.0-Flash-001 | Mistral OCR 2503 |
---|---|---|---|---|
Russian (ru) | 97.35% | 95.56% | 96.58% | 99.09% |
French (fr) | 97.50% | 96.36% | 97.06% | 99.20% |
Hindi (hi) | 96.45% | 95.65% | 94.99% | 97.55% |
Chinese (zh) | 91.40% | 90.89% | 91.85% | 97.11% |
Portuguese (pt) | 97.96% | 96.24% | 97.25% | 99.42% |
German (de) | 98.39% | 97.09% | 97.19% | 99.51% |
Spanish (es) | 98.54% | 97.52% | 97.75% | 99.54% |
Turkish (tr) | 95.91% | 93.85% | 94.66% | 97.00% |
Ukrainian (uk) | 97.81% | 96.24% | 96.70% | 99.29% |
Italian (it) | 98.31% | 97.69% | 97.68% | 99.42% |
Romanian (ro) | 96.45% | 95.14% | 95.88% | 98.79% |
The per-language results show Mistral OCR leading in every language tested. Its advantage is largest for Chinese: 97.11%, versus 91.85% for the next-best model (Gemini-2.0-Flash-001), a gap of more than five percentage points. In the other languages, its lead ranges from about one to two and a half points.
Extremely fast document processing power
Mistral OCR is lighter than most models in its category, so it runs significantly faster than its peers. It processes up to 2000 pages per minute on a single node. This throughput keeps the system efficient even in high-volume scenarios that involve massive numbers of documents. It also supports continuous learning and improvement in such high-throughput environments.
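Two figures quoted in this article support a back-of-the-envelope capacity estimate. The first is up to 2000 pages per minute on a single node. The second is the API price of $1 per 1000 pages. This is a sketch under simplifying assumptions. It assumes throughput stays at the quoted peak and ignores batch-inference discounts. The two numbers also describe different things (single-node speed vs. API billing), so treat the results as rough, independent figures. The function name `estimate` is hypothetical.

```python
PAGES_PER_MINUTE = 2000    # quoted peak throughput on a single node
USD_PER_1000_PAGES = 1.00  # quoted API price; batch inference is cheaper


def estimate(pages: int) -> tuple[float, float]:
    """Return (minutes, cost_usd) for processing `pages` pages at the quoted rates.

    Assumption: throughput stays at the quoted single-node peak and no batch
    discount applies. Real-world numbers may differ.
    """
    if pages < 0:
        raise ValueError("pages must be >= 0")
    minutes = pages / PAGES_PER_MINUTE
    cost_usd = pages / 1000 * USD_PER_1000_PAGES
    return minutes, cost_usd


minutes, cost_usd = estimate(1_000_000)
print(f"1,000,000 pages: about {minutes:.0f} minutes on one node, about ${cost_usd:,.2f}")
```

At these rates, a million pages works out to roughly 500 minutes (a little over eight hours) on one node and about $1,000 at list price.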
"Document as Prompt" and Structured Output
Another distinctive feature of Mistral OCR is "Document as Prompt": users can supply the entire document as the prompt, enabling more powerful and precise information extraction. Users can instruct Mistral OCR to extract specific information from a document and return it as structured output in a predefined format such as JSON. Structured output integrates easily with downstream applications and workflows. For example, the extracted data can be passed directly to function calls or used to build agents. The Mistral AI team also provides an example notebook to help users get started with "Document as Prompt" quickly.
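A minimal sketch of the "Document as Prompt" pattern for structured extraction. The message layout (a `text` part plus a `document_url` part) mirrors the chat example in the Document Understanding section. The field list, the instruction wording, and both helper names (`build_extraction_messages`, `parse_extraction`) are hypothetical. The sketch runs offline, so the model reply below is a made-up stand-in for a real response.

```python
import json
from typing import Any


def build_extraction_messages(document_url: str, fields: list[str]) -> list[dict[str, Any]]:
    """Build a chat message that uses a whole document as the prompt.

    Hypothetical instruction wording; the content layout follows the
    text + document_url structure used in the article's chat example.
    """
    instruction = (
        "Extract the following fields from the document and reply with a single "
        f"JSON object using exactly these keys: {', '.join(fields)}. "
        "Use null for any field that is not present."
    )
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": instruction},
                {"type": "document_url", "document_url": document_url},
            ],
        }
    ]


def parse_extraction(raw_reply: str, fields: list[str]) -> dict[str, Any]:
    """Parse the model's JSON reply and keep only the requested keys."""
    data = json.loads(raw_reply)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [field for field in fields if field not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return {field: data[field] for field in fields}


# Hypothetical fields and reply, standing in for a real model response.
fields = ["title", "authors", "year"]
messages = build_extraction_messages("https://arxiv.org/pdf/1805.04770", fields)
reply = '{"title": "Example", "authors": ["A. Author"], "year": 2018, "extra": 1}'
print(parse_extraction(reply, fields))
```

The `messages` list can be sent with `client.chat.complete(model=..., messages=messages)`, as in the chat example. Because the parser keeps only the requested keys, downstream code such as a function call always receives the same shape.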
Flexible self-hosted deployment options
Some businesses and organizations have extremely stringent data privacy and security requirements, so Mistral OCR offers a self-hosted deployment option. With self-hosting, Mistral OCR runs entirely on the organization's own infrastructure. Sensitive and confidential data is therefore always processed in a secure, controlled environment that can meet strict regulatory and data security requirements. Organizations that need self-hosting can contact Mistral AI for more information.
Getting Started with the Mistral OCR API
The Mistral OCR API is straightforward to use. Mistral AI provides Python and TypeScript SDKs, as well as sample curl requests, so developers can integrate it quickly.
Document OCR Processor
The core functionality of Mistral OCR is driven by the document OCR processor, which is built on Mistral AI's latest OCR model, mistral-ocr-latest, to accurately extract text and structured content from PDF documents.
Main characteristics:
- Structured Content Extraction: While extracting the text content, the original structure and hierarchical relationships of the document are retained intact.
- Formatted information retention: Ability to accurately recognize and retain a variety of formatted information in a document, such as headings, paragraphs, lists, and tables.
- Markdown format output: The results are presented in a clean, easy-to-use Markdown format for secondary parsing and rendering.
- Complex Layout Processing: Easily handle a variety of complex document layouts, including multi-column text and mixed-content typesetting.
- High-precision, large-scale processing: Supports batch processing of large document collections while maintaining high recognition accuracy.
- Extensive document format support: Supports multiple input formats such as PDF, images, and user uploaded documents.
Besides the extracted text, the document OCR processor returns metadata about the document's structure. This makes it easier for developers to work with the recognized content programmatically.
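The Markdown output makes simple programmatic processing easy. For example, you can build a table of contents from each page's headings. This is a minimal sketch under stated assumptions. It assumes the response has been converted to a plain dict with a `pages` list of `{"index", "markdown"}` entries. It also assumes headings use standard ATX `#` syntax. The helper `outline` is hypothetical, and it does not skip `#` lines that happen to sit inside fenced code in the Markdown.

```python
import re
from typing import Any

# ATX heading: 1-6 '#' characters, whitespace, text, optional closing '#'s.
HEADING = re.compile(r"^(#{1,6})\s+(.+?)(?:\s+#+)?\s*$")


def outline(ocr_result: dict[str, Any]) -> list[tuple[int, int, str]]:
    """Return (page_index, heading_level, heading_text) for every Markdown heading.

    Assumes each page dict carries "index" and "markdown" keys.
    """
    entries = []
    for page in sorted(ocr_result.get("pages", []), key=lambda p: p["index"]):
        for line in page["markdown"].splitlines():
            match = HEADING.match(line)
            if match:
                entries.append((page["index"], len(match.group(1)), match.group(2)))
    return entries


# Hypothetical OCR output used for illustration.
sample = {
    "pages": [
        {"index": 0, "markdown": "# Paper Title\nSome text\n## 1 Introduction"},
        {"index": 1, "markdown": "## 2 Method\n#not-a-heading"},
    ]
}
for page, level, text in outline(sample):
    print(f"p{page} {'  ' * (level - 1)}{text}")
```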
PDF Document OCR
The following code example shows how to use the Mistral OCR API to process PDF documents:
import os
from mistralai import Mistral
# Read the API key from an environment variable
api_key = os.environ["MISTRAL_API_KEY"]
client = Mistral(api_key=api_key)
# Run OCR on a publicly accessible PDF
ocr_response = client.ocr.process(
    model="mistral-ocr-latest",
    document={
        "type": "document_url",
        "document_url": "https://arxiv.org/pdf/2201.04234",
    },
    include_image_base64=True,  # also return extracted images as base64
)
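The request above sets `include_image_base64=True`, so pages can carry their extracted images as base64 data. A minimal sketch of embedding those images into a page's Markdown so it renders standalone. It rests on two assumptions. First, as in Mistral's OCR cookbook, each image appears in the Markdown as a placeholder like `![img-0.jpeg](img-0.jpeg)`. Second, each image's `image_base64` field holds a complete data URI. The helper `inline_images` is hypothetical and works on a page converted to a plain dict, for example via `ocr_response.pages[0].model_dump()`.

```python
from typing import Any


def inline_images(page: dict[str, Any]) -> str:
    """Replace image placeholders in one page's Markdown with embedded base64 data.

    Assumed page shape: {"markdown": str, "images": [{"id": str, "image_base64": str | None}]}.
    Assumed placeholder form: ![<id>](<id>).
    """
    markdown = page["markdown"]
    for image in page.get("images", []):
        data = image.get("image_base64")
        if not data:
            continue  # no image data, e.g. the request omitted include_image_base64=True
        placeholder = f"![{image['id']}]({image['id']})"
        markdown = markdown.replace(placeholder, f"![{image['id']}]({data})")
    return markdown


# Hypothetical page data for illustration.
page = {
    "markdown": "Figure 1:\n![img-0.jpeg](img-0.jpeg)",
    "images": [{"id": "img-0.jpeg", "image_base64": "data:image/jpeg;base64,AAAA"}],
}
print(inline_images(page))
```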
Upload PDF documents for OCR
The Mistral OCR API also supports users uploading PDF files for OCR processing.
File Upload
First, the PDF file needs to be uploaded to Mistral AI's file service:
from mistralai import Mistral
import os
api_key = os.environ["MISTRAL_API_KEY"]
client = Mistral(api_key=api_key)
# Upload a local PDF to Mistral AI's file service for OCR
with open("uploaded_file.pdf", "rb") as pdf_file:
    uploaded_pdf = client.files.upload(
        file={
            "file_name": "uploaded_file.pdf",
            "content": pdf_file,
        },
        purpose="ocr",
    )
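You can check locally that a file really is a PDF before spending an upload on it. This cheap safeguard is not part of the Mistral API. Well-formed PDF files begin with the ASCII bytes `%PDF-`, so the hypothetical helper `looks_like_pdf` below reads the first five bytes and compares them.

```python
def has_pdf_signature(header: bytes) -> bool:
    """Well-formed PDF files begin with the ASCII bytes "%PDF-" and a version number."""
    return header.startswith(b"%PDF-")


def looks_like_pdf(path: str) -> bool:
    """Read the first bytes of a local file and check for the PDF signature."""
    try:
        with open(path, "rb") as fh:
            return has_pdf_signature(fh.read(5))
    except OSError:  # missing file, permission error, etc.
        return False


print(looks_like_pdf("uploaded_file.pdf"))
```

For example, call `looks_like_pdf("uploaded_file.pdf")` before `client.files.upload(...)` and stop with a clear error if it returns `False`.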
Document retrieval
After a successful upload, you can retrieve information about the uploaded file:
client.files.retrieve(file_id=uploaded_pdf.id)
id='00edaf84-95b0-45db-8f83-f71138491f23' object='file' size_bytes=3749788 created_at=1741023462 filename='uploaded_file.pdf' purpose='ocr' sample_type='ocr_input' source='upload' deleted=False num_lines=None
Get a signed URL
To access the uploaded file securely, obtain a signed URL for it:
signed_url = client.files.get_signed_url(file_id=uploaded_pdf.id)
Get the OCR results
Finally, pass the signed URL as the document address to get the OCR results for the uploaded PDF:
import os
from mistralai import Mistral
api_key = os.environ["MISTRAL_API_KEY"]
client = Mistral(api_key=api_key)
ocr_response = client.ocr.process(
    model="mistral-ocr-latest",
    document={
        "type": "document_url",
        "document_url": signed_url.url,
    },
)
Image OCR
The Mistral OCR API also supports direct OCR of images.
URL Image OCR
OCR recognition can be performed directly from the image URL:
import os
from mistralai import Mistral
api_key = os.environ["MISTRAL_API_KEY"]
client = Mistral(api_key=api_key)
ocr_response = client.ocr.process(
    model="mistral-ocr-latest",
    document={
        "type": "image_url",
        "image_url": "https://media-cldnry.s-nbcnews.com/image/upload/t_fit-560w,f_avif,q_auto:eco,dpr_2/rockcms/2023-11/short-quotes-swl-231117-02-33d404.jpg",
    },
)
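PDFs are sent with `"type": "document_url"` and images with `"type": "image_url"`, so a small helper can choose the payload from a URL's file extension. This is a sketch. The name `document_for_url` is hypothetical, and the set of image extensions is an assumption, so check the API documentation for the formats Mistral OCR actually accepts.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed list of image extensions; verify against the API documentation.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".avif", ".webp", ".gif", ".bmp", ".tif", ".tiff"}


def document_for_url(url: str) -> dict[str, str]:
    """Build the `document` argument for client.ocr.process from a URL's extension."""
    # Look only at the URL path, so query strings like "?x=1" are ignored.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return {"type": "image_url", "image_url": url}
    return {"type": "document_url", "document_url": url}


print(document_for_url("https://arxiv.org/pdf/2201.04234"))
```

The result can be passed as `document=document_for_url(url)` in the calls shown above. Anything without a recognized image extension falls back to `document_url`, which suits extension-less links such as arXiv PDFs.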
Base64 encoded image OCR
Alternatively, the image can be Base64 encoded and passed to the API for OCR recognition:
import base64
import os
from mistralai import Mistral
def encode_image(image_path):
    """Encode the image to base64."""
    try:
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode("utf-8")
    except FileNotFoundError:
        print(f"Error: The file {image_path} was not found.")
        return None
    except Exception as e:  # General exception handling
        print(f"Error: {e}")
        return None
# Path to your image
image_path = "path_to_your_image.jpg"
# Get the base64 string
base64_image = encode_image(image_path)
if base64_image is None:
    raise SystemExit("Could not read the image; aborting.")
api_key = os.environ["MISTRAL_API_KEY"]
client = Mistral(api_key=api_key)
# Send the image inline as a base64 data URI
ocr_response = client.ocr.process(
    model="mistral-ocr-latest",
    document={
        "type": "image_url",
        "image_url": f"data:image/jpeg;base64,{base64_image}",
    },
)
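The example above hardcodes `image/jpeg` in the data URI. For PNG or other formats, the MIME type can be inferred from the file extension with Python's standard `mimetypes` module. This is a sketch. The helper names `to_data_uri` and `image_file_to_data_uri` are hypothetical, and you should confirm which image formats Mistral OCR accepts in the API documentation.

```python
import base64
import mimetypes


def to_data_uri(data: bytes, filename: str) -> str:
    """Build a base64 data URI, inferring the image MIME type from the file name."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {filename!r}")
    return f"data:{mime_type};base64,{base64.b64encode(data).decode('ascii')}"


def image_file_to_data_uri(image_path: str) -> str:
    """Read a local image file and return it as a data URI."""
    with open(image_path, "rb") as image_file:
        return to_data_uri(image_file.read(), image_path)


print(to_data_uri(b"\x89PNG", "scan.png"))
```

The result goes straight into the request as `{"type": "image_url", "image_url": image_file_to_data_uri(path)}`.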
Document Understanding Function
Mistral OCR's document understanding feature combines OCR with large language models (LLMs). It lets users interact with document content in natural language and efficiently extract information and insights by asking questions.
The document understanding workflow consists of two main steps:
- Document processing: OCR first converts unstructured documents into a machine-readable format by extracting their text, structure, and formatting.
- Language model understanding: A large language model then analyzes the extracted content. Users can ask questions or make requests in natural language. The model takes the document's context and internal relationships into account and answers based on its content.
Key capabilities of document understanding:
- Q&A based on document content: Answers natural language questions about a document's specific content.
- Information extraction and summarization: Extracts key information from documents and generates concise summaries.
- Document analysis and insight: Analyzes document content in depth to uncover insights and knowledge.
- Multi-document queries and comparisons: Supports information queries and content comparison across multiple documents.
- Context-aware responses: Gives more accurate and relevant responses by taking the document's full context into account.
Typical application scenarios for document understanding:
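When you already have OCR output, such as the Markdown pages returned by `client.ocr.process`, you can wire the two steps together by hand. This is a minimal sketch. The helper `build_qa_messages`, its prompt wording, and the character limit are hypothetical. Note also that the chat example in this section sends the PDF URL directly instead. Simple truncation stands in for the chunking and retrieval a real RAG system would use.

```python
def build_qa_messages(document_markdown: str, question: str, max_chars: int = 50_000) -> list[dict]:
    """Step 2 of the workflow: ask a question about text produced by step 1 (OCR).

    Long documents are truncated to `max_chars` characters; a production system
    would instead chunk the text and retrieve only the relevant passages (RAG).
    """
    context = document_markdown[:max_chars]
    return [
        {
            "role": "system",
            "content": "Answer using only the document provided. If the answer is not in it, say so.",
        },
        {"role": "user", "content": f"Document:\n{context}\n\nQuestion: {question}"},
    ]


# Hypothetical OCR text standing in for real OCR output.
messages = build_qa_messages("# Report\n\nRevenue grew 12% in Q3.", "How much did revenue grow?")
print(messages[1]["content"])
```

The returned list can be passed to `client.chat.complete(model="mistral-small-latest", messages=messages)`, following the same pattern as the chat example in this section.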
- Analysis of scientific papers and technical documentation: Rapidly analyze and understand large volumes of scientific papers and technical documents.
- Business Document Information Extraction: Efficiently extract key information from documents such as business contracts and reports.
- Legal documentation and contract processing: Assist in processing and analyzing complex legal documents and contractual terms.
- Document Q&A applications: Build intelligent document question-answering systems that make information retrieval more efficient.
- Automated Document Workflow: Automate a variety of document-based workflows, such as document review and information entry.
The following code example shows how to use natural language to interact with a PDF document and ask what the last sentence of the document is:
import os
from mistralai import Mistral
# Retrieve the API key from environment variables
api_key = os.environ["MISTRAL_API_KEY"]
# Specify model
model = "mistral-small-latest"
# Initialize the Mistral client
client = Mistral(api_key=api_key)
# Define the messages for the chat: a text question plus the document itself
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "what is the last sentence in the document",
            },
            {
                "type": "document_url",
                "document_url": "https://arxiv.org/pdf/1805.04770",
            },
        ],
    }
]
# Get the chat response
chat_response = client.chat.complete(
    model=model,
    messages=messages,
)
# Print the content of the response
print(chat_response.choices[0].message.content)
# Example output:
# The last sentence in the document is:\n\n\"Zaremba, W., Sutskever, I., and Vinyals, O. Recurrent neural network regularization. arXiv. 1409.2329, 2014.
Application Cases
Mistral OCR's powerful document comprehension capabilities are unlocking tremendous value in real-world applications across a wide range of industries, helping businesses and organizations transform massive amounts of document data into actionable knowledge and solutions. Currently, Mistral OCR has achieved significant results in the following key areas:
Digital transformation of research: Many research institutions have begun experimenting with Mistral OCR. They use it to convert large volumes of scientific papers and journal articles into AI-ready formats that downstream analysis tools can use directly. This speeds up research workflows and collaboration.
Digital preservation and transmission of cultural heritage: Many cultural heritage preservation organizations and non-profit institutions are actively adopting Mistral OCR technology to digitize valuable historical documents and artifacts for permanent preservation and wider dissemination and sharing of cultural heritage.
Smarter customer service: Customer service teams are exploring Mistral OCR to turn complex product documentation and user manuals into a structured, indexable knowledge base. The goal is to shorten response times and improve service quality and customer satisfaction.
AI-ready documents across industries: Mistral OCR is helping organizations in many industries convert large volumes of documents into indexable, searchable, AI-ready formats. These include technical documents, engineering drawings, lecture notes, presentations, and regulatory filings. This unlocks the knowledge these documents contain and improves organizational productivity.
Experience the power of Mistral OCR today!
Users can try Mistral OCR's document understanding for free on Le Chat; for API access, visit La Plateforme. The Mistral AI team welcomes user feedback and will continue to improve the Mistral OCR model. As part of its strategic partnership program, Mistral AI also offers on-premises deployment to select users.
More resources
For more information on how to use Mistral OCR and advanced tips, please refer to the following resources:
- Tool Use and Document Understanding Cookbook: https://colab.research.google.com/github/mistralai/cookbook/blob/main/mistral/ocr/document_understanding.ipynb
- Batch OCR Cookbook: https://colab.research.google.com/github/mistralai/cookbook/blob/main/mistral/ocr/batch_ocr.ipynb
- Structured OCR Cookbook: https://colab.research.google.com/github/mistralai/cookbook/blob/main/mistral/ocr/structured_ocr.ipynb
These cookbooks provide detailed code samples and hands-on guides to help developers understand and apply Mistral OCR's features in depth.