Dolphin - ByteDance's open source lightweight document parsing large model

Latest AI Resources | Published 7 months ago | AI Sharing Circle

What's Dolphin?

Dolphin is a lightweight document parsing large model open-sourced by ByteDance. With 322M parameters, it is small and runs fast. The model uses a two-stage parsing approach: it first performs page-level layout analysis to identify the elements of the document (titles, tables, formulas, and so on), then parses the content of each element. It supports extracting text, formulas, tables, and other elements, and can output JSON, Markdown, HTML, and other formats. Dolphin is suited to academic research, business office work, education, and technical development. It can efficiently process academic papers, business reports, and technical documents, helping to digitize documents, extract information, and improve office efficiency.
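The two-stage flow described above can be sketched as follows. This is a minimal illustration of the control flow only, not Dolphin's implementation: the `Element` structure and the functions `analyze_layout`, `parse_element`, and `parse_page` are hypothetical names, and both stages are replaced by stand-ins that return fixed values so the sketch runs without the model.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Element:
    """One layout element found on the page (hypothetical structure)."""
    kind: str          # e.g. "title", "text", "table", "formula"
    order: int         # position in the natural reading order
    content: str = ""  # filled in by the second stage


def analyze_layout(page_image: str) -> list[Element]:
    """Stage 1 stand-in: return the page's elements in reading order.

    A real implementation would run the model on the page image; fixed
    elements are returned here so the flow can be run end to end.
    """
    return [Element("title", 0), Element("text", 1), Element("table", 2)]


def parse_element(element: Element) -> Element:
    """Stage 2 stand-in: fill in the content of a single element."""
    return Element(element.kind, element.order, f"<parsed {element.kind}>")


def parse_page(page_image: str) -> list[Element]:
    elements = analyze_layout(page_image)          # stage 1: layout analysis
    with ThreadPoolExecutor() as pool:             # stage 2: per-element parsing
        parsed = list(pool.map(parse_element, elements))
    return sorted(parsed, key=lambda e: e.order)   # keep the reading order


if __name__ == "__main__":
    for element in parse_page("path/to/document.jpg"):
        print(element.order, element.kind, element.content)
```

Because each element is parsed independently once the layout is known, the second stage can be run concurrently; a thread pool is used here purely as an illustration of that idea.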

Dolphin's main features

Layout analysis: Accurately recognizes elements such as titles, charts, tables, and footnotes, and generates a clear sequence of elements in natural reading order, laying the foundation for subsequent content parsing.
Content extraction: Parses document pages into structured JSON or Markdown for subsequent processing and presentation.
Text parsing: Accurately extracts text content from documents, covering Chinese, English, and other languages.
Formula recognition: Recognizes complex inline and block-level formulas and outputs them in LaTeX format, which makes academic and technical documents easier to handle.
Table analysis: Parses complex table structures and extracts cell contents, generating HTML tables that suit a variety of application scenarios.
Lightweight architecture: The model has 322M parameters, is small and fast, and is suitable for resource-constrained devices or environments.
Multiple inputs and outputs: Supports document image inputs such as academic papers, business reports, and technical documents. Parsing results can be output in JSON, Markdown, HTML, and other formats, which makes integration with different systems convenient.
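As a rough illustration of how the element types listed above might be combined into one output document, the sketch below joins a list of parsed elements into Markdown and also serializes them as JSON. The field names `type` and `content` and the sample data are assumptions made for this example, not Dolphin's actual output schema.

```python
import json

# Hypothetical parse result; the field names "type" and "content" are assumptions
elements = [
    {"type": "title", "content": "Results"},
    {"type": "text", "content": "Energy is given by"},
    {"type": "formula", "content": "E = mc^2"},
    {"type": "table", "content": "<table><tr><td>a</td><td>1</td></tr></table>"},
]


def to_markdown(items):
    """Join elements, already in reading order, into one Markdown string."""
    blocks = []
    for item in items:
        kind, content = item["type"], item["content"]
        if kind == "title":
            blocks.append(f"# {content}")
        elif kind == "formula":
            blocks.append(f"$$\n{content}\n$$")  # LaTeX as a display-math block
        else:
            blocks.append(content)               # text and HTML tables pass through
    return "\n\n".join(blocks)


markdown = to_markdown(elements)
as_json = json.dumps(elements, ensure_ascii=False, indent=2)
print(markdown)
```

HTML tables are passed through unchanged because most Markdown renderers accept inline HTML, which preserves merged cells that plain Markdown tables cannot express.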

Dolphin's official website address

GitHub repository: https://github.com/bytedance/Dolphin
HuggingFace model library: https://huggingface.co/ByteDance/Dolphin
arXiv technical paper: https://arxiv.org/pdf/2505.14059
Online experience demo: http://115.190.42.15:8888/dolphin/

How to use Dolphin

Online experience demo: Visit the Dolphin online demo address and upload document images directly for parsing. No installation or environment setup is needed.
GitHub repository deployment:
- Clone the repository:

git clone https://github.com/bytedance/Dolphin.git
cd Dolphin

- Install the dependencies:

pip install -r requirements.txt

- Download the pre-trained model: Download and unzip the pre-trained model files following the instructions in the GitHub repository.
- Run the code: Run Dolphin following the sample code in the repository. For example:

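# Illustrative only: the `DolphinParser` interface is not verified against the
# repository; check its README for the actual entry points.
from dolphin import DolphinParser

# Load the downloaded pre-trained model
parser = DolphinParser(model_path="path/to/model")
# Parse a single document image and print the result
result = parser.parse(image_path="path/to/document.jpg")
print(result)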

Hugging Face model library:
- Install the Hugging Face library:

pip install transformers torch pillow

- Load the model:

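from PIL import Image
from transformers import AutoProcessor, VisionEncoderDecoderModel

# transformers has no AutoModelForDocumentParsing class; loading the checkpoint as a
# vision encoder-decoder model is an assumption to confirm against the model card.
model_name = "ByteDance/Dolphin"
processor = AutoProcessor.from_pretrained(model_name)
model = VisionEncoderDecoderModel.from_pretrained(model_name)

# Load the document image and preprocess it
image = Image.open("path/to/document.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Run parsing; the prompt wording is an assumption, see the repository's demo scripts
prompt = "<s>Parse the reading order of this document. <Answer/>"
prompt_ids = processor.tokenizer(prompt, add_special_tokens=False, return_tensors="pt").input_ids
outputs = model.generate(pixel_values=pixel_values, decoder_input_ids=prompt_ids, max_length=4096)

# Process the output
print(processor.tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])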

- Process the output: Further process and use the parsing results according to the model's output format (for example JSON or HTML).
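As one example of post-processing, tables returned as HTML can be turned into rows of cell text with Python's standard library alone. This is a minimal sketch: the `TableCells` class, the `table_rows` helper, and the sample table are hypothetical, and nested tables or merged cells are not handled.

```python
from html.parser import HTMLParser


class TableCells(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # start a new row
        elif tag in ("td", "th"):
            if not self.rows:             # tolerate cells without a <tr>
                self.rows.append([])
            self.rows[-1].append("")      # start a new, empty cell
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def table_rows(html):
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableCells()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical table output
html = (
    "<table><tr><th>Model</th><th>Params</th></tr>"
    "<tr><td>Dolphin</td><td>322M</td></tr></table>"
)
rows = table_rows(html)
print(rows)  # [['Model', 'Params'], ['Dolphin', '322M']]
```

The resulting rows can be written to CSV or loaded into a spreadsheet or database without any third-party HTML library.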

Dolphin's core strengths

Lightweight and efficient: Dolphin has only 322M parameters, is small and fast, and suits resource-constrained environments.
Two-stage parsing approach: Parses the layout first and the content second, using parallel processing to improve efficiency and accuracy.
Powerful document parsing capabilities: Supports parsing text, tables, formulas, charts, and other elements, covering complex document structures.
Multi-language support: Accurately recognizes text in Chinese, English, and other languages, meeting the needs of multilingual document processing.
Diverse inputs and outputs: Accepts a variety of document inputs and supports JSON, Markdown, HTML, and other output formats, which makes integration easy.
Open source and easy to use: The code and pre-trained models are open source, giving developers the resources to get started quickly and customize their own development.
High performance: Outperforms mainstream models such as GPT-4.1 and Mistral-OCR on document parsing tasks, and excels at table and formula recognition.

Who Dolphin is for

Researchers: Rapidly parse text, formulas, and charts in academic papers to organize literature efficiently, extract key information, and speed up research work.
Corporate office staff: Extract key information from contracts, reports, and other business documents to assist with contract review and report generation and to improve office efficiency.
Educators: Teachers and educational institutions can use Dolphin to digitize textbooks and test papers, support online and multilingual teaching, and enrich teaching resources.
Technology developers: Parse technical documentation to support code management and technical exchange, and build secondary development and customization on the open source code.
Students: Quickly organize study materials and extract key points to aid learning and review.