OCR

Total 27 articles posts
自动解析PDF内容并提取文字与表格的开源服务

Automatically parse PDF content and extract text and tables of open source services

Comprehensive Introduction It can automatically analyze the layout of PDF documents, identify text, titles, images, tables, formulas and other elements in the page, and determine their correct order. The tool supports OCR functionality and can convert scanned PDF to searchable text. It runs on Docker and provides two models...
4mos ago
0998
Mistral OCR:94.89%总体精度,1000 页/30秒,只需1美元

Mistral OCR: 94.89% Overall Accuracy, 1000 Pages/30 Seconds, Only $1

In the long history of human civilization, every leap in the way information is acquired and parsed has profoundly driven social progress. From the ancient hieroglyphics, to the portable papyrus, to the later emergence of the printing press and today's wave of digitization, each technological innovation has greatly expanded the paradigm of human knowledge dissemination...
5mos ago
01.1K
MinerU:PDF文档提取转换为多模态Markdown格式,支持电子书OCR扫描

MinerU: PDF document extraction and conversion to multimodal Markdown format, support e-book OCR scanning

Comprehensive Introduction MinerU is an open source data extraction tool developed by the OpenDataLab team at the Shanghai Artificial Intelligence Laboratory, focusing on efficiently extracting content from complex PDF documents, web pages, and eBooks. It can take multimodal PDFs containing images, formulas, tables and other elements...
10mos ago
02.3K