AI Personal Learning
and practical guidance
豆包Marscode1
Total 23 articles

Tags: OCR

Mistral OCR:94.89%总体精度,1000 页/30秒,只需1美元-首席AI分享圈

Mistral OCR: 94.89% Overall Accuracy, 1000 Pages/30 Seconds, Only $1

In the long history of human civilization, every leap in the way information is acquired and parsed has profoundly driven social progress. From the ancient hieroglyphics, to the portable papyrus, to the later emergence of the printing press and today's wave of digitization, each technological innovation has greatly expanded the transmission of human knowledge...

Ollama OCR:使用Ollama中视觉模型提取图像中的文本-首席AI分享圈

Ollama OCR: Extracting Text from Images Using Visual Models in Ollama

Comprehensive Introduction Ollama OCR is a powerful Optical Character Recognition (OCR) toolkit that utilizes the state-of-the-art visual language model provided by the Ollama platform to extract text from images. The project is available both as a Python package and provides a user-friendly Streamlit web application interface. It supports multiple ...

Chunkr:使用视觉模型进行文档摄取以及根据文本段落层级智能分块的一体化服务-首席AI分享圈

Chunkr: An All-in-One Service for Document Ingestion and Intelligent Chunking Based on Text Paragraph Hierarchy Using Visual Models

Comprehensive Introduction Chunkr is a self-hosted API specialized in converting PDF, PPTX, DOCX, and Excel files into data suitable for use in RAG (Retrieval Augmented Generation) and LLM (Large Language Modeling). It was developed by Lumina AI Inc. and utilizes advanced visual models for document ingest...

ViTLP:排版复杂PDF文档提取结构化数据,视觉引导生成文本布局预训练模型-首席AI分享圈

ViTLP: Extracting Structured Data from Typographically Complex PDF Documents and Visually Guided Generation of Text Layout Pre-training Models

Comprehensive Introduction ViTLP (Visually Guided Generative Text-Layout Pre-training for Document Intelligence) is an open source project that aims to enhance document intelligence processing through visually guided generative text layout pre-training models. The project was developed by Veason-silverbul...

ScreenPipe:24小时收集录屏和操作信息并转换为本地知识库,通过AI助手对话、总结、回顾知识-首席AI分享圈

ScreenPipe: 24-hour collection of recorded screen and operation information and converted into a local knowledge base, through the AI assistant conversation, summarize, review knowledge

General Introduction ScreenPipe is an AI assistant developed by mediar-ai that specializes in recording screen content, capturing screenshots and audio 24/7. It combines the technology of rewind.ai and cursor.com to store recorded data in a local database and supports Chinese ...

文本提取API(text-extract-api):视觉提取文本信息,匿名化的PDF提取工具-首席AI分享圈

Text Extraction API (text-extract-api): visual extraction of text information, anonymized PDF extraction tool

General Description Text Extraction API (text-extract-api) is a powerful tool designed to extract and parse content from a variety of document formats (e.g. PDF, Word, PPTX, etc.). The API utilizes state-of-the-art Optical Character Recognition (OCR) technology and Ollama-supported models to be able to take any document or image...

eSearch:多功能跨平台OCR工具,集成搜索|翻译|搜图|录屏等功能-首席AI分享圈

eSearch: Multi-functional cross-platform OCR tool, integrated search | translation | search map | screen recording and other functions

General Introduction eSearch is an open source cross-platform screenshot tool developed by xushengfeng that supports Windows, macOS and Linux systems. eSearch integrates a variety of features including OCR recognition, search, translation, mapping, image search and screen recording. It integrates a variety of features, including screenshot, OCR recognition, search, translation, mapping, image search and screen recording. eSearch uses Electron box...

AI tools
Surya:专业多语言文档OCR工具,开源本地部署-首席AI分享圈

Surya: professional multilingual document OCR tool, open source native deployment

Comprehensive Introduction Surya is an open source OCR toolkit for multilingual documents that supports text recognition in more than 90 languages. It is capable of not only line-by-line text detection, but also layout analysis, reading order detection and table recognition.Surya's performance is comparable to cloud services for a wide range of document types, including p...

MinerU:PDF文档提取转换为多模态Markdown格式,支持电子书OCR扫描-首席AI分享圈

MinerU: PDF document extraction and conversion to multimodal Markdown format, support e-book OCR scanning

Comprehensive Introduction MinerU is an open source data extraction tool developed by the OpenDataLab team at the Shanghai Artificial Intelligence Laboratory, focusing on efficiently extracting content from complex PDF documents, web pages, and eBooks. It can convert multimodal PDF documents containing images, formulas, tables and other elements into easy-to-analyze M...

en_USEnglish