General Introduction PDF Craft is an open source tool designed for scanning PDFs of books and converting them to Markdown format. It is developed by oomol-lab and hosted on GitHub for users who like to organize their eBooks. The tool runs through a local AI model without the need for an Internet connection, which is both privacy-preserving and square...
SmolDocling is a Visual Language Model (VLM) developed by the ds4sd team in collaboration with IBM. It is based on SmolVLM-256M and hosted on the Hugging Face platform. It is the world's smallest VLM with only 256M parameters, and its core function is to provide a visual language model (VLM) from images...
In the long history of human civilization, every leap in the way information is acquired and parsed has profoundly driven social progress. From the ancient hieroglyphics, to the portable papyrus, to the later emergence of the printing press and today's wave of digitization, each technological innovation has greatly expanded the transmission of human knowledge...
Comprehensive Introduction Ollama OCR is a powerful Optical Character Recognition (OCR) toolkit that uses the state-of-the-art visual language models provided by the Ollama platform to extract text from images. The project is available both as a Python package and as a user-friendly Streamlit web application. It supports multiple ...
General Introduction STranslate is a ready-to-use translation and OCR tool built with WPF. The tool is designed to provide efficient and convenient translation and Optical Character Recognition (OCR) functionality for a wide range of languages and text types. STranslate is an open source project that users are free to download and use,...
General Description VisionParser is an OCR (Optical Character Recognition) tool designed for processing receipts and invoices. Through advanced generative AI technology, VisionParser is able to quickly and accurately convert all kinds of receipts and invoices into structured data for a wide range of industries, such as retail, catering, B2B services...
Comprehensive Introduction Chunkr is a self-hosted API specialized in converting PDF, PPTX, DOCX, and Excel files into data suitable for use with RAG (Retrieval-Augmented Generation) and LLMs (Large Language Models). It was developed by Lumina AI Inc. and utilizes advanced visual models for document ingest...
General Introduction Llama OCR is an OCR (Optical Character Recognition) library based on Llama 3.2 Vision that converts documents to Markdown format. Developed by Nutlope, the library uses the free Llama 3.2 interface provided by Together AI to parse images and return Markdown...
Comprehensive Introduction Docling is a powerful document parsing and exporting tool that supports a wide range of document formats including PDF, DOCX, PPTX, XLSX, Image, HTML, AsciiDoc and Markdown. It parses and exports these documents to HTML, Markdown, and JSON formats, with support for embedding and...
Comprehensive Introduction ViTLP (Visually Guided Generative Text-Layout Pre-training for Document Intelligence) is an open source project that aims to enhance document intelligence processing through visually guided generative text layout pre-training models. The project was developed by Veason-silverbul...
General Introduction ScreenPipe is an AI assistant developed by mediar-ai that specializes in recording screen content, capturing screenshots and audio 24/7. It combines the technology of rewind.ai and cursor.com to store recorded data in a local database and supports Chinese ...
General Description Text Extraction API (text-extract-api) is a powerful tool designed to extract and parse content from a variety of document formats (e.g. PDF, Word, PPTX, etc.). The API utilizes state-of-the-art Optical Character Recognition (OCR) technology and Ollama-supported models to be able to take any document or image...
General Description Picture to Excel Free Tool is an efficient online tool that quickly and accurately recognizes and converts tabular data from pictures to Excel files. The tool supports a wide range of image formats, such as JPG and PNG, and can be used on web pages, iOS apps and Android apps. Through advanced AI technology...
Comprehensive Introduction Datalab offers a range of advanced AI models focused on OCR, layout analysis, PDF to Markdown, and more. These models are not only high performing, but also easy to use and open source. The Marker models on the platform can quickly and accurately convert PDF to Markdown, including tables...
General Introduction eSearch is an open source cross-platform screenshot tool developed by xushengfeng that supports Windows, macOS and Linux systems. It integrates a variety of features, including screenshot, OCR recognition, search, translation, mapping, image search and screen recording. eSearch uses Electron box...
Comprehensive Introduction Surya is an open source OCR toolkit for multilingual documents that supports text recognition in more than 90 languages. It is capable of not only line-by-line text detection, but also layout analysis, reading order detection and table recognition. Surya's performance is comparable to cloud services for a wide range of document types, including p...
Comprehensive Introduction MinerU is an open source data extraction tool developed by the OpenDataLab team at the Shanghai Artificial Intelligence Laboratory, focusing on efficiently extracting content from complex PDF documents, web pages, and eBooks. It can convert multimodal PDF documents containing images, formulas, tables and other elements into easy-to-analyze M...
General Description PixPin is a powerful screenshot and posting tool designed to enhance users' productivity. Whether for daily office or professional needs, PixPin provides convenient screenshot, paste, long screenshot, text recognition (OCR) and dynamic screenshot functions. Its simple interface and rich features make...
Comprehensive Introduction GOT-OCR2.0 is an open source Optical Character Recognition (OCR) model co-proposed by StepStar, which aims to drive OCR technology towards OCR-2.0 through a unified end-to-end model. The model supports a wide range of OCR tasks, including plain text recognition, formatted text recognition, fine-grained OCR, multi...