Comprehensive Introduction Outlines is an open source library developed by dottxt-ai to enhance the application of Large Language Models (LLMs) through structured text generation. The library integrates with a wide range of models, including OpenAI, transformers, and llama.cpp. It provides simple but powerful prompt primitives,...
General Introduction MarkItDown is a Python tool developed by Microsoft designed to convert various files and office documents to Markdown format. The tool supports a wide range of file types, including PDF, PowerPoint, Word, Excel, images (EXIF metadata and OCR), audio (EXIF metadata and language...
Comprehensive Introduction Chunkr is a self-hosted API specialized in converting PDF, PPTX, DOCX, and Excel files into data suitable for use with RAG (Retrieval-Augmented Generation) and LLMs (Large Language Models). It was developed by Lumina AI Inc. and utilizes advanced visual models for document ingest...
General Introduction GitIngest is an open source tool designed to transform GitHub code repositories into text suitable for Large Language Model (LLM) prompts. With a simple operation, users can extract and format the content of any GitHub repository into text suitable for LLM use. The tool provides one-click analysis...
General Introduction E2M (Everything to Markdown) is an open source Python library designed to convert multiple file formats to Markdown format. The tool supports a wide range of file types including doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3 and m4a. E2M uses...
Comprehensive Introduction Docling is a powerful document parsing and exporting tool that supports a wide range of document formats including PDF, DOCX, PPTX, XLSX, images, HTML, AsciiDoc and Markdown. It parses and exports these documents to HTML, Markdown, and JSON formats, with support for embedding and...
Comprehensive Introduction MegaParse is a powerful and versatile document parsing tool designed to optimize data processing for Large Language Models (LLMs). Whether you are working with text, PDF, PowerPoint presentations or Word documents, MegaParse makes it easy and ensures that the parsing process is not...
Comprehensive Introduction ViTLP (Visually Guided Generative Text-Layout Pre-training for Document Intelligence) is an open source project that aims to enhance document intelligence processing through visually guided generative text layout pre-training models. The project was developed by Veason-silverbul...
General Introduction Trieve is an all-in-one infrastructure developed by Devflow, Inc. designed for search, recommendations, RAG (retrieval-augmented generation) and analytics. The platform is served via an API, supports self-hosting, and is available for environments such as AWS, GCP, Kubernetes, and Docker Compose....
Comprehensive Introduction pdf2htmlEX is an open source tool designed to convert PDF files to HTML format. It analyzes the content of a PDF file and uses HTML and CSS to accurately reproduce its visual appearance, so that PDF documents can be viewed directly in a browser as web pages. The tool is particularly suitable for files containing a large number of ...
Comprehensive Introduction Maxun is an open source no-code web data extraction platform that allows users to train robots in minutes to automatically scrape web data and convert it into APIs or spreadsheets. The platform supports pagination and scrolling, can adapt to changes in website layout, provides powerful data scraping features for...
Comprehensive Introduction OmniParse is a powerful data parsing and optimization platform designed to transform any unstructured data into structured, actionable data optimized for GenAI (Generative Artificial Intelligence) frameworks. Whether you are working with documents, tables, images, videos, audio files or web content,...
General Introduction Parsio is an AI-based document and email data extraction tool that automatically extracts structured data from PDFs, emails and other documents. The platform provides a powerful PDF parser and OCR functionality, and supports a wide range of document types, including invoices, business cards and IDs...
Comprehensive Introduction Chonkie is a lightweight and efficient RAG (Retrieval-Augmented Generation) text chunking library designed to help developers quickly and easily chunk text. The library supports a variety of chunking methods, including chunking based on tokens, words, sentences and semantic similarity...
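To make the idea of sentence-based chunking concrete, here is a minimal sketch in plain Python. It does not use Chonkie's API. The function name `chunk_by_sentences`, the `max_words` parameter, and the naive punctuation-based sentence splitter are all assumptions made for illustration.

```python
import re


def chunk_by_sentences(text: str, max_words: int = 50) -> list[str]:
    """Group whole sentences into chunks of at most max_words words.

    Hypothetical helper for illustration only, not part of any library.
    A single sentence longer than max_words becomes its own chunk.
    """
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "One two three. Four five six. Seven eight nine ten."
    for chunk in chunk_by_sentences(sample, max_words=6):
        print(chunk)
```

Keeping sentences whole avoids cutting a thought in half, at the cost of uneven chunk sizes. A production chunker would typically count model tokens rather than whitespace-separated words.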
Comprehensive Introduction TextIn is a professional PDF to Markdown tool designed to help users efficiently convert PDF documents to Markdown format. The tool supports a variety of file formats, is easy to operate, converts quickly, and retains the format and content of the original PDF, which improves the efficiency of document processing. Whether it is a ...
General Description Text Extraction API (text-extract-api) is a powerful tool designed to extract and parse content from a variety of document formats (e.g. PDF, Word, PPTX, etc.). The API uses state-of-the-art Optical Character Recognition (OCR) technology and Ollama-supported models to take any document or image...
Comprehensive Introduction Datalab offers a range of advanced AI models focused on OCR, layout analysis, PDF to Markdown, and more. These models are not only high performing, but also easy to use and open source. The Marker models on the platform can quickly and accurately convert PDF to Markdown, including tables...
Comprehensive Introduction MinerU is an open source data extraction tool developed by the OpenDataLab team at the Shanghai Artificial Intelligence Laboratory, focusing on efficiently extracting content from complex PDF documents, web pages, and eBooks. It can convert multimodal PDF documents containing images, formulas, tables and other elements into easy-to-analyze M...
General Introduction Marker is a deep learning based document processing tool designed to convert PDF files to Markdown format quickly and accurately. It supports a wide range of document types and is especially optimized for conversion of books and scientific papers. Marker is able to remove redundant content such as headers and footers, format tables and...