General Introduction
Datalab offers a range of AI models for OCR, layout analysis, PDF-to-Markdown conversion, and related tasks. The models are high performing, easy to use, and open source. The Marker model converts PDF to Markdown quickly and accurately, including tables and formulas. The Surya model supports OCR in more than 90 languages, detecting lines of text and recognizing layout blocks such as headings, images, and formulas in a document. The Texify model converts formulas recognized by OCR to LaTeX format. Users can run these tools securely in their own environment.
Tabled: an open-source project from the authors of Marker and Surya for detecting and extracting tables.
Function List
- Marker: Convert PDF to Markdown quickly and accurately, including tables and formulas.
- Surya: OCR support for more than 90 languages, detecting lines of text and recognizing document layout blocks.
- Texify: Convert OCR-recognized formulas to LaTeX format.
- Safe use: Users can run these tools securely in their own environments.
Usage Guide
Marker
- Installation: Download and install the dependencies required by the Marker model.
- Usage: Upload a PDF file to Marker, click the Convert button, and wait a few seconds for the Markdown output.
- Note: Use clear, legible PDF files to improve conversion accuracy.
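For users running Marker in their own environment, the conversion step can also be scripted. The following is a minimal sketch, assuming the `marker-pdf` Python package is installed and provides a `marker_single` command that accepts an `--output_dir` option. The command name, the option, and the helper `build_marker_command` are assumptions that do not come from the text above, so check them against the version you have installed. The helper only builds the command; it does not run it.

```python
import shlex
from pathlib import Path


def build_marker_command(pdf_path, output_dir):
    """Build (but do not run) a command line for a local Marker install.

    Assumption: the `marker_single` command and its `--output_dir`
    option exist in the installed marker-pdf version.
    """
    pdf = Path(pdf_path)
    # Reject obviously wrong inputs before handing them to the converter.
    if pdf.suffix.lower() != ".pdf":
        raise ValueError(f"expected a .pdf file, got: {pdf.name}")
    return ["marker_single", str(pdf), "--output_dir", str(Path(output_dir))]


if __name__ == "__main__":
    cmd = build_marker_command("report.pdf", "converted")
    # Print the command so it can be inspected before running it.
    print(shlex.join(cmd))
    # To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Returning the command as a list rather than a single string keeps file names with spaces intact when the list is later passed to `subprocess.run`.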
Surya
- Installation: Download and install the dependencies required by the Surya model.
- Usage: Upload the document to be processed, select the language, click the Start button, and wait for the OCR result.
- Features: Multi-language OCR, text line detection, and document layout recognition.
- Note: For complex documents, process the document in segments to improve recognition accuracy.
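The segmented processing suggested above can be sketched as splitting a long document into consecutive page ranges and running OCR on each range separately. This is only an illustration: the helper `page_chunks` and the chunk size of 10 pages are hypothetical choices, not part of Surya, and the OCR call itself is left out.

```python
def page_chunks(total_pages, chunk_size):
    """Split pages 0..total_pages-1 into consecutive (start, end) ranges.

    `end` is exclusive, so each pair can be used directly with range().
    """
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


if __name__ == "__main__":
    # A 25-page document processed 10 pages at a time.
    for start, end in page_chunks(25, 10):
        print(f"process pages {start} to {end - 1}")
```

Each range can then be submitted as its own OCR job, which keeps a failure or a low-quality result confined to one segment.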
Texify
- Installation: Download and install the dependencies required by the Texify model.
- Usage: Upload a document containing formulas, click the Convert button, and wait a few seconds for the formulas in LaTeX format.
- Note: Use clear, legible formulas to improve conversion accuracy.