In the face of complex text structures, or mixed-text content, it is good to utilize the visual model OCR capability to extract the content.
Multimodal macromodels or specialized visual models can understand the content of the image and receive instructions to perform recognition tasks, and we will use this capability to make the output match our requirements.
OCR Prompt is recommended to be tested in the following tool: ChatGPT , Kimi , Qwen2-VL(Currently the most accurate)
Test image:
The complexity of this image is the obscured json part, which is understood differently by different big models
Generally simple commands are fine:
Extracted in the original format
Only part of the content is extracted:
Extract only the table portion of the image
Extract and transcribe to fixed format text:
Recognize the images and organize them into MARKDOWN format tables, please keep the original order, format and language of the table
Structured Extraction:
Your task is to transcribe and format the contents of a document into markdown.Your goal is to create a well-structured, readable markdown document that accurately represents the original content while adding appropriate formatting and tags. Follow the instructions below to complete the task: 1. read the entire document content carefully. 2. transcribe the content into markdown format, paying close attention to the existing format and structure. 3. if you find any unclear formatting in the original content, use your own judgment to add appropriate markdown formatting to improve readability and structure. 4. For tables, headings, and table of contents, add the following tags: - Tables: Enclose the entire table in [TABLE] and [/TABLE] tags. If the table content continues on the next page, merge the table content. - Headings (complete strings repeated at the beginning of each page): enclose in [HEADER] and [/HEADER] tags within the markdown file. - Table of contents: enclosed in [TOC] and [/TOC] tags 5. When transcribing tables: - If the table spans multiple pages, merge the content into one coherent table. - Use proper markdown table formatting, with vertical lines (|) and hyphens (-) for table structure. 6. do not include page breaks in the transcription. 7. Maintain the logical flow and structure of the document, ensuring that sections and subsections are properly formatted using markdown headings (# for main headings, ## for subheadings, etc.). 8. use appropriate markdown syntax for other formatting elements such as bold, italics, lists, and code blocks as needed. 10. return only parsed content in markdown format, including tables, headings, and specified labels in the table of contents.
Extract and translate:
The translation commands I use most often are used here, and they also work wonders for OCR extraction of complex structured text:Translation of the "English instruction template" into "Chinese instructions", retaining the original formatting