AI Personal Learning
and practical guidance

OCR Prompt for Extracting Image Text Using Visual Models

In the face of complex text structures, or mixed-text content, it is good to utilize the visual model OCR capability to extract the content.

Multimodal macromodels or specialized visual models can understand the content of the image and receive instructions to perform recognition tasks, and we will use this capability to make the output match our requirements.


 

OCR Prompt is recommended to be tested in the following tool: ChatGPT , Kimi , Qwen2-VL(Currently the most accurate)

 

Test image:

The complexity of this image is the obscured json part, which is understood differently by different big models

 

Generally simple commands are fine:

Extracted in the original format

 

Only part of the content is extracted:

Extract only the table portion of the image

 

Extract and transcribe to fixed format text:

Recognize the images and organize them into MARKDOWN format tables, please keep the original order, format and language of the table

 

Structured Extraction:

Your task is to transcribe and format the contents of a document into markdown.Your goal is to create a well-structured, readable markdown document that accurately represents the original content while adding appropriate formatting and tags.

Follow the instructions below to complete the task:

1. read the entire document content carefully.

2. transcribe the content into markdown format, paying close attention to the existing format and structure.

3. if you find any unclear formatting in the original content, use your own judgment to add appropriate markdown formatting to improve readability and structure.

4. For tables, headings, and table of contents, add the following tags:
- Tables: Enclose the entire table in [TABLE] and [/TABLE] tags. If the table content continues on the next page, merge the table content.
- Headings (complete strings repeated at the beginning of each page): enclose in [HEADER] and [/HEADER] tags within the markdown file.
- Table of contents: enclosed in [TOC] and [/TOC] tags

5. When transcribing tables:
- If the table spans multiple pages, merge the content into one coherent table.
- Use proper markdown table formatting, with vertical lines (|) and hyphens (-) for table structure.

6. do not include page breaks in the transcription.

7. Maintain the logical flow and structure of the document, ensuring that sections and subsections are properly formatted using markdown headings (# for main headings, ## for subheadings, etc.).

8. use appropriate markdown syntax for other formatting elements such as bold, italics, lists, and code blocks as needed.

10. return only parsed content in markdown format, including tables, headings, and specified labels in the table of contents.

 

Extract and translate:

The translation commands I use most often are used here, and they also work wonders for OCR extraction of complex structured text:Translation of the "English instruction template" into "Chinese instructions", retaining the original formatting

AI Easy Learning

The layman's guide to getting started with AI

Help you learn how to utilize AI tools at a low cost and from a zero base.AI, like office software, is an essential skill for everyone. Mastering AI will give you an edge in your job search and half the effort in your future work and studies.

View Details>
May not be reproduced without permission:Chief AI Sharing Circle " OCR Prompt for Extracting Image Text Using Visual Models

Chief AI Sharing Circle

Chief AI Sharing Circle specializes in AI learning, providing comprehensive AI learning content, AI tools and hands-on guidance. Our goal is to help users master AI technology and explore the unlimited potential of AI together through high-quality content and practical experience sharing. Whether you are an AI beginner or a senior expert, this is the ideal place for you to gain knowledge, improve your skills and realize innovation.

Contact Us
en_USEnglish