
Combined prompts for visually extracting documents into Markdown

These prompts come from the Vision Parse project, which extracts Markdown from documents in two steps: an image analysis step, followed by a Markdown formatting step.

Image Analysis Prompt (img_analysis.prompt).

Analyze this image and return a detailed JSON description including any text detected, images detected, tables detected, extracted text and confidence score for the extracted text.
Confidence score for the extracted text should be a float value between 0 and 1. If you cannot determine certain details, leave those fields empty.

 


Prompt translation

Analyze this image and return a detailed JSON description that covers any text detected, images detected, tables detected, the extracted text, and the confidence score for that text.
The confidence score for the extracted text should be a floating-point value between 0 and 1. If some details cannot be determined, leave those fields blank.
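As a rough illustration of how a reply to this prompt might be consumed, the sketch below parses a JSON string and sanity-checks the confidence score. The function name `parse_analysis` and the sample reply are hypothetical. The key names are assumed to match the variables used later in this article, and the real model output may differ.

```python
import json


def parse_analysis(raw: str) -> dict:
    """Parse the model's JSON reply and sanity-check the confidence score."""
    data = json.loads(raw)
    score = data.get("confidence_score_text")
    # The prompt lets the model leave undetermined fields empty,
    # so a missing or blank score is tolerated and stored as None.
    if score in (None, ""):
        data["confidence_score_text"] = None
        return data
    score = float(score)
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence score out of range: {score}")
    data["confidence_score_text"] = score
    return data


# Hypothetical reply used only for illustration.
reply = (
    '{"text_detected": "Yes", "tables_detected": "No", '
    '"extracted_text": "Hello", "confidence_score_text": 0.92}'
)
result = parse_analysis(reply)
print(result["confidence_score_text"])
```

Rejecting out-of-range scores early keeps the later threshold comparison meaningful. Whether to raise or to fall back to a full reanalysis is a design choice this sketch leaves open.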

 

Markdown formatting prompt template (md_prompt.j2).

{% autoescape true %}

Your task is to analyze the given image and extract textual content in markdown format.

{% if confidence_score_text is defined and confidence_score_text is number %}
{% if confidence_score_text > 0.6 %}
- Verify if the extracted text matches with the content in the image: {{ extracted_text|escape|trim }}.
- Ensure markdown text formatting for {{ extracted_text|escape|trim }} is applied properly by analyzing the image.
- Strictly do not change any content in the original extracted text while applying markdown text formatting.
{% else %}
- Please carefully reanalyze the text in the image as the initial confidence score was low.
- Convert the provided image into markdown format and ensure that all content from the image is included.
{% endif %}
{% endif %}

{% if tables_detected is defined and tables_detected|string == "Yes" %}
- Preserve the tabular structure in markdown format using | for columns and - for the header row separator.
- Ensure that the cell values are properly aligned within the table columns and the tabular data is not distorted.
- Maintain the original positioning of the table within the scanned document. Do not include any additional explanations or comments.
{% endif %}

- Preserve markdown text formatting if present such as bold, italics, underlines, headers, bullet points, links or other elements.
- Strictly, do not omit any textual content from the given image and do not include any additional explanations, notes or comments.
- Ensure that the content does not have unnecessary formatting and at the same time, preserve the original formatting as much as possible.
- Strictly, do not generate code fences or backticks like ``` or ```markdown.

{% endautoescape %}

 

Prompt translation

{% autoescape true %}

Your task is to analyze the given image and extract the text content in Markdown format.

{% if confidence_score_text is defined and confidence_score_text is number %}
{% if confidence_score_text > 0.6 %}
- Verify that the extracted text matches the image content: {{ extracted_text|escape|trim }}.
- Ensure that the Markdown text formatting of {{ extracted_text|escape|trim }} is correctly applied by analyzing the image.
- Strictly do not change anything in the original extracted text when applying Markdown text formatting.
{% else %}
- Please re-analyze the text in the image carefully, as the initial confidence score is low.
- Convert the supplied image to Markdown format and make sure that everything in the image is included.
{% endif %}
{% endif %}

{% if tables_detected is defined and tables_detected|string == "Yes" %}
- Use | for columns and - for header row separators to preserve the table structure in Markdown format.
- Ensure that cell values are properly aligned in table columns and that table data is not distorted.
- Keep the table in its original position in the scanned document. Do not include any additional notes or comments.
{% endif %}

- Preserve Markdown text formatting such as bolding, italics, underlining, headings, bullets, links or other elements if present.
- Strictly do not omit any textual content in the given image and do not include any additional descriptions, notes or comments.
- Make sure that the content is free of unnecessary formatting, while preserving the original formatting as much as possible.
- Strictly do not generate code blocks or backquotes, such as ``` or ```markdown.

{% endautoescape %}
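The branching in the template can be easier to follow when written out as ordinary control flow. The sketch below mirrors the template's conditions in plain Python. It is an illustration only, not code from Vision Parse: the function name `build_instructions` and the short instruction labels are hypothetical stand-ins for the full bullet text.

```python
def build_instructions(context: dict) -> list:
    """Mirror the branching of md_prompt.j2 in plain Python (illustration only)."""
    lines = []
    score = context.get("confidence_score_text")
    # Corresponds to: `is defined and ... is number`
    if isinstance(score, (int, float)):
        if score > 0.6:
            # High confidence: keep the extracted text, only verify and format it.
            lines.append("verify extracted text against the image")
        else:
            # Low confidence: ignore the earlier extraction and start over.
            lines.append("reanalyze the image from scratch")
    # Corresponds to: `tables_detected|string == "Yes"`
    if str(context.get("tables_detected")) == "Yes":
        lines.append("preserve table structure")
    # The closing bullets are emitted unconditionally.
    lines.append("apply general formatting rules")
    return lines


print(build_instructions({"confidence_score_text": 0.9, "tables_detected": "Yes"}))
print(build_instructions({"confidence_score_text": 0.4}))
```

Note that a score of exactly 0.6 falls into the reanalysis branch, because the comparison is strictly greater than 0.6. When no numeric score is supplied, neither confidence branch contributes anything.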

 

The md_prompt.j2 template references the following variables:

  1. confidence_score_text
{% if confidence_score_text is defined and confidence_score_text is number %}
{% if confidence_score_text > 0.6 %}
  • Type: number
  • Purpose: stores the confidence score for the extracted text
  • Value range: a floating-point number between 0 and 1
  • Usage: a score above 0.6 makes the template ask for the extracted text to be verified and formatted; any other numeric score makes it ask for the image to be reanalyzed
  2. extracted_text
{{ extracted_text|escape|trim }}
  • Type: string
  • Purpose: stores the original text content extracted from the image
  • Two filters are applied:
    • escape: escapes HTML special characters
    • trim: removes leading and trailing whitespace
  3. tables_detected
{% if tables_detected is defined and tables_detected|string == "Yes" %}
  • Type: string
  • Purpose: indicates whether a table was detected
  • Possible values: "Yes" or "No"
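To make the effect of the two filters on extracted_text concrete, here is a small standard-library approximation. It assumes `html.escape` is a close enough stand-in for Jinja2's `escape` filter. The two may spell the entities for quote characters differently, so the example sticks to `<`, `>` and `&`. The helper name `escape_and_trim` is hypothetical.

```python
import html


def escape_and_trim(text: str) -> str:
    # html.escape approximates Jinja2's `escape`; str.strip approximates `trim`.
    return html.escape(text).strip()


print(escape_and_trim("  <b>Total</b> & tax  "))
# prints: &lt;b&gt;Total&lt;/b&gt; &amp; tax
```

Escaping keeps markup-like characters in the extracted text from being read as markup once they are embedded in the prompt.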

These variables come from the ImageDescription model in the code (defined in llm.py):

from typing import Literal

from pydantic import BaseModel


class ImageDescription(BaseModel):
    """Model Schema for image description."""

    text_detected: Literal["Yes", "No"]
    tables_detected: Literal["Yes", "No"]
    extracted_text: str
    confidence_score_text: float

This model corresponds to the JSON structure returned by img_analysis.prompt, ensuring type safety and consistency of the data. These variables are generated during the image analysis phase (img_analysis.prompt) and then passed to the markdown generation template (md_prompt.j2) for use.
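The hand-off between the two phases could look roughly like the sketch below. It uses a standard-library dataclass as a stand-in for the schema, so it runs without extra dependencies. The class `ImageDescriptionSketch` and the helper `to_template_context` are hypothetical names, the field names are assumed to follow the template's variables, and none of this is taken from the project's code.

```python
from dataclasses import asdict, dataclass

YES_NO = ("Yes", "No")


@dataclass
class ImageDescriptionSketch:
    """Stdlib stand-in for the image description schema (hypothetical)."""

    text_detected: str
    tables_detected: str
    extracted_text: str
    confidence_score_text: float

    def __post_init__(self):
        # Emulate the Literal["Yes", "No"] constraint by hand.
        for name in ("text_detected", "tables_detected"):
            if getattr(self, name) not in YES_NO:
                raise ValueError(f"{name} must be 'Yes' or 'No'")
        if not 0.0 <= self.confidence_score_text <= 1.0:
            raise ValueError("confidence_score_text must be between 0 and 1")


def to_template_context(desc: ImageDescriptionSketch) -> dict:
    # Forward only the three variables that the Markdown template reads.
    full = asdict(desc)
    keys = ("confidence_score_text", "extracted_text", "tables_detected")
    return {key: full[key] for key in keys}


desc = ImageDescriptionSketch("Yes", "No", "Quarterly report", 0.85)
print(to_template_context(desc))
```

Validating at the boundary means that the template's own `is defined` and `is number` guards act as a second line of defense, not the only one.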

May not be reproduced without permission:Chief AI Sharing Circle " Combined cue word commands by visually extracting documents as Markdown formatted documents
