These prompts come from the Vision Parse project, which extracts Markdown from a document in two steps: image analysis, then Markdown formatting.
Image Analysis Prompt (img_analysis.prompt).
Analyze this image and return a detailed JSON description including any text detected, images detected, tables detected, extracted text and confidence score for the extracted text. Confidence score for the extracted text should be a float value between 0 and 1. If you cannot determine certain details, leave those fields empty.
Prompt translation
Analyze this image and return a detailed JSON description of any text detected, any images detected, any tables detected, the extracted text and its confidence score. The confidence score for the extracted text should be a floating-point value between 0 and 1. If some details cannot be determined, leave those fields blank.
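A minimal sketch of how a reply to this prompt might be read, assuming the model answers with a JSON object whose keys match the field names used later in this document. The helper name parse_analysis and the sample reply are hypothetical and are not taken from Vision Parse.

```python
import json


def parse_analysis(raw: str) -> dict:
    """Parse the JSON reply and drop fields the model left empty."""
    data = json.loads(raw)
    # The prompt tells the model to leave undetermined fields empty,
    # so empty strings and nulls are treated as "not provided".
    return {key: value for key, value in data.items() if value not in ("", None)}


# Hypothetical reply, for illustration only.
sample = '{"text_detected": "Yes", "extracted_text": "", "confidence_score_text": 0.4}'
result = parse_analysis(sample)
print(result)
```

Dropping empty fields means a later template check such as "is defined" would skip the related instructions instead of acting on a blank value.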
Markdown formatting prompt template (md_prompt.j2).
{% autoescape true %}
Your task is to analyze the given image and extract textual content in markdown format.
{% if confidence_score_text is defined and confidence_score_text is number %}
{% if confidence_score_text > 0.6 %}
- Verify if the extracted text matches with the content in the image: {{ extracted_text|escape|trim }}.
- Ensure markdown text formatting for {{ extracted_text|escape|trim }} is applied properly by analyzing the image.
- Strictly do not change any content in the original extracted text while applying markdown text formatting.
{% else %}
- Please carefully reanalyze the text in the image as the initial confidence score was low.
- Convert the provided image into markdown format and ensure that all content from the image is included.
{% endif %}
{% endif %}
{% if tables_detected is defined and tables_detected|string == "Yes" %}
- Preserve the tabular structure in markdown format using | for columns and - for the header row separator.
- Ensure that the cell values are properly aligned within the table columns and the tabular data is not distorted.
- Maintain the original positioning of the table within the scanned document. Do not include any additional explanations or comments.
{% endif %}
- Preserve markdown text formatting if present such as bold, italics, underlines, headers, bullet points, links or other elements.
- Strictly, do not omit any textual content from the given image and do not include any additional explanations, notes or comments.
- Ensure that the content does not have unnecessary formatting and at the same time, preserve the original formatting as much as possible.
- Strictly, do not generate code fences or backticks like ``` or ```markdown.
{% endautoescape %}
Prompt translation
{% autoescape true %}
Your task is to analyze the given image and extract the text content in Markdown format.
{% if confidence_score_text is defined and confidence_score_text is number %}
{% if confidence_score_text > 0.6 %}
- Verify that the extracted text matches the image content: {{ extracted_text|escape|trim }}.
- Ensure that the Markdown text formatting of {{ extracted_text|escape|trim }} is correctly applied by analyzing the image.
- Strictly do not change anything in the original extracted text when applying Markdown text formatting.
{% else %}
- Please re-analyze the text in the image carefully, as the initial confidence score is low.
- Convert the supplied image to Markdown format and make sure that everything in the image is included.
{% endif %}
{% endif %}
{% if tables_detected is defined and tables_detected|string == "Yes" %}
- Use | for columns and - for header row separators to preserve the table structure in Markdown format.
- Ensure that cell values are properly aligned in table columns and that table data is not distorted.
- Keep the table in its original position in the scanned document. Do not include any additional notes or comments.
{% endif %}
- Preserve Markdown text formatting such as bold, italics, underlining, headings, bullets, links or other elements if present.
- Strictly do not omit any textual content in the given image and do not include any additional descriptions, notes or comments.
- Make sure that the content is free of unnecessary formatting, while preserving the original formatting as much as possible.
- Strictly do not generate code fences or backticks, such as ``` or ```markdown.
{% endautoescape %}
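The branching in the template can be mirrored in plain Python to make the control flow easier to follow. This is only a sketch: the function name build_instructions is hypothetical, the bullet list is shortened to one line per branch, and html.escape stands in for Jinja's escape filter. The real project renders the .j2 file with Jinja rather than code like this.

```python
from html import escape
from numbers import Number


def build_instructions(extracted_text="", confidence_score_text=None, tables_detected=None):
    """Mirror the if/else structure of md_prompt.j2 (bullet text shortened)."""
    lines = ["Your task is to analyze the given image and extract textual content in markdown format."]
    text = escape(str(extracted_text)).strip()  # stand-in for |escape|trim

    # Outer check: the score must be present and numeric.
    if isinstance(confidence_score_text, Number):
        if confidence_score_text > 0.6:
            # High confidence: verify and format the text already extracted.
            lines.append(f"- Verify if the extracted text matches with the content in the image: {text}.")
        else:
            # Low confidence: ask for a fresh read of the image.
            lines.append("- Please carefully reanalyze the text in the image as the initial confidence score was low.")

    # Table instructions are added only when the analysis step reported "Yes".
    if str(tables_detected) == "Yes":
        lines.append("- Preserve the tabular structure in markdown format using | for columns and - for the header row separator.")

    # Instructions outside every conditional are always included.
    lines.append("- Strictly, do not omit any textual content from the given image.")
    return "\n".join(lines)


high = build_instructions("Hello", 0.9, "No")
low = build_instructions("Hello", 0.3, "Yes")
print(low)
```

Note that a score of exactly 0.6 falls into the reanalysis branch, because the template's comparison is strictly greater than 0.6.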
Let me analyze all the variables referenced in the md_prompt.j2 template:
- confidence_score_text
  {% if confidence_score_text is defined and confidence_score_text is number %}
  {% if confidence_score_text > 0.6 %}
  - Type: number
  - Purpose: stores the confidence score for the extracted text
  - Value range: floating-point number between 0 and 1
  - Used to decide whether the text needs to be reanalyzed: above 0.6 the extracted text is only verified and formatted, otherwise the image is reanalyzed
- extracted_text
  {{ extracted_text|escape|trim }}
  - Type: string
  - Purpose: stores the original text content extracted from the image
  - Two filters are used:
    - escape: escapes HTML special characters
    - trim: removes leading and trailing whitespace
- tables_detected
  {% if tables_detected is defined and tables_detected|string == "Yes" %}
  - Type: string
  - Purpose: indicates whether a table was detected
  - Possible values: "Yes" or "No"
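To see what the two filters on extracted_text do to a value, the sketch below approximates them with the standard library. The helper name escape_and_trim is hypothetical. Jinja's escape filter is backed by MarkupSafe, which encodes quote characters differently from html.escape, so this is an approximation that matches only for the characters &, < and >.

```python
from html import escape


def escape_and_trim(text: str) -> str:
    """Approximate Jinja's `escape|trim` using only the standard library."""
    # Escape first, then strip surrounding whitespace, in the same order as the template.
    return escape(text).strip()


print(escape_and_trim("  a < b & c  "))  # prints: a &lt; b &amp; c
```

Escaping keeps text pulled from the image from being read as markup when it is substituted into the prompt.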
These variables come from the ImageDescription model in the code (defined in llm.py):
class ImageDescription(BaseModel):
    """Model Schema for image description."""
    text_detected: Literal["Yes", "No"]
    tables_detected: Literal["Yes", "No"]
    extracted_text: str
    confidence_score_text: float
This model corresponds to the JSON structure returned by img_analysis.prompt, ensuring type safety and consistency of the data. These variables are generated during the image analysis phase (img_analysis.prompt) and then passed to the markdown generation template (md_prompt.j2) for use.
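The hand-off between the two steps could be sketched as follows. This uses a standard-library dataclass as a stand-in for the Pydantic model, so the class name AnalysisResult, the helper template_context, and the manual checks are all assumptions made for illustration. Vision Parse itself relies on ImageDescription for validation.

```python
from dataclasses import asdict, dataclass


@dataclass
class AnalysisResult:
    """Stdlib stand-in for the ImageDescription fields that md_prompt.j2 reads."""
    extracted_text: str
    tables_detected: str
    confidence_score_text: float

    def __post_init__(self):
        # Hand-written checks that play the role of the model's type constraints.
        if self.tables_detected not in ("Yes", "No"):
            raise ValueError("tables_detected must be 'Yes' or 'No'")
        if not 0.0 <= self.confidence_score_text <= 1.0:
            raise ValueError("confidence_score_text must be between 0 and 1")


def template_context(analysis: dict) -> dict:
    """Validate step-one output and keep only the variables the template uses."""
    wanted = ("extracted_text", "tables_detected", "confidence_score_text")
    return asdict(AnalysisResult(**{key: analysis[key] for key in wanted}))


# Hypothetical step-one output, for illustration only.
context = template_context({
    "text_detected": "Yes",
    "tables_detected": "Yes",
    "extracted_text": "| a | b |",
    "confidence_score_text": 0.8,
})
print(context)
```

The resulting dictionary holds exactly the three variables the template references, so it could be passed as the render context in the second step.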