Combined prompts for visually extracting documents as Markdown



These prompts come from the Vision Parse project, which extracts Markdown documents in two steps: an image analysis step, followed by a Markdown formatting step.

Image Analysis Prompt (img_analysis.prompt).

Analyze this image and return a detailed JSON description including any text detected, images detected, tables detected, extracted text and confidence score for the extracted text.
Confidence score for the extracted text should be a float value between 0 and 1. If you cannot determine certain details, leave those fields empty.


Markdown formatting prompt template (md_prompt.j2).

{% autoescape true %}

Your task is to analyze the given image and extract textual content in markdown format.

{% if confidence_score_text is defined and confidence_score_text is number %}
{% if confidence_score_text > 0.6 %}
- Verify if the extracted text matches with the content in the image: {{ extracted_text|escape|trim }}.
- Ensure markdown text formatting for {{ extracted_text|escape|trim }} is applied properly by analyzing the image.
- Strictly do not change any content in the original extracted text while applying markdown text formatting.
{% else %}
- Please carefully reanalyze the text in the image as the initial confidence score was low.
- Convert the provided image into markdown format and ensure that all content from the image is included.
{% endif %}
{% endif %}

{% if tables_detected is defined and tables_detected|string == "Yes" %}
- Preserve the tabular structure in markdown format using | for columns and - for the header row separator.
- Ensure that the cell values are properly aligned within the table columns and the tabular data is not distorted.
- Maintain the original positioning of the table within the scanned document. Do not include any additional explanations or comments.
{% endif %}

- Preserve markdown text formatting if present such as bold, italics, underlines, headers, bullet points, links or other elements.
- Strictly, do not omit any textual content from the given image and do not include any additional explanations, notes or comments.
- Ensure that the content does not have unnecessary formatting and at the same time, preserve the original formatting as much as possible.
- Strictly, do not generate code fences or backticks like ``` or ```markdown.

{% endautoescape %}
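The branching in the template above can be easier to follow as ordinary code. The following is a minimal sketch in plain Python that mirrors which instruction groups the template emits for a given context. It is not the Jinja2 renderer and not code from Vision Parse; the function name `select_instructions` and the group labels are hypothetical, chosen only for illustration.

```python
from numbers import Number


def select_instructions(context: dict) -> list[str]:
    """Return labels for the instruction groups md_prompt.j2 would emit.

    Hypothetical helper: a plain-Python mirror of the template's
    if/else structure, not the real rendering code.
    """
    groups = []

    score = context.get("confidence_score_text")
    # Mirrors "is defined and ... is number": a missing or non-numeric
    # score skips both the verify and the reanalyze instructions.
    if isinstance(score, Number):
        if score > 0.6:
            groups.append("verify_extracted_text")
        else:
            # A score of exactly 0.6 lands here, since the test is "> 0.6".
            groups.append("reanalyze_image")

    # Mirrors tables_detected|string == "Yes" (case-sensitive comparison).
    if str(context.get("tables_detected")) == "Yes":
        groups.append("preserve_tables")

    # The last four bullet points are outside any condition.
    groups.append("general_rules")
    return groups


print(select_instructions({"confidence_score_text": 0.9, "tables_detected": "Yes"}))
print(select_instructions({"confidence_score_text": 0.6}))
print(select_instructions({}))
```

Note that a score passed as a string, such as "0.9", fails the number test, so only the general rules would be emitted in that case.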


The md_prompt.j2 template references the following variables:

confidence_score_text

{% if confidence_score_text is defined and confidence_score_text is number %}
{% if confidence_score_text > 0.6 %}

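Type: number
Purpose: stores the confidence score for the extracted text
Value range: a float between 0 and 1
Usage: a score above 0.6 asks the model to verify and format the extracted text; a score of 0.6 or below asks it to reanalyze the image. If the variable is undefined or not a number, neither set of instructions is emitted.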

extracted_text

{{ extracted_text|escape|trim }}

Type: string
Purpose: stores the original text content extracted from the image
Two filters are applied, in this order:
- escape: replaces HTML special characters such as &, < and > with HTML-safe sequences
- trim: removes leading and trailing whitespace
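As a rough illustration of what the two filters do to a value, here is a standard-library approximation. It assumes `html.escape` is a close enough stand-in for Jinja2's `escape` filter; the two may spell the entities for quote characters differently, so treat this as a sketch rather than an exact equivalent. The function name `escape_then_trim` is hypothetical.

```python
import html


def escape_then_trim(text: str) -> str:
    """Approximate `{{ text|escape|trim }}` with the standard library."""
    escaped = html.escape(text)  # &, <, > and quotes become HTML entities
    return escaped.strip()       # drop leading and trailing whitespace


print(escape_then_trim("  <b>Total & tax</b>\n"))
# prints: &lt;b&gt;Total &amp; tax&lt;/b&gt;
```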

tables_detected

{% if tables_detected is defined and tables_detected|string == "Yes" %}

Type: string
Purpose: indicates whether a table was detected in the image
Possible values: "Yes" or "No" (the comparison is case-sensitive)

These variables come from the ImageDescription model in the code (defined in llm.py):

from typing import Literal

from pydantic import BaseModel


class ImageDescription(BaseModel):
    """Model Schema for image description."""

    text_detected: Literal["Yes", "No"]
    tables_detected: Literal["Yes", "No"]
    extracted_text: str
    confidence_score_text: float

This model corresponds to the JSON structure returned by img_analysis.prompt, ensuring type safety and consistency of the data. These variables are generated during the image analysis phase (img_analysis.prompt) and then passed to the markdown generation template (md_prompt.j2) for use.
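The hand-off between the two steps might look roughly like the sketch below. To stay dependency-free it uses a standard-library dataclass as a stand-in for the Pydantic model, so `ImageDescriptionSketch`, `parse_step_one`, and the sample JSON payload are all hypothetical and not taken from Vision Parse. The 0 to 1 range check follows the wording of the image analysis prompt; the original model only declares the field as a float.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ImageDescriptionSketch:
    """Stdlib stand-in for the ImageDescription schema (hypothetical)."""

    text_detected: str
    tables_detected: str
    extracted_text: str
    confidence_score_text: float

    def __post_init__(self) -> None:
        # Emulate Literal["Yes", "No"] for the two flag fields.
        for name in ("text_detected", "tables_detected"):
            if getattr(self, name) not in ("Yes", "No"):
                raise ValueError(f"{name} must be 'Yes' or 'No'")
        if not isinstance(self.extracted_text, str):
            raise ValueError("extracted_text must be a string")
        # Assumption: enforce the 0..1 range requested by the prompt.
        self.confidence_score_text = float(self.confidence_score_text)
        if not 0.0 <= self.confidence_score_text <= 1.0:
            raise ValueError("confidence_score_text must be between 0 and 1")


def parse_step_one(raw_json: str) -> dict:
    """Validate the step-one JSON and return a context for md_prompt.j2."""
    description = ImageDescriptionSketch(**json.loads(raw_json))
    return asdict(description)


# Hypothetical step-one response, for illustration only.
raw = (
    '{"text_detected": "Yes", "tables_detected": "No", '
    '"extracted_text": "Invoice 42", "confidence_score_text": 0.87}'
)
context = parse_step_one(raw)
print(context["tables_detected"], context["confidence_score_text"])
```

The resulting dictionary carries exactly the variable names the template checks, so it could be passed to the template renderer as keyword arguments.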