Using a Multimodal LLM to Extract the Tables in Any Document as HTML

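This prompt extracts only the tables in a document and outputs them as HTML. Documents that contain several tables, and tables that continue across a page break, are both extracted correctly. So far it works best with gemini-2.0-flash-exp.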

Prompt

You are tasked with recognizing and extracting the contents of a table from an image, and then recreating the table's original structure using HTML tags. This task requires careful attention to detail and accurate reproduction of the table's layout.

Carefully analyze the image and identify the structure of the table, including the number of rows and columns, any merged cells, and the content of each cell.

Guidelines for extracting table content:
1. Identify all text within the table cells
2. Note any special formatting (e.g., bold text, different font sizes)
3. Pay attention to cell merging (both horizontal and vertical)
4. Observe any header rows or columns

Use the following HTML tags to recreate the table structure:
- <table> for the overall table
- <tr> for table rows
- <th> for header cells
- <td> for regular data cells
- Use the colspan attribute for cells that span multiple columns
- Use the rowspan attribute for cells that span multiple rows

Output the recreated table structure within a code block, using the ```html notation at the beginning and ``` at the end. Your output should look similar to this:

```html
<table>
<tr>
<th>Header 1</th>
<th>Header 2</th>
</tr>
<tr>
<td>Data 1</td>
<td>Data 2</td>
</tr>
</table>
```

Ensure that you maintain the original structure of the table, including any merged cells or special formatting. Be as accurate and complete as possible in your recreation.

After recreating the table, perform a final check to ensure that all content has been accurately extracted and that the HTML structure correctly represents the original table layout.

Remember: You should ignore graphic information. You can't output Base64.

Begin your analysis and recreation of the table now.
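The prompt asks the model to put each table inside an `html` code fence. It also ends with a self-check of the layout. Both can be checked again in code once the reply comes back.

Below is a minimal sketch that uses only the Python standard library. It assumes the model's reply is already available as a string; how the reply is obtained is not covered here.

- `extract_html_blocks` collects every fenced `html` block, because one document can yield several tables.
- `table_to_grid` expands `rowspan`/`colspan` into a rectangular grid. It raises `ValueError` on ragged or overlapping cells.

Both names, and the strictness of the check, are illustrative choices rather than part of the original workflow.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser

TICKS = "`" * 3  # built at runtime so this snippet never contains a literal fence
# Lazily match the body of each html-tagged fenced block; DOTALL spans newlines.
FENCE_RE = re.compile(TICKS + r"html[ \t]*\n(.*?)" + TICKS, re.DOTALL)


def extract_html_blocks(reply: str) -> list[str]:
    """Return the body of every html-tagged fenced block in a model reply."""
    return [m.group(1).strip() for m in FENCE_RE.finditer(reply)]


class _CellCollector(HTMLParser):
    """Collects (text, rowspan, colspan) for every <td>/<th>, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[tuple[str, int, int]]] = []
        self._cell = None  # [text, rowspan, colspan] while inside a cell

    def _flush(self) -> None:
        # Store the open cell; also covers cells whose end tag was omitted.
        if self._cell is not None:
            if not self.rows:
                self.rows.append([])
            text, rowspan, colspan = self._cell
            self.rows[-1].append((text.strip(), rowspan, colspan))
            self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag in ("tr", "td", "th"):
            self._flush()
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            a = dict(attrs)
            self._cell = ["", int(a.get("rowspan") or 1), int(a.get("colspan") or 1)]

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr", "table"):
            self._flush()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0] += data

    def close(self) -> None:
        super().close()
        self._flush()


def table_to_grid(table_html: str) -> list[list[str]]:
    """Expand rowspan/colspan into a rectangular grid; raise if it is not one."""
    parser = _CellCollector()
    parser.feed(table_html)
    parser.close()
    grid: dict[tuple[int, int], str] = {}
    for r, row in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in row:
            while (r, c) in grid:  # skip slots already taken by a rowspan above
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    key = (r + dr, c + dc)
                    if key in grid:
                        raise ValueError(f"overlapping cells at {key}")
                    grid[key] = text
            c += colspan
    n_rows = max((r for r, _ in grid), default=-1) + 1
    n_cols = max((c for _, c in grid), default=-1) + 1
    holes = [(r, c) for r in range(n_rows) for c in range(n_cols) if (r, c) not in grid]
    if holes:
        raise ValueError(f"ragged table, empty slots at {holes}")
    return [[grid[(r, c)] for c in range(n_cols)] for r in range(n_rows)]


# Demo on a made-up reply: "Item" spans two rows, "Q1" spans two columns.
demo = (
    "Here is the table:\n" + TICKS + "html\n"
    '<table>\n<tr><th rowspan="2">Item</th><th colspan="2">Q1</th></tr>\n'
    "<tr><td>Jan</td><td>Feb</td></tr>\n</table>\n" + TICKS + "\n"
)
for block in extract_html_blocks(demo):
    print(table_to_grid(block))  # [['Item', 'Q1', 'Q1'], ['Item', 'Jan', 'Feb']]
```

Flattening a table into a grid makes a miscounted `rowspan` or `colspan` show up as a hole or an overlap. This is the kind of error the prompt's final check is meant to catch. The grid can also be compared cell by cell with a known-good extraction.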

Notes

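If the sentence "Remember: You should ignore graphic information. You can't output Base64." is removed from the prompt, the model will sometimes try to reproduce all of the information in the image instead of only the table.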
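Even with that sentence kept, a cheap check on the reply can catch the cases it is meant to prevent. Below is a heuristic sketch, which is an assumption and not something the original tested. It flags embedded images (`<img>` tags and `data:` URIs) and long unbroken Base64-like runs. The function name `image_dump_reasons` and the 200-character threshold are arbitrary choices for illustration.

```python
import re

# Heuristic patterns (assumptions, tune as needed) for output the prompt forbids:
# embedded images and Base64 payloads instead of plain table text.
_IMG_TAG = re.compile(r"<img\b", re.IGNORECASE)
_DATA_URI = re.compile(r"data:[\w/+.-]+;base64,", re.IGNORECASE)
_B64_RUN = re.compile(r"[A-Za-z0-9+/]{200,}={0,2}")  # 200 is an arbitrary threshold


def image_dump_reasons(reply: str) -> list:
    """Return why a reply looks like it reproduced image data; empty list if clean."""
    reasons = []
    if _IMG_TAG.search(reply):
        reasons.append("contains <img> tag")
    if _DATA_URI.search(reply):
        reasons.append("contains data: URI with Base64 payload")
    if _B64_RUN.search(reply):
        reasons.append("contains a long Base64-like run")
    return reasons


clean = "<table><tr><td>Revenue</td><td>42</td></tr></table>"
dirty = '<img src="data:image/png;base64,' + "iVBORw0KGgo" * 30 + '">'
print(image_dump_reasons(clean))  # []
print(image_dump_reasons(dirty))  # all three reasons
```

A non-empty result is a signal to retry the request or discard the reply. The Base64 threshold trades false positives (long identifiers without spaces) against missing short payloads.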