TF-ID: Academic Paper Table/Figure Recognition Tool

General Introduction

TF-ID (Table/Figure IDentifier) is a family of object detection models fine-tuned to locate and extract tables and figures from academic papers, with or without their caption text. The project was created by Yifei Hu and is open-sourced on GitHub under the MIT license, including the complete training code, model weights, and manually labeled datasets.

Function List

Extracts tables and figures from academic papers
Supports extraction with or without caption text
Provides complete training code and model weights
Supports extracting tables and figures from PDF files
Offers multiple model versions to suit different needs

Usage Guide

Installation process

Clone the repository:

git clone https://github.com/ai8hyf/TF-ID
cd TF-ID

Download the dataset from Hugging Face and extract it into the ./images directory:

wget https://huggingface.co/datasets/yifeihu/TF-ID-arxiv-papers/resolve/main/arxiv_paper_images.zip
unzip arxiv_paper_images.zip -d ./images

Convert the dataset format:
```
python coco_to_florence.py
```
Train the model:
```
accelerate launch train.py
```

Usage Process

Extract tables and figures from a single image:

python inference.py --image_path path/to/image.png

Extract all tables and figures from a PDF file:

python pdf_to_table_figures.py --pdf_path path/to/paper.pdf --output_dir ./sample_output

Detailed Operation Procedure

Extract tables and figures from a single image:
- Pass the image path to the inference.py script, which uses the default TF-ID-large model to detect the tables and figures in the image.
- The results are returned as bounding boxes that mark the position of each table and figure in the image.
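
The bounding-box output described above usually needs a little post-processing before cropping. The sketch below assumes the result follows the Florence-2 object-detection layout, a dictionary with parallel `bboxes` (`[x1, y1, x2, y2]` in pixels) and `labels` lists. That layout and the helper name `to_crop_boxes` are assumptions for illustration and are not confirmed by inference.py.

```python
import math
from typing import Dict, List, Tuple


def to_crop_boxes(
    detections: Dict[str, list],
    image_width: int,
    image_height: int,
) -> List[Tuple[str, Tuple[int, int, int, int]]]:
    """Turn detections into (label, integer pixel box) pairs clamped to the image.

    `detections` is assumed to look like
    {"bboxes": [[x1, y1, x2, y2], ...], "labels": ["table", ...]}.
    """
    crops = []
    for box, label in zip(detections["bboxes"], detections["labels"]):
        x1, y1, x2, y2 = box
        # Round outward (floor the top-left, ceil the bottom-right) and clamp
        # to the image bounds so the crop never cuts into the detected region.
        left = max(0, min(math.floor(x1), image_width))
        top = max(0, min(math.floor(y1), image_height))
        right = max(0, min(math.ceil(x2), image_width))
        bottom = max(0, min(math.ceil(y2), image_height))
        # Skip degenerate boxes that have no area after clamping.
        if right > left and bottom > top:
            crops.append((label, (left, top, right, bottom)))
    return crops


# Hypothetical detections for a 612 x 792 page image.
example = {
    "bboxes": [[34.6, 120.2, 580.9, 402.0], [-5.0, 700.0, 300.0, 900.0]],
    "labels": ["table", "figure"],
}
for label, box in to_crop_boxes(example, image_width=612, image_height=792):
    print(label, box)
```

Each returned box is in `(left, upper, right, lower)` order, which is the order Pillow's `Image.crop` expects.
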
Extract all tables and figures from a PDF file:
- Pass the PDF file path to the pdf_to_table_figures.py script, which extracts every table and figure from the PDF and saves the cropped images to the specified output directory.
- The TF-ID-large model is used by default; to switch to another model version, change the model_id parameter in the script.
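
The PDF workflow above amounts to a loop over pages: detect, then decide where each crop is saved. The sketch below shows only that control flow, with the detector passed in as a callable so it runs without model weights. The function name `plan_crops`, the stand-in detector, and the output file naming scheme are all hypothetical and may differ from what pdf_to_table_figures.py actually does.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Tuple

Box = Tuple[float, float, float, float]


def plan_crops(
    pages: Iterable[object],
    detect: Callable[[object], Dict[str, list]],
    output_dir: str,
) -> List[Tuple[int, Box, Path]]:
    """Run `detect` on every page and decide where each crop would be saved."""
    out = Path(output_dir)
    plan = []
    for page_number, page in enumerate(pages, start=1):
        result = detect(page)
        for index, (box, label) in enumerate(zip(result["bboxes"], result["labels"])):
            # Hypothetical naming scheme: page number, label, index within the page.
            target = out / f"page_{page_number}_{label}_{index}.png"
            plan.append((page_number, tuple(box), target))
    return plan


def fake_detect(page: object) -> Dict[str, list]:
    """Stand-in detector: reports one table on a specific placeholder page."""
    if page == "page-with-table":
        return {"bboxes": [[10, 20, 200, 300]], "labels": ["table"]}
    return {"bboxes": [], "labels": []}


for page_number, box, target in plan_crops(
    ["plain-page", "page-with-table"], fake_detect, "./sample_output"
):
    print(page_number, box, target.name)
```

In a real run, `pages` would be rendered page images and `detect` would wrap the model call; keeping them as parameters makes the loop easy to test on its own.
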
Train the model:
- After cloning the repository and downloading the dataset, run the coco_to_florence.py script to convert the dataset to Florence 2 format.
- Run accelerate launch train.py to start training; checkpoint files are saved as training proceeds.
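
To give a sense of what the conversion step involves, here is a minimal sketch of turning one COCO annotation into a Florence 2 style detection target. It assumes the usual Florence 2 convention of quantizing box corners into 1000 bins and writing them as `<loc_N>` tokens after the label. The function names and the `image`/`prefix`/`suffix` record keys are assumptions for illustration, not the actual contents of coco_to_florence.py.

```python
from typing import Dict, List


def coco_box_to_loc_tokens(
    bbox: List[float], width: int, height: int, bins: int = 1000
) -> str:
    """Convert a COCO [x, y, w, h] pixel box into four <loc_N> tokens."""
    x, y, w, h = bbox
    # COCO stores top-left corner plus size; the target needs both corners.
    corners = [(x, width), (y, height), (x + w, width), (y + h, height)]
    tokens = []
    for value, size in corners:
        # Scale into [0, bins) and clamp so a box touching the edge maps to bins - 1.
        quantized = min(bins - 1, max(0, int(value * bins / size)))
        tokens.append(f"<loc_{quantized}>")
    return "".join(tokens)


def coco_to_florence_record(
    image: Dict, annotations: List[Dict], categories: Dict[int, str]
) -> Dict[str, str]:
    """Build one training record: task prompt as prefix, labels plus boxes as suffix."""
    suffix = "".join(
        categories[a["category_id"]]
        + coco_box_to_loc_tokens(a["bbox"], image["width"], image["height"])
        for a in annotations
    )
    return {"image": image["file_name"], "prefix": "<OD>", "suffix": suffix}


# Hypothetical COCO entries for a single page image.
record = coco_to_florence_record(
    {"file_name": "page_1.png", "width": 1000, "height": 500},
    [{"category_id": 1, "bbox": [100, 50, 300, 200]}],
    {1: "table"},
)
print(record["suffix"])
```

Because coordinates are normalized to bins rather than pixels, the same target string stays valid if the image is resized before training.
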
