
TF-ID: Table and Figure Recognition Tool for Academic Papers

General Introduction

TF-ID (Table/Figure IDentifier) is a family of object detection models fine-tuned to locate and extract tables and figures from academic papers. The project was created by Yifei Hu and is open-sourced on GitHub. The models support extraction with or without caption text. The project provides complete training code, model weights, and manually labeled datasets, all released under the MIT license.

 



Features

  • Extracts tables and figures from academic papers
  • Supports extraction with or without caption text
  • Provides complete training code and model weights
  • Supports extracting tables and figures from PDF files
  • Offers multiple model versions to suit different needs

 

 

Usage Guide

Installation Process

  1. Clone the repository:
    git clone https://github.com/ai8hyf/TF-ID
    cd TF-ID
    
  2. Download the dataset from Hugging Face and extract it into the images directory:
    wget https://huggingface.co/datasets/yifeihu/TF-ID-arxiv-papers/resolve/main/arxiv_paper_images.zip
    unzip arxiv_paper_images.zip -d ./images
    
  3. Convert the dataset format:
    python coco_to_florence.py
    
  4. Train the model:
    accelerate launch train.py
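
The conversion in step 3 is not spelled out here. As a rough illustration only, the sketch below shows one way a COCO box could be rewritten as Florence-2 style location tokens. It assumes COCO boxes are [x, y, width, height] in pixels and that coordinates are quantized into 1000 bins written as <loc_N>. The function name coco_bbox_to_florence is hypothetical, and the actual coco_to_florence.py script may differ in its details.

```python
def coco_bbox_to_florence(label, bbox, img_w, img_h, bins=1000):
    """Turn one COCO box [x, y, width, height] into 'label<loc_..>...' text.

    Assumption: corners are normalized by the image size and floored
    into integer bins 0..bins-1, in the order x1, y1, x2, y2.
    """
    x, y, w, h = bbox
    corners = (
        (x, img_w),          # left edge
        (y, img_h),          # top edge
        (x + w, img_w),      # right edge
        (y + h, img_h),      # bottom edge
    )
    tokens = []
    for value, size in corners:
        bin_index = int(value * bins / size)
        # Clamp so a box touching the image border stays in range.
        bin_index = min(bins - 1, max(0, bin_index))
        tokens.append(f"<loc_{bin_index}>")
    return label + "".join(tokens)


if __name__ == "__main__":
    # A 300x400 table at (100, 200) on a 1000x2000 page image.
    print(coco_bbox_to_florence("table", [100, 200, 300, 400], 1000, 2000))
    # → table<loc_100><loc_100><loc_400><loc_300>
```

Normalizing before quantizing keeps the target text independent of the page image resolution, which is why the same box on a larger scan of the page would map to the same tokens.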
    

Usage Process

  1. Extract tables and figures from a single image:
    python inference.py --image_path path/to/image.png
    
  2. Extract all tables and figures from a PDF file:
    python pdf_to_table_figures.py --pdf_path path/to/paper.pdf --output_dir ./sample_output
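
To process several papers in one go, the PDF command above can be generated once per file. The following is a minimal sketch, assuming the --pdf_path and --output_dir flags behave as shown above. The helper build_commands and the one-subfolder-per-paper layout are hypothetical and not part of the TF-ID repository.

```python
from pathlib import Path


def build_commands(pdf_paths, output_root):
    """Build one pdf_to_table_figures.py command per PDF.

    Nothing is executed here; the function only returns argument lists.
    """
    commands = []
    for pdf in map(Path, pdf_paths):
        # Hypothetical layout: one output subfolder named after each PDF.
        out_dir = Path(output_root) / pdf.stem
        commands.append([
            "python", "pdf_to_table_figures.py",
            "--pdf_path", str(pdf),
            "--output_dir", str(out_dir),
        ])
    return commands


if __name__ == "__main__":
    for cmd in build_commands(["papers/a.pdf", "papers/b.pdf"], "sample_output"):
        print(" ".join(cmd))
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the repository root, so a failed extraction stops the batch instead of being silently skipped.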
    

Detailed Operation Procedure

  1. Extract tables and figures from a single image:
    • Pass the image path to the inference.py script, which uses the default TF-ID-large model to detect the tables and figures in the image.
    • The results are returned as bounding boxes that mark the position of each table and figure in the image.
  2. Extract all tables and figures from a PDF file:
    • Pass the PDF file path to the pdf_to_table_figures.py script, which detects all tables and figures in the PDF and saves the cropped images to the specified output directory.
    • The TF-ID-large model is used by default. To switch to another model version, change the model_id parameter in the script.
  3. Train the model:
    • After cloning the repository and downloading the dataset, run the coco_to_florence.py script to convert the dataset to Florence-2 format.
    • Run accelerate launch train.py to start training. Checkpoint files are saved during training.
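
The bounding boxes described in steps 1 and 2 still have to be turned into crop regions before images can be saved. The sketch below illustrates that step under stated assumptions: the detection result is taken to be a dictionary with parallel "bboxes" ([x1, y1, x2, y2] in pixels) and "labels" lists, and the function plan_crops and its file naming scheme are hypothetical. The scripts in the repository may organize this differently.

```python
import math


def plan_crops(result, page_w, page_h, page_no):
    """Return (filename, (left, top, right, bottom)) for each usable box.

    Boxes are rounded outward to whole pixels, clamped to the page,
    and dropped if nothing is left after clamping.
    """
    counts = {}   # running index per label, e.g. table_0, table_1
    plans = []
    for box, label in zip(result["bboxes"], result["labels"]):
        x1, y1, x2, y2 = box
        left = max(0, math.floor(x1))
        top = max(0, math.floor(y1))
        right = min(page_w, math.ceil(x2))
        bottom = min(page_h, math.ceil(y2))
        if right <= left or bottom <= top:
            continue  # empty region, nothing to crop
        index = counts.get(label, 0)
        counts[label] = index + 1
        name = f"page_{page_no}_{label}_{index}.png"
        plans.append((name, (left, top, right, bottom)))
    return plans


if __name__ == "__main__":
    detections = {
        "bboxes": [[10.5, 20.2, 300.7, 400.1], [-5, 50, 700, 120]],
        "labels": ["table", "figure"],
    }
    for name, region in plan_crops(detections, 612, 792, page_no=1):
        print(name, region)
```

The (left, top, right, bottom) tuples use the same order as the box argument of Pillow's Image.crop, so each region can be passed to it directly when Pillow is used for cropping.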