
GOT-OCR2.0: end-to-end multimodal OCR model based on QWen2 0.5B

General Introduction

GOT-OCR2.0 is an open-source optical character recognition (OCR) model co-released by StepFun. It aims to move OCR technology towards "OCR-2.0" through a single, unified end-to-end model. The model supports a wide range of OCR tasks, including plain text recognition, formatted text recognition, fine-grained OCR, multi-crop OCR, and multi-page OCR. It is designed as a versatile and efficient solution for complex OCR application scenarios.

The model is built on QWen2 0.5B. Billed as "OCR 2.0", this end-to-end OCR model has 580M parameters and reports a BLEU score of 0.972. An online demo is available at https://huggingface.co/spaces/ucaslcl/GOT_online


 


 

 

Function List

  • Plain Text Recognition: Recognizes ordinary text content in images.
  • Formatted Text Recognition: Recognizes text while retaining its formatting information, such as tables and paragraphs.
  • Fine-grained OCR: Recognizes fine text in images and text against complex backgrounds.
  • Multi-crop OCR: Crops an image into multiple regions and recognizes the text in each cropped region.
  • Multi-page OCR: Supports OCR of multi-page documents.

 

 

Usage Guide

Installation process

  1. Clone the project code:
    git clone https://github.com/Ucas-HaoranWei/GOT-OCR2.0.git
    cd GOT-OCR2.0
    
  2. Create and activate a virtual environment:
    conda create -n got python=3.10 -y
    conda activate got
    
  3. Install project dependencies:
    pip install -e .
    
  4. Install Flash-Attention:
    pip install ninja
    pip install flash-attn --no-build-isolation
    

Obtaining GOT model weights

Usage Process

  1. Prepare input data: Make sure the image to be recognized is available on disk; its path is passed to the script with --image-file.
  2. Run the OCR model:
    python3 GOT/demo/run_ocr_2.0.py --model-name /GOT_weights/ --image-file /an/image/file.png --type ocr
    
  3. View the output: Review the recognized text produced by the script and process it further as needed.
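
To apply the command from step 2 to a whole folder of images, a small wrapper can loop over the files. The following is only a sketch under stated assumptions: the function name ocr_batch, the DRY_RUN switch, and the input and output directory arguments are illustrative and not part of GOT-OCR2.0, and redirecting the script's standard output to a text file assumes that the demo script prints the recognized text there. It should be run from the repository root.

```shell
#!/usr/bin/env bash
# Sketch: run the documented OCR command once per PNG in a directory.
# ocr_batch, DRY_RUN and the directory arguments are hypothetical helpers,
# not part of the GOT-OCR2.0 project.

ocr_batch() {
    local weights="$1" input_dir="$2" output_dir="$3"
    local img name

    mkdir -p "$output_dir"

    for img in "$input_dir"/*.png; do
        [ -e "$img" ] || continue              # glob matched nothing
        name="$(basename "${img%.png}")"

        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Only print the command that would be executed
            echo "python3 GOT/demo/run_ocr_2.0.py --model-name $weights --image-file $img --type ocr > $output_dir/$name.txt"
        else
            # Assumption: the recognized text is written to standard output
            python3 GOT/demo/run_ocr_2.0.py --model-name "$weights" \
                --image-file "$img" --type ocr > "$output_dir/$name.txt"
        fi
    done
}

# Example:
#   DRY_RUN=1 ocr_batch /GOT_weights/ ./input ./output
```

Setting DRY_RUN=1 first is a cheap way to check the generated paths before spending GPU time. Note that this loop starts a new Python process per image, so the model weights are reloaded each time; for large batches a single long-running process would likely be faster.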

Functional operation details

  • Plain text recognition: Recognizes and outputs ordinary text content in images as plain text files, suitable for simple text extraction tasks.
  • Formatted Text Recognition: Preserves formatting information, such as tables and paragraphs, while recognizing text; suitable for scenarios where the original formatting of the document needs to be kept.
  • Fine-grained OCR: Recognizes fine text in complex backgrounds, suitable for scenes requiring high-precision text extraction.
  • Multi-crop OCR: Crops the image multiple times and recognizes the text in each cropped region, suitable for scenarios that require multi-region recognition of images.
  • Multi-page OCR: Supports OCR of multi-page documents, suitable for scenarios where long documents or multi-page PDF files are processed.
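
The modes above map onto different command lines. This article only shows the --type ocr invocation; the other flags and the run_ocr_2.0_crop.py script in the sketch below are taken from the upstream project README as best recalled and should be treated as assumptions to verify against the repository before use. The helper name got_cmd and its task labels are hypothetical. The function only prints the command, so it is safe to run without the model installed.

```shell
#!/usr/bin/env bash
# Sketch: print the command line for each OCR mode.
# got_cmd and its task labels are hypothetical. Only "--type ocr" appears in
# this article; --type format, --box, --multi-page and run_ocr_2.0_crop.py
# are assumptions based on the upstream README.

got_cmd() {
    local task="$1" weights="$2" image="$3" box="${4:-}"
    local base="python3 GOT/demo/run_ocr_2.0.py --model-name $weights --image-file $image"
    local crop="python3 GOT/demo/run_ocr_2.0_crop.py --model-name $weights --image-file $image"

    case "$task" in
        plain)     echo "$base --type ocr" ;;               # plain text
        format)    echo "$base --type format" ;;            # formatted text
        box)       echo "$base --type ocr --box $box" ;;    # fine-grained, region given as [x1,y1,x2,y2]
        crop)      echo "$crop" ;;                          # multi-crop
        multipage) echo "$crop --multi-page" ;;             # multi-page
        *)         echo "unknown task: $task" >&2; return 1 ;;
    esac
}

# Example:
#   got_cmd format /GOT_weights/ /an/image/file.png
```

When running a printed command that contains a bracketed box argument, quote the bracketed value so the shell does not try to expand it as a glob pattern.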

With the steps above, users can install GOT-OCR2.0 and apply it to the OCR tasks described in this article, from simple plain text extraction to multi-page documents.
