
GOT-OCR2.0: end-to-end multimodal OCR model based on QWen2 0.5B

General Introduction

GOT-OCR2.0 is an open-source Optical Character Recognition (OCR) model jointly released by StepFun and collaborators. It aims to move OCR technology towards "OCR-2.0" by handling recognition with a single, unified end-to-end model. The model supports a range of OCR tasks: plain text recognition, formatted text recognition, fine-grained OCR, multi-crop OCR, and multi-page OCR. Its design goal is a versatile and efficient solution for complex OCR application scenarios.

The model is built on QWen2 0.5B. Billed as OCR 2.0, this end-to-end OCR model has 580M parameters and a reported BLEU score of 0.972. An online demo is available at https://huggingface.co/spaces/ucaslcl/GOT_online


 


Function List

  • Plain Text Recognition: Recognizes ordinary text content in images.
  • Formatted Text Recognition: Recognizes text while retaining its formatting, such as tables and paragraphs.
  • Fine-grained OCR: Recognizes fine text in images and text against complex backgrounds.
  • Multi-crop OCR: Crops an image into multiple regions and recognizes the text in each cropped region.
  • Multi-page OCR: Recognizes text across multi-page documents.

 

 

Usage Guide

Installation process

  1. Clone the project code:
    git clone https://github.com/Ucas-HaoranWei/GOT-OCR2.0.git
    cd GOT-OCR2.0
    
  2. Create and activate a virtual environment:
    conda create -n got python=3.10 -y
    conda activate got
    
  3. Install project dependencies:
    pip install -e .
    
  4. Install Flash-Attention:
    pip install ninja
    pip install flash-attn --no-build-isolation
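
Before moving on, it can help to confirm that the packages from the steps above are importable in the active environment. The following is a minimal sketch and not part of the GOT-OCR2.0 project: the helper name check_environment is hypothetical, and it assumes that `pip install -e .` pulls in torch and that the flash-attn package is imported as flash_attn.

```python
import importlib.util
import sys

# Assumption: these are the modules the installation steps should provide.
# "flash_attn" is the import name of the flash-attn package.
REQUIRED_MODULES = ["torch", "flash_attn"]


def check_environment(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it inside the activated `got` environment; a MISSING entry suggests that the corresponding install step did not complete.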
    

Obtaining GOT model weights

Usage Process

  1. Prepare input data: Place the images or documents to be recognized in an input directory of your choice.
  2. Run the OCR model:
    python3 GOT/demo/run_ocr_2.0.py --model-name /GOT_weights/ --image-file /an/image/file.png --type ocr
    
  3. View the results: Check the recognized text produced by the run; it can then be post-processed as needed.
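
The command in step 2 handles one image at a time. To process a whole input directory, the same command can be wrapped in a small driver script. The sketch below is an illustration under stated assumptions, not a script shipped with the project: build_command and run_batch are hypothetical helpers, the directory names are placeholders, and it assumes the demo script prints its result to standard output.

```python
import subprocess
import sys
from pathlib import Path

# Script path and flags are copied from the single-image command above.
DEMO_SCRIPT = "GOT/demo/run_ocr_2.0.py"


def build_command(weights_dir, image_file, ocr_type="ocr"):
    """Build the argument list for one run of the demo script."""
    return [
        "python3", DEMO_SCRIPT,
        "--model-name", str(weights_dir),
        "--image-file", str(image_file),
        "--type", ocr_type,
    ]


def run_batch(weights_dir, input_dir, output_dir, pattern="*.png"):
    """Run the demo on every matching image and save whatever it prints.

    Assumption: the script writes its result to stdout. The captured text
    may also contain log lines, so inspect the files before relying on them.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for image in sorted(Path(input_dir).glob(pattern)):
        result = subprocess.run(
            build_command(weights_dir, image),
            capture_output=True, text=True, check=False,
        )
        if result.returncode != 0:
            print(f"{image.name} failed:\n{result.stderr}", file=sys.stderr)
            continue
        (out / f"{image.stem}.txt").write_text(result.stdout, encoding="utf-8")
```

For example, calling `run_batch("/GOT_weights/", "input_images", "ocr_output")` from the repository root would write one text file per PNG image. Each call reloads the model, so this is convenient rather than fast.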

Functional operation details

  • Plain Text Recognition: Recognizes ordinary text in images and outputs it as plain text; suitable for simple text extraction tasks.
  • Formatted Text Recognition: Preserves formatting information, such as tables and paragraphs, while recognizing text; suitable when the original layout of the document must be kept.
  • Fine-grained OCR: Recognizes fine text against complex backgrounds; suitable for scenes that require high-precision text extraction.
  • Multi-crop OCR: Crops the image into multiple regions and recognizes the text in each one; suitable for images that need region-by-region recognition.
  • Multi-page OCR: Recognizes text across multi-page documents; suitable for long documents or multi-page PDF files.
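
The modes listed above can be pictured as different invocations of the demo scripts. The sketch below maps each mode to a command line. Only `--type ocr` is confirmed by this article; every other script name and flag (`--type format`, `--box`, `--multi-page`, `run_ocr_2.0_crop.py`) is an assumption recalled from the project README and should be checked against the repository before use. The helper command_for is hypothetical.

```python
# Only "--type ocr" appears in this article. Every other script name and flag
# below is an assumption recalled from the project README; verify before use.
PLAIN_SCRIPT = "GOT/demo/run_ocr_2.0.py"
CROP_SCRIPT = "GOT/demo/run_ocr_2.0_crop.py"  # assumed script name

MODE_TABLE = {
    "plain": (PLAIN_SCRIPT, ["--type", "ocr"]),
    "format": (PLAIN_SCRIPT, ["--type", "format"]),   # assumed flag value
    "multi-crop": (CROP_SCRIPT, []),                  # assumed
    "multi-page": (CROP_SCRIPT, ["--multi-page"]),    # assumed flag
}


def command_for(mode, weights_dir, image_path, box=None):
    """Return the argument list for the given mode.

    Fine-grained OCR is modelled here as plain OCR restricted to a box
    (x1, y1, x2, y2), passed through an assumed --box flag.
    """
    if mode == "fine-grained":
        if box is None:
            raise ValueError("fine-grained mode needs a box (x1, y1, x2, y2)")
        script = PLAIN_SCRIPT
        extra = ["--type", "ocr", "--box", "[{},{},{},{}]".format(*box)]
    elif mode in MODE_TABLE:
        script, extra = MODE_TABLE[mode]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["python3", script, "--model-name", str(weights_dir),
            "--image-file", str(image_path)] + extra
```

Keeping the mapping in one table makes it easy to correct a flag in a single place once it has been checked against the repository.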

With the above steps, users can easily install and use the GOT-OCR2.0 model for various OCR tasks. The model provides a rich set of functional modules that can meet the OCR needs in different scenarios.

May not be reproduced without permission: Chief AI Sharing Circle » GOT-OCR2.0: end-to-end multimodal OCR model based on QWen2 0.5B
