General Introduction
GOT-OCR2.0 is a StepStar co-presented de Open Source Optical Character Recognition (OCR) model, which aims to drive OCR technology towards OCR-2.0 through a unified end-to-end model. The model supports a wide range of OCR tasks, including plain text recognition, formatted text recognition, fine-grained OCR, multi-crop OCR, and multi-page OCR.GOT-OCR2.0 is designed with the goal of providing a versatile and efficient solution for a wide range of complex OCR application scenarios.
Based on QWen2 0.5 B model. Called OCR 2.0, the end-to-end OCR model with 580M parameters got a BLEU score of 0.972. Online experience at https://huggingface.co/spaces/ucaslcl/GOT_online
Function List
- Ordinary Text Recognition: Recognize ordinary text content in images.
- Formatted Text Recognition: Recognizes and retains formatting information of text, such as tables, paragraphs, etc.
- Fine-grained OCR: Recognize fine text in images and text against complex backgrounds.
- Multi-crop OCR: Supports multiple cropping of images and recognizes the text in each cropped area.
- Multi-page OCR: Supports OCR of multi-page documents.
Using Help
Installation process
- Clone the project code:
git clone https://github.com/Ucas-HaoranWei/GOT-OCR2.0.git cd GOT-OCR2.0
- Create and activate a virtual environment:
conda create -n got python=3.10 -y conda activate got
- Install project dependencies:
pip install -e .
- Install Flash-Attention:
pip install ninja pip install flash-attn --no-build-isolation
Obtaining GOT model weights
- Huggingface
- Google Drive
- Baidu cloud(Extraction code: OCR2)
Usage Process
- Prepare input data: Place the image or document to be OCR'd in the specified input directory.
- Run the OCR model:
python3 GOT/demo/run_ocr_2.0.py --model-name /GOT_weights/ --image-file /an/image/file.png --type ocr
- View Output Results: The OCR processed text will be saved in the specified output directory, and users can further process it as needed.
Functional operation details
- Plain text recognition: Recognizes and outputs ordinary text content in images as plain text files, suitable for simple text extraction tasks.
- Formatted Text Recognition: Preserve formatting information, such as tables, paragraphs, etc., while recognizing text, for scenarios where the original formatting of the document needs to be preserved.
- Fine-grained OCR: Recognize fine text in complex backgrounds, suitable for scenes requiring high-precision text extraction.
- Multi-crop OCR: Crops the image multiple times and recognizes the text in each cropped region, suitable for scenarios that require multi-region recognition of images.
- Multi-page OCR: Supports OCR of multi-page documents, suitable for scenarios where long documents or multi-page PDF files are processed.
With the above steps, users can easily install and use the GOT-OCR2.0 model for various OCR tasks. The model provides a rich set of functional modules that can meet the OCR needs in different scenarios.