
MegaParse: parses all types of documents into LLM-ready data, preserving tables, images, and all other information in the document

General Introduction

MegaParse is a versatile document parsing tool designed to prepare documents for large language models (LLMs). It handles text, PDF, PowerPoint, Excel, CSV, and Word files, and aims to lose no information during parsing. Developed by QuivrHQ, the tool is open source and free to use, with a design that focuses on speed and efficiency.


Feature List

  • Versatile parser: Supports multiple file types, including text, PDF, PowerPoint, Excel, CSV, and Word documents.
  • No information loss: Aims to preserve all information during parsing.
  • Fast and efficient: The core design focuses on speed and efficiency.
  • Open source and free: An open source project that is free to use.
  • Support for multiple content types: Parses tables, tables of contents, headers, footers, and images.

 

Three parsing modes:

  • UnstructuredParser
  • Vision parser (MegaParseVision): supports multimodal models such as GPT-4V and Claude 3
  • LlamaParser: enhanced parsing capabilities via Llama Cloud
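
One way to choose among the three modes at runtime is to check which credentials are available. The sketch below is only an illustration: the helper name `choose_parser_mode`, the preference order, and the `ANTHROPIC_API_KEY` variable name are assumptions, not part of MegaParse. It returns a mode name rather than constructing a parser, so it runs without MegaParse installed.

```python
import os


def choose_parser_mode(env=None):
    """Return 'vision', 'llama', or 'unstructured' based on available API keys.

    The preference order (vision, then llama, then unstructured) is an
    assumption made for this sketch, not a rule defined by MegaParse.
    """
    env = os.environ if env is None else env
    # MegaParseVision needs a multimodal model, hence an OpenAI or Anthropic key.
    if env.get("OPENAI_API_KEY") or env.get("ANTHROPIC_API_KEY"):
        return "vision"
    # LlamaParser needs a Llama Cloud key.
    if env.get("LLAMA_CLOUD_API_KEY"):
        return "llama"
    # UnstructuredParser is used here as the fallback when no key is set.
    return "unstructured"
```

The returned name can then be mapped to the matching parser class before building the MegaParse instance.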

Performance:
According to the project's benchmark, MegaParseVision reaches a similarity ratio of 0.87, the highest score among the parsing modes.
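
The article does not say how the similarity ratio is computed. As a rough illustration of the idea, the sketch below compares parser output with a reference text using `difflib.SequenceMatcher` from the Python standard library. The function name `similarity_ratio` and the whitespace normalization are assumptions, and the actual benchmark may use a different metric.

```python
from difflib import SequenceMatcher


def similarity_ratio(parsed: str, reference: str) -> float:
    """Return a score in [0, 1]; 1.0 means the two texts match exactly.

    Whitespace runs are collapsed first so that line wrapping alone
    does not lower the score (an assumption made for this sketch).
    """
    a = " ".join(parsed.split())
    b = " ".join(reference.split())
    return SequenceMatcher(None, a, b).ratio()
```

Under this metric, a score of 0.87 would mean that most, but not all, of the reference text is reproduced in the same order by the parser.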

Main application scenarios:

  • Importing various documents into an LLM system for processing
  • Scenarios that require document formatting and content integrity to be preserved
  • Batch document processing tasks
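
A batch job can be sketched as a loop over a directory that writes one Markdown file per input. The helper below is hypothetical: `parse_directory`, the list of file suffixes, and the output naming are assumptions for illustration. It accepts any callable that maps a file path to text, so the `load` method of a MegaParse instance (as configured in the usage section) could be passed as `parse_fn`.

```python
from pathlib import Path

# Suffixes assumed to match the file types named in the article.
SUPPORTED_SUFFIXES = {".txt", ".pdf", ".pptx", ".xlsx", ".csv", ".docx"}


def parse_directory(src_dir, out_dir, parse_fn):
    """Parse every supported file in src_dir and write Markdown to out_dir.

    Returns (converted, failed): a list of written paths and a dict that
    maps each failing file name to its error message.
    """
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    converted, failed = [], {}
    for path in sorted(src.iterdir()):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED_SUFFIXES:
            continue  # skip folders and unsupported file types
        try:
            text = parse_fn(str(path))
        except Exception as exc:  # keep going; one bad file should not stop the batch
            failed[path.name] = str(exc)
            continue
        # Keep the original suffix in the name so a.pdf and a.docx do not collide.
        target = out / (path.name + ".md")
        target.write_text(str(text), encoding="utf-8")
        converted.append(target)
    return converted, failed
```

Collecting failures instead of raising lets a long batch finish and report the problem files at the end.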

The project is under active development, with plans to add more features, such as:

  • Improving the table checker
  • Adding modular post-processing
  • Adding structured output support

 

Usage Guide

Installation process

  1. Install MegaParse:
    pip install megaparse
    
  2. Configure API keys: Add your OpenAI or Anthropic API key to the .env file.
  3. Install dependencies:
    • For images and PDF files, install poppler and tesseract.
    • If you are using a Mac, you will also need to install libmagic:
      brew install libmagic
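
Before parsing, it can help to confirm that these prerequisites are in place. The following is a minimal sketch using only the Python standard library. The function names, the `ANTHROPIC_API_KEY` variable name, and the choice of `pdftoppm` (one of the command-line tools shipped with poppler) as the poppler check are assumptions, and the simple KEY=VALUE reader is not a full .env parser.

```python
import shutil
from pathlib import Path

REQUIRED_BINARIES = ("pdftoppm", "tesseract")  # pdftoppm ships with poppler
API_KEY_NAMES = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")  # second name is assumed


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines; blank lines and # comments are ignored."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def preflight(env, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    for binary in REQUIRED_BINARIES:
        if which(binary) is None:
            problems.append(f"missing binary: {binary}")
    if not any(env.get(name) for name in API_KEY_NAMES):
        problems.append("no OpenAI or Anthropic API key found")
    return problems
```

Calling `preflight(read_env_file())` from the project directory prints nothing by itself; the caller decides whether to log the returned problems or stop.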
      

Using MegaParse

  1. Parse a document with the UnstructuredParser:
    from megaparse import MegaParse
    from megaparse.parser.unstructured_parser import UnstructuredParser

    # Build the parser and wrap it in a MegaParse instance
    parser = UnstructuredParser()
    megaparse = MegaParse(parser)
    # Parse the PDF and print the result
    response = megaparse.load("./test.pdf")
    print(response)
    # Save the parsed output as Markdown
    megaparse.save("./test.md")
    
  2. Use MegaParse Vision:
    import os

    from megaparse import MegaParse
    from langchain_openai import ChatOpenAI
    from megaparse.parser.megaparse_vision import MegaParseVision

    # The API key is read from the OPENAI_API_KEY environment variable
    model = ChatOpenAI(model="gpt-4o", api_key=os.getenv("OPENAI_API_KEY"))
    parser = MegaParseVision(model=model)
    megaparse = MegaParse(parser)
    response = megaparse.load("./test.pdf")
    print(response)
    megaparse.save("./test.md")
    

Boosting results with LlamaParse

  1. Create a Llama Cloud account and get an API key.
  2. Change the parser to LlamaParser:
    import os

    from megaparse import MegaParse
    from megaparse.parser.llama_parser import LlamaParser

    # The API key is read from the LLAMA_CLOUD_API_KEY environment variable
    parser = LlamaParser(api_key=os.getenv("LLAMA_CLOUD_API_KEY"))
    megaparse = MegaParse(parser)
    response = megaparse.load("./test.pdf")
    print(response)
    megaparse.save("./test.md")
    

Using MegaParse as an API

  1. Start the server with the Makefile:
    Run the following in the project root directory:

    make dev
    
  2. Open the API documentation:
    Browse to localhost:8000/docs to see the available endpoints.
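
The endpoints can also be listed from a script. The sketch below assumes the server publishes an OpenAPI schema at /openapi.json, which is the default for FastAPI applications that serve interactive docs at /docs. The article does not confirm this, so the URL and both function names are assumptions.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def fetch_openapi_schema(base_url="http://localhost:8000", timeout=5.0):
    """Download the OpenAPI schema (assumed to be served at /openapi.json)."""
    with urlopen(f"{base_url}/openapi.json", timeout=timeout) as resp:
        return json.load(resp)


def list_endpoints(schema):
    """Return sorted (METHOD, path) pairs found in an OpenAPI schema dict."""
    endpoints = []
    for path, operations in sorted(schema.get("paths", {}).items()):
        for method in sorted(operations):
            # Path items may also hold non-method keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                endpoints.append((method.upper(), path))
    return endpoints
```

With the development server running, `list_endpoints(fetch_openapi_schema())` returns the same routes that the browser page shows.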