BabelDOC: An Open-Source Tool for Translating PDF Documents into Bilingual Versions

General Introduction

BabelDOC is an open-source tool for translating PDF documents into bilingual versions. Developed by the funstory-ai team and hosted on GitHub, it is aimed mainly at users who work with foreign-language documents, such as researchers, students, and technical professionals. BabelDOC supports translating English PDFs into Chinese while preserving the original layout, including mathematical formulas and tables. It can be used through an online service, a command-line interface (CLI), or a Python API. The online service is provided by Immersive Translate and is free for up to 1,000 pages per month; for self-deployment, you can build it from source or use it through the PDFMathTranslate project.


Features

  • Translates PDF files into a bilingual format, with the original and translated text displayed side by side.
  • Preserves the original layout and renders mathematical formulas, tables, and images correctly.
  • Provides a command-line tool (CLI) that supports batch translation of multiple files.
  • Provides a Python API that developers can embed in other programs.
  • Offers an online service with 1,000 free pages of translation per month.
  • Supports self-deployment, either running locally or combined with PDFMathTranslate for additional translation services.
  • Supports multiple translation engines, such as OpenAI and Bing.
  • Provides offline asset package management for environments without network access.

 

Usage Guide

BabelDOC can be used in several ways, including the online service and local deployment. The guide below will help you get started quickly.

Using the Online Service

  1. Open the online service
    Visit Immersive Translate - BabelDOC. This is a beta service.
  2. Upload a file
    Click the Upload button and select the PDF file to translate. The file size and page count must stay within the free quota (1,000 pages/month).
  3. Select the language
    English-to-Chinese is the default translation direction. After the upload, the system processes the file automatically and generates a bilingual PDF.
  4. Download the result
    Once the translation is complete, click the Download button to get the translated file. The result shows the original text and its translation side by side.

Local Installation

BabelDOC can be installed from PyPI or from source; using uv to manage the environment is recommended.

Installing from PyPI

  1. Install Python and uv
    Make sure your system has Python 3.12 or later. Download and install uv, and configure the required environment variables.
  2. Install BabelDOC
    Run in a terminal:
uv tool install --python 3.12 BabelDOC
  3. Verify the installation
    Run:
babeldoc --help

If a help message is displayed, the installation was successful.
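If you want to automate this check, for example in a setup script, a minimal Python sketch could look like the following. It assumes only that the babeldoc command installed above is on your PATH; the function name babeldoc_available is hypothetical and not part of BabelDOC.

```python
import shutil
import subprocess


def babeldoc_available(timeout: float = 120.0) -> bool:
    """Return True if `babeldoc` is on PATH and `babeldoc --help` exits with status 0."""
    exe = shutil.which("babeldoc")
    if exe is None:
        # Not found on PATH; uv's tool directory may not be on PATH yet.
        return False
    try:
        # Same check as the manual step: run `babeldoc --help`.
        result = subprocess.run([exe, "--help"], capture_output=True,
                                text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("BabelDOC installed:", babeldoc_available())
```

The function returns False instead of raising when the command is missing, so a setup script can print a hint and continue.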

Installing from source

  1. Clone the project
    Run in a terminal:
git clone https://github.com/funstory-ai/BabelDOC
cd BabelDOC
  2. Install the dependencies
    Use uv to install the dependencies:
uv run pip install -r requirements.txt
  3. Verify the installation
    Run:
uv run babeldoc --help

If the help message appears, the installation succeeded.

Offline Assets

To use BabelDOC without network access, you can generate and restore an offline asset package:

  1. Generate the asset package
babeldoc --generate-offline-assets /path/to/output/dir

The generated zip file contains the fonts and models.
  2. Restore the asset package

babeldoc --restore-offline-assets /path/to/offline_assets_package.zip

The assets are extracted to the default path ~/.cache/babeldoc/assets/.
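To confirm that a restore worked on an offline machine, you can check that the default asset directory exists and is not empty. The sketch below assumes the default path ~/.cache/babeldoc/assets/ given above; the names DEFAULT_ASSET_DIR and assets_restored are hypothetical, and the check says nothing about whether every required file is present.

```python
from pathlib import Path

# Default restore location stated above; change it if your setup differs.
DEFAULT_ASSET_DIR = Path.home() / ".cache" / "babeldoc" / "assets"


def assets_restored(asset_dir: Path = DEFAULT_ASSET_DIR) -> bool:
    """Return True if the asset directory exists and contains at least one entry."""
    return asset_dir.is_dir() and any(asset_dir.iterdir())


if __name__ == "__main__":
    status = "found" if assets_restored() else "missing or empty"
    print(f"Offline assets at {DEFAULT_ASSET_DIR}: {status}")
```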

Local Usage

Command-Line Usage

  1. Translate a single document
    Suppose you have a file named example.pdf. To translate it with OpenAI:
babeldoc --files example.pdf --openai --openai-model "gpt-4o-mini" --openai-api-key "YOUR_API_KEY"

The output file is generated automatically as example_translated.pdf.
  2. Translate multiple documents
    Repeat --files once for each file (this example uses the Bing engine):
babeldoc --files example1.pdf --files example2.pdf --bing
  3. Translate specific pages
    To translate only page 1 and pages 3 to 5:
babeldoc --files example.pdf --pages "1,3-5" --openai --openai-api-key "YOUR_API_KEY"
  4. Change the languages
    The default direction is English to Chinese. For another language pair, set --lang-in and --lang-out (this example translates English into French):
babeldoc --files example.pdf --lang-in "en" --lang-out "fr" --openai --openai-api-key "YOUR_API_KEY"
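When translating many files from a script, it can help to assemble these commands programmatically. The sketch below builds the argument list using only the flags shown in the commands above (--files, --pages, --lang-in, --lang-out, --openai, --openai-model, --openai-api-key). The helper name build_babeldoc_command, the page-spec validation, and reading the key from an OPENAI_API_KEY environment variable are assumptions of this sketch, not BabelDOC features.

```python
import os
import re
import shlex

# Page specs such as "1,3-5": comma-separated single pages or first-last ranges.
_PAGE_SPEC = re.compile(r"^\d+(-\d+)?(,\d+(-\d+)?)*$")


def build_babeldoc_command(files, pages=None, lang_in="en", lang_out="zh",
                           model="gpt-4o-mini", api_key=None):
    """Assemble a `babeldoc` argument list from the flags shown above."""
    if not files:
        raise ValueError("at least one PDF file is required")
    # Assumption: fall back to an OPENAI_API_KEY environment variable so the
    # key does not have to be hard-coded in the script.
    key = api_key or os.environ.get("OPENAI_API_KEY")
    if not key:
        raise ValueError("no OpenAI API key given")

    cmd = ["babeldoc"]
    for f in files:
        cmd += ["--files", str(f)]  # one --files flag per document

    if pages is not None:
        spec = pages.replace(" ", "")
        if not _PAGE_SPEC.match(spec):
            raise ValueError(f"invalid page spec: {pages!r}")
        # Reject page 0 and backwards ranges such as "5-3".
        for part in spec.split(","):
            start, _, end = part.partition("-")
            if int(start) < 1 or (end and int(end) < int(start)):
                raise ValueError(f"invalid page range: {part!r}")
        cmd += ["--pages", spec]

    cmd += ["--lang-in", lang_in, "--lang-out", lang_out]
    cmd += ["--openai", "--openai-model", model, "--openai-api-key", key]
    return cmd


if __name__ == "__main__":
    cmd = build_babeldoc_command(["example1.pdf", "example2.pdf"],
                                 pages="1,3-5", api_key="YOUR_API_KEY")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Returning a list rather than a single string lets you pass it straight to subprocess.run without shell quoting issues.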

Python API Usage

  1. Basic translation
    Create a file named translate.py:
from babeldoc.main import TranslationConfig, translate_document

# Translation settings: source/target languages, engine, and model
config = TranslationConfig(
    files=["example.pdf"],
    lang_in="en",
    lang_out="zh",
    translator="openai",
    openai_api_key="YOUR_API_KEY",  # replace with your own OpenAI API key
    openai_model="gpt-4o-mini",
)
translate_document(config)

Run python translate.py to start the translation.
  2. Offline asset management

from pathlib import Path
from babeldoc.assets.assets import generate_offline_assets_package, restore_offline_assets_package

# Generate the offline asset package in the given directory
generate_offline_assets_package(Path("/path/to/output/dir"))
# Restore the offline asset package from a zip file
restore_offline_assets_package(Path("/path/to/offline_assets_package.zip"))

Key Features in Practice

  1. Bilingual output
    The translated PDF places the original text and the translation side by side, for example with English on the left and Chinese on the right for easy comparison. Formulas such as E=mc^2 are kept in their original form, with the translated text beside them.
  2. Support for complex documents
    Mathematical formulas and tables are recognized and preserved, so translation does not break the original structure.
  3. Flexible deployment
    The online service is quick and simple, while self-deployment offers more customization, such as choosing the translation engine.
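To make the formula-handling idea concrete, here is a toy Python sketch. It is not BabelDOC's actual pipeline, which works on PDF layout rather than plain strings: text segments go through a translation function, formula segments are copied unchanged, and each original is paired with its counterpart for side-by-side display. The segment format and the name bilingual_pairs are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical segment format: (kind, text), where kind is "text" or "formula".
Segment = Tuple[str, str]


def bilingual_pairs(segments: List[Segment],
                    translate: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Pair each original segment with the text shown in the translated column.

    Text segments are passed to `translate`; formula segments are copied
    unchanged so they read the same in both columns.
    """
    pairs = []
    for kind, text in segments:
        if kind == "formula":
            pairs.append((text, text))  # keep formulas verbatim
        else:
            pairs.append((text, translate(text)))
    return pairs


if __name__ == "__main__":
    segs = [("text", "Energy equals mass times c squared:"), ("formula", "E=mc^2")]
    # str.upper stands in for a real translation engine in this demo.
    for left, right in bilingual_pairs(segs, str.upper):
        print(f"{left:<40} | {right}")
```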

Extending Self-Deployment

If you need more features, you can use PDFMathTranslate:

  1. Install PDFMathTranslate
    Follow the instructions in its GitHub repository to install it. It provides a WebUI and supports more translation services.
  2. Use it with BabelDOC
    PDFMathTranslate 1.9.3 and later include experimental support for BabelDOC, which extends its translation capabilities.

Notes

  • The online service is free for 1,000 pages per month; usage beyond that is paid.
  • Local deployment requires an API key for the translation service, such as an OpenAI key, which you can obtain from the OpenAI website.
  • The tool is currently optimized mainly for English-to-Chinese translation; support for other languages is limited.
  • For large files, use --max-pages-per-part to split the document into smaller parts.
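The arithmetic behind splitting a long document into parts of at most N pages can be sketched as follows. This only illustrates the chunking idea; it is not BabelDOC's internal implementation of --max-pages-per-part, and the helper names split_into_parts and as_page_spec are hypothetical. The second function formats each part in the "first-last" style used by the --pages option.

```python
def split_into_parts(total_pages: int, max_pages_per_part: int) -> list[tuple[int, int]]:
    """Return inclusive 1-based (first, last) ranges of at most max_pages_per_part pages."""
    if total_pages < 1 or max_pages_per_part < 1:
        raise ValueError("page counts must be positive")
    parts = []
    for first in range(1, total_pages + 1, max_pages_per_part):
        last = min(first + max_pages_per_part - 1, total_pages)  # last part may be shorter
        parts.append((first, last))
    return parts


def as_page_spec(part: tuple[int, int]) -> str:
    """Format a range as "first-last", or a single page number if first == last."""
    first, last = part
    return str(first) if first == last else f"{first}-{last}"


if __name__ == "__main__":
    # A hypothetical 25-page document split into parts of at most 10 pages.
    for part in split_into_parts(25, 10):
        print(as_page_spec(part))
```

Each part stays within the page limit, and only the final part can be shorter than the others.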

 

Use Cases

  1. Academic research
    A researcher who receives an English paper can translate it into a bilingual version with BabelDOC for easier reading and understanding.
  2. Technical documentation translation
    Developers can translate English technical manuals while keeping the code and diagrams intact for direct use in their work.
  3. Education and learning
    Students can translate foreign-language textbooks, learning the language and the subject matter at the same time.

 

FAQ

  1. What file formats are supported?
    Only PDF is supported; other formats must be converted to PDF first.
  2. What is the difference between the online service and local deployment?
    The online service is simple to use and offers 1,000 free pages per month; local deployment requires installation but offers more customization options.
  3. What if the translation is not accurate?
    Try switching translation engines (for example, from Bing to OpenAI) or adjusting the model.
  4. How do I contribute code?
    See the CONTRIBUTING guide in the repository. Active contributors can receive an Immersive Translate membership.
May not be reproduced without permission: Chief AI Sharing Circle » BabelDOC: PDF documents will be translated into bilingual open source tools