SmartRead: Automatically annotate technical PDF documents and provide relevant citation sources

Latest AI Resources · Released 5 months ago · AI Sharing Circle


General Introduction

SmartRead is an AI-based, open-source tool for technical documents. It automatically analyzes PDF files and marks key content, such as important terms, headings, and core ideas, to help users quickly understand complex documents. It also provides links to articles and videos related to the document's subject, so learning is more comprehensive. The project was released on GitHub by the developer Dev-Khant; the code is fully open and free to use. SmartRead suits engineers, students, and researchers, and is especially useful for anyone who needs to read technical PDFs efficiently.

Function List

Automatically add annotations to technical PDFs to highlight key content, such as terms, headings, or key passages.
Recommend relevant articles and videos based on the content of the document to deepen understanding.
Support a wide range of technical PDF files, making complex documents easier to read.
Download the annotated PDF with all highlights and annotations preserved.
Open-source design lets users view the code, modify features, or submit suggestions for improvement.

Usage Guide

SmartRead is an open-source project hosted on GitHub; you need to install and configure it before use. The detailed steps below take you from download to a running setup.

Installation process

SmartRead consists of two parts, a front end and a back end, and requires a prepared development environment. The installation steps are as follows:

Preparing the environment

Install the basic tools
- Install Git (git-scm.com) to clone the code.
- Install Node.js (version 18+, nodejs.org) for the front end.
- Install Python (version 3.12, python.org) for local back-end development.
- Install Docker (docker.com) to run the back end in a container.
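Before continuing, it can help to confirm that these tools are present. The script below is a minimal sketch, not part of SmartRead: the helper names (`parse_version`, `meets_minimum`, `check_all`) are hypothetical, the minimum versions are taken from the list above, and it assumes each tool prints its version for `--version`. The Python check uses the interpreter that runs the script.

```python
"""Check the prerequisites listed above (a hypothetical helper, not part of SmartRead)."""
import re
import shutil
import subprocess
import sys

# Minimum versions from the list above; None means "just needs to be installed".
TOOLS = {"git": None, "node": (18, 0), "docker": None}
MIN_PYTHON = (3, 12)


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted version number, e.g. 'v18.19.0' -> (18, 19, 0)."""
    match = re.search(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def meets_minimum(text: str, minimum: tuple[int, ...]) -> bool:
    """Compare component-wise, padding short versions with zeros ('18' -> (18, 0))."""
    version = parse_version(text)
    version += (0,) * (len(minimum) - len(version))
    return version >= minimum


def check_all() -> bool:
    ok = sys.version_info[:2] >= MIN_PYTHON  # the interpreter running this script
    print(f"python {sys.version.split()[0]}: {'ok' if ok else 'too old'}")
    for tool, minimum in TOOLS.items():
        if shutil.which(tool) is None:
            print(f"{tool}: not found")
            ok = False
            continue
        if minimum is not None:
            result = subprocess.run([tool, "--version"], capture_output=True, text=True)
            if not meets_minimum(result.stdout + result.stderr, minimum):
                print(f"{tool}: too old ({result.stdout.strip()})")
                ok = False
                continue
        print(f"{tool}: ok")
    return ok


if __name__ == "__main__":
    sys.exit(0 if check_all() else 1)
```

Padding with zeros matters because Python compares tuples lexicographically, so `(18,)` would otherwise count as older than `(18, 0)`.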

Download Code

Open a terminal and run:

git clone https://github.com/Dev-Khant/smartread.git
cd smartread

Configuring Environment Variables

Copy the example file:

cp backend/.env.example backend/.env
cp web/.env.example web/.env.local

Edit backend/.env and fill in the following values (you need to obtain the keys yourself):

PORT=8000
HOST=0.0.0.0
ENVIRONMENT=development
MONGODB_URL=mongodb://<your-MongoDB-address>
MISTRAL_API_KEY=<your-Mistral-API-key>
GROQ_API_KEY=<your-Groq-API-key>
CLOUDINARY_CLOUD_NAME=<your-Cloudinary-cloud-name>
CLOUDINARY_API_KEY=<your-Cloudinary-API-key>
CLOUDINARY_API_SECRET=<your-Cloudinary-API-secret>
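Because missing keys silently limit functionality, a quick check of backend/.env can save debugging time. This is a minimal sketch with hypothetical helper names; the key list comes from the example above, the parser only handles plain KEY=VALUE lines, and it assumes unfilled placeholders are written in angle brackets (adjust that rule to match your .env.example).

```python
"""Report missing or unfilled keys in backend/.env (a hypothetical helper, not part of SmartRead)."""
import sys
from pathlib import Path

# Keys listed in the backend/.env example above.
REQUIRED_KEYS = [
    "PORT", "HOST", "ENVIRONMENT", "MONGODB_URL", "MISTRAL_API_KEY",
    "GROQ_API_KEY", "CLOUDINARY_CLOUD_NAME", "CLOUDINARY_API_KEY", "CLOUDINARY_API_SECRET",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments (no quoting support)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def problems(values: dict[str, str]) -> list[str]:
    """List keys that are absent, empty, or still contain an <angle-bracket> placeholder."""
    issues = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value:
            issues.append(f"{key}: missing or empty")
        elif "<" in value and ">" in value:
            issues.append(f"{key}: still a placeholder ({value})")
    return issues


if __name__ == "__main__":
    env_path = Path(sys.argv[1] if len(sys.argv) > 1 else "backend/.env")
    found = problems(parse_env(env_path.read_text(encoding="utf-8")))
    for issue in found:
        print(issue)
    sys.exit(1 if found else 0)
```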

Edit web/.env.local:

NEXT_PUBLIC_BACKEND_API_URL=http://localhost:8000
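The front end only reaches the back end if this URL uses the port the back end listens on. The sketch below (hypothetical helper names) compares NEXT_PUBLIC_BACKEND_API_URL with PORT from backend/.env. It assumes the host port equals PORT, which holds for the `-p 8000:8000` mapping in the Docker instructions; with a different mapping, compare against the host side instead.

```python
"""Check that web/.env.local points at the backend's PORT (a hypothetical helper)."""
import sys
from pathlib import Path
from urllib.parse import urlsplit


def read_env(text: str) -> dict[str, str]:
    """Minimal KEY=VALUE parser; ignores blank lines and # comments."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            pairs[key.strip()] = value.strip()
    return pairs


def ports_match(backend_env: str, frontend_env: str) -> bool:
    """True if NEXT_PUBLIC_BACKEND_API_URL uses the same port as the backend's PORT."""
    backend_port = int(read_env(backend_env)["PORT"])
    url = urlsplit(read_env(frontend_env)["NEXT_PUBLIC_BACKEND_API_URL"])
    # .port is None when the URL has no explicit port; fall back to the scheme default.
    url_port = url.port or (443 if url.scheme == "https" else 80)
    return url_port == backend_port


if __name__ == "__main__":
    backend = Path("backend/.env").read_text(encoding="utf-8")
    frontend = Path("web/.env.local").read_text(encoding="utf-8")
    same = ports_match(backend, frontend)
    print("ports match" if same else "NEXT_PUBLIC_BACKEND_API_URL and PORT disagree")
    sys.exit(0 if same else 1)
```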

Install and run the front-end

Go to the front-end directory:

cd web

Install the dependencies:

npm install

Start the front end:

npm run dev

Open your browser and visit http://localhost:3000 to see the front-end interface.

Install and run the backend

Using Docker (recommended)

Go to the back-end directory:

cd backend

Build the image:

docker build -t smartread-backend .

Run the container:

docker run -p 8000:8000 --env-file .env smartread-backend
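The container can take a few seconds to start listening. As a minimal sketch (the helper name `wait_for_port` is hypothetical), this polls localhost:8000 until a TCP connection succeeds. It only proves the port is open, not that the API is healthy, and it works equally well for the local uvicorn setup.

```python
"""Wait until the backend accepts TCP connections (a hypothetical helper, not part of SmartRead)."""
import socket
import sys
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError) until the server listens.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    ready = wait_for_port("localhost", 8000)
    print("backend is up" if ready else "backend did not start in time")
    sys.exit(0 if ready else 1)
```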

Local development (no Docker)

Go to the back-end directory:

cd backend

Create a virtual environment and activate it:

python -m venv .venv
source .venv/bin/activate  # On Windows, use .venv\Scripts\activate

Install the dependencies:

pip install -r requirements.txt

Start the back end:

uvicorn main:app --reload --host 0.0.0.0 --port 8000
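In `main:app`, the part before the colon is the module to import (main.py) and the part after it is the application object inside that module; this is why the command is run from inside backend/. The function below is a simplified sketch of that lookup using importlib. It is illustrative only and is not uvicorn's own code.

```python
"""Resolve a 'module:attribute' string, similar in spirit to uvicorn's `main:app` argument."""
import importlib
from typing import Any


def import_from_string(spec: str) -> Any:
    """Import the module before ':' and return the (possibly dotted) attribute after it."""
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    obj = importlib.import_module(module_name)
    for name in attr_path.split("."):
        obj = getattr(obj, name)  # raises AttributeError if the name does not exist
    return obj
```

For example, `import_from_string("os.path:join")` returns the same function object as `os.path.join`.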

How to use the main features

Once installed, SmartRead's core functionality revolves around PDF processing and resource recommendations.

Function 1: Automatically annotate technical PDFs

Procedure

Prepare a technical PDF file (e.g., a paper or manual).
Upload the file through the front-end interface (http://localhost:3000), or place it in the backend/input folder.
Click "upload and label" on the front end, or run the following on the back end:

python main.py --file input/your-file-name.pdf

When processing is complete, the annotated PDF appears in the backend/output folder.
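To annotate several files at once, the back-end command can be repeated for every PDF in backend/input. This is a minimal sketch assuming the `--file` option behaves as in the command above; `build_commands` is a hypothetical helper, and the script is meant to be started from inside backend/.

```python
"""Run the annotation command above once for every PDF in backend/input (a hypothetical helper)."""
import subprocess
import sys
from pathlib import Path


def build_commands(backend_dir: Path) -> list[list[str]]:
    """One `python main.py --file input/<name>.pdf` command per PDF, in name order."""
    return [
        [sys.executable, "main.py", "--file", f"input/{pdf.name}"]
        for pdf in sorted((backend_dir / "input").glob("*.pdf"))
    ]


if __name__ == "__main__":
    backend = Path(".")  # assumes this script is started from inside backend/
    for cmd in build_commands(backend):
        print("running:", " ".join(cmd[1:]))
        # check=True stops the loop at the first file that fails.
        subprocess.run(cmd, check=True, cwd=backend)
```

Using `sys.executable` ensures each run uses the same (virtual-environment) interpreter as the script itself.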

Functional Description
SmartRead uses Mistral AI and Groq models to analyze the document, identify key content, and add highlights or annotations. The annotations are displayed directly on the PDF for quick, easy reading.
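This guide does not describe how the model output becomes highlights. As an illustrative sketch only, assume the model returns a list of key terms; the hypothetical `highlight_spans` below finds their non-overlapping positions in a page's extracted text. A PDF library would then map the matches to page coordinates (for example, PyMuPDF's `Page.search_for` and `Page.add_highlight_annot`), though whether SmartRead works this way is not stated here.

```python
"""Find where model-selected key terms occur in page text, so they can be highlighted (sketch)."""
import re


def highlight_spans(text: str, terms: list[str]) -> list[tuple[int, int, str]]:
    """Return non-overlapping (start, end, term) spans, case-insensitive, whole words only.

    Longer terms are matched first, so 'neural network' wins over 'network'.
    """
    taken: list[tuple[int, int]] = []
    spans = []
    cleaned = {t.strip() for t in terms if t.strip()}  # drop empty or blank terms
    for term in sorted(cleaned, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            start, end = match.span()
            # Skip matches that overlap a span already claimed by a longer term.
            if any(start < t_end and t_start < end for t_start, t_end in taken):
                continue
            taken.append((start, end))
            spans.append((start, end, term))
    return sorted(spans)
```

The `\b` word-boundary rule keeps "network" from matching inside "networking", but terms that begin or end with punctuation (such as "C++") would need a different boundary rule.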

Function 2: Access to relevant resources

Procedure

After uploading the PDF in the front-end interface, check "Get related resources".
Alternatively, run the following on the back end:

python main.py --file input/your-file-name.pdf --resources

When processing is complete, the interface or terminal displays links to articles and videos.

Functional Description
The system searches the web based on the PDF content and recommends relevant technical articles or videos; the links are stored in MongoDB and managed through Cloudinary.
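The guide does not show the format of the recommendation results. The sketch below assumes they arrive as a flat list of URLs and shows one way to deduplicate them and separate videos from articles before storing them, for example as one MongoDB document per PDF (with pymongo's `insert_one`). The names `group_resources` and `VIDEO_HOSTS` and the host list are assumptions.

```python
"""Deduplicate recommended links and split them into articles and videos (hypothetical data shape)."""
from urllib.parse import urlsplit

# Assumed set of video hosts; subdomains such as www. or m. are matched too.
VIDEO_HOSTS = {"youtube.com", "youtu.be", "vimeo.com"}


def classify(url: str) -> str:
    """Return 'video' for known video hosts, otherwise 'article'."""
    host = urlsplit(url).hostname or ""  # hostname is already lower-cased
    is_video = any(host == h or host.endswith("." + h) for h in VIDEO_HOSTS)
    return "video" if is_video else "article"


def group_resources(urls: list[str]) -> dict[str, list[str]]:
    """Keep the first copy of each URL (ignoring a trailing slash) and group by kind."""
    grouped: dict[str, list[str]] = {"article": [], "video": []}
    seen: set[str] = set()
    for url in urls:
        key = url.rstrip("/")
        if key in seen:
            continue
        seen.add(key)
        grouped[classify(url)].append(url)
    return grouped
```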

Function 3: Download annotated PDF

Procedure

Click "Download" in the front-end interface, or go to the backend/output folder, locate your-file-name_annotated.pdf, and save it.
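When scripting around the back end, the naming rule described in this step (the original file name plus an `_annotated` suffix in backend/output) can be turned into a small lookup. This is a sketch with hypothetical helper names; it assumes the back end follows that naming exactly.

```python
"""Locate the annotated copy of a PDF in backend/output (a hypothetical helper)."""
from pathlib import Path
from typing import Optional


def annotated_path(original: str, output_dir: Path = Path("backend/output")) -> Path:
    """Map '<name>.pdf' to '<output_dir>/<name>_annotated.pdf'."""
    return output_dir / f"{Path(original).stem}_annotated.pdf"


def find_annotated(original: str, output_dir: Path = Path("backend/output")) -> Optional[Path]:
    """Return the annotated file if it has been written, else None."""
    candidate = annotated_path(original, output_dir)
    return candidate if candidate.is_file() else None
```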

Functional Description
The annotated PDF retains the original text with new highlighting and annotations for easy sharing or archiving.

Featured Operations

Open-source contribution

Procedure

Modify the code, then commit and push it to GitHub:

git add .
git commit -m "Describe your changes"
git push origin main

Create a Pull Request on GitHub.

Functional Description
SmartRead is released under the MIT license, and users are encouraged to take part in development, for example by improving the AI models or the interface.

Caveats

Make sure the keys for MongoDB, Mistral AI, Groq, and Cloudinary are configured correctly; otherwise, functionality will be limited.
The current version works best with English technical documents; support for Chinese may still need optimization.
Docker is more stable and is the recommended way to run the back end.

By following these steps, you can use SmartRead to process technical PDFs end to end. It is easy to use and produces intuitive results, making it well suited to anyone who needs to read in depth.

Application scenarios

Academic research
When working on papers, students can use SmartRead to mark key points and get suggestions for relevant resources, saving search time.
Software development
Programmers use it to highlight key parameters while reading API documentation and to find tutorial videos.
Teamwork
Project teams annotate technical manuals consistently and then share them, improving communication efficiency.

Q&A

Does SmartRead support Chinese PDFs?
It currently works best with English technical documents; Chinese support is still being optimized.
Does it require an internet connection?
Yes. A network connection is required to fetch related resources and to run the AI models.
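Can it be used offline?
Not fully. Annotation relies on the Mistral AI and Groq models, which are called through their online APIs, and resource recommendations search the web, so both need a network connection.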