VBDeepSeek: an open source tool for generating grade 4 word study materials using DeepSeek

Latest AI Resources6mos agorelease AI Sharing Circle

1.7K 00

General Introduction

"Vocabulary Book by DeepSeek" is an open source project developed based on DeepSeek's big model, aiming to help English learners efficiently master the vocabulary of College English Level 4 (CET-4). The project is hosted on GitHub, created by developer vxiaozhi, through Python script combined with DeepSeek's powerful language generation capabilities, automatically generate vocabulary learning materials containing word meanings, roots, example sentences and memorization techniques. The tool organizes words in alphabetical order, has a clear output format, supports JSON file storage, and is suitable for students, teachers, or self-learners. The project code is open and 80% or more is automatically generated by DeepSeek, reflecting the innovative application of AI in education. Whether you are preparing for Grade 4 or improving your vocabulary, this tool provides convenient learning support.

Function List

Automatic generation of Grade 4 vocabulary study materials: Calls the DeepSeek interface to generate word meanings, root analyses, example sentences, and memorization tips.
Alphabetical storage: CET-4 words are categorized into JSON files from A to Z according to their initial letters, making them easy to find and manage.
Helpful Image Generation: Generate word-related mnemonic images through scripts to enhance memorization.
Article Generator: Generate vocabulary learning articles in Markdown format starting with a letter, suitable for blogging or note organization.
Open Source Support: Full Python code is provided and users are free to modify or extend the functionality.

Using Help

Installation process

"Vocabulary Book by DeepSeek" is a Python based tool that requires a certain programming environment to run. Below are the detailed installation and usage steps:

1. Environmental preparation

Installing Python: Ensure that Python 3.8 or above is installed on your system, which can be downloaded and installed from the Python website.
cloning project: Open a terminal or command line and enter the following command to download the project locally:
```
git clone https://github.com/vxiaozhi/vocabulary-book-by-deepseek.git
cd vocabulary-book-by-deepseek
```

Installation of dependencies: The project relies on several Python libraries, run the following command to install them:
```
pip install -r requirements.txt
```
if notrequirements.txt, the core library can be installed manually:
```
pip install requests openai pillow
```
Configuring the DeepSeek API: DeepSeek API key is required. After signing up for a DeepSeek account, get the key in the DeepSeek platform and fill it into the API call section in the project configuration file or code.

2. Use of main functions

The project consists of two core scripts: Word Booster Tool and Booster Image Generator Tool. The following is the detailed operation flow:

(1) Generate word study materials

Prepare word data: The project provides by defaultdata/cet4/JSON files categorized by letters A-Z in the directory (e.g.A.json,B.json). Each file contains a list of words beginning with the corresponding letter.
Running Scripts::
- Open a terminal and go to the project directory.
- Execute the following command to generate a word analysis:
```
python cet4_word_helper.py
```
- The script will read thedata/cet4/The words in the program are used to generate word meanings, roots, example sentences and memorization techniques through the DeepSeek API, and the results are saved to theresult/cet4/JSON file in the directory (e.g.A.json).

View Results: Example of the structure of the generated JSON file:

{
"word": "abandon",
"meaning": "放弃",
"root": "a-(加强) + bandon(控制)",
"example": "He had to abandon his car in the snow.",
"memory_tip": "想象一个人在雪地里放弃aband控制on车。"
}

(2) Generation of mnemonic pictures

Run the image generation script::
- Execute the following command:
```
python gen_words_img.py
```
- The scripts will be based on theresult/cet4/The word data in the script is used to generate mnemonic images, which are saved to the specified directory by default (you need to configure the output path in the script).
Customized settings: Modificationgen_words_img.pyparameters, such as image size, resolution, or style, to ensure that the image is generated to meet the requirements.

(3) Generation of vocabulary articles

Run the article generation script::
- Implementation:
```
python gen_articles.py
```
- Script readingresult/cet4/in the JSON file, generating 26 Markdown files (such as the2025-02-11-cet4-A.md), save toresult/cet4_articles/Catalog.

output format: Each file contains an analysis of words beginning with letters, suitable for learning or sharing. Example:

---
title: "四级词汇-A开头单词"
date: 2025-02-11
---
## abandon
词义：放弃  
词根：a-(加强) + bandon(控制)  
例句：He had to abandon his car in the snow.  
记忆技巧：想象一个人在雪地里放弃aband控制on车。

3. Operational considerations

API Key Security: Do not hard-code DeepSeek API keys directly into scripts, it is recommended to use environment variables to store them:
```
export DEEPSEEK_API_KEY='你的密钥'
```
network connection: Ensure that the network is free when running the script, as it relies on the DeepSeek API.
File path checkingIfdata/cet4/mayberesult/cet4/The directory is missing, you need to manually create or adjust the script path configuration.
Extended functionality: The code can be modified as required, for example to add level 6 word support or to adjust the output format.

4. Operation of special functions

batch file: Generate word profiles for all letters at once by running the main script only once.
Picture-assisted memory: Generated aids images can be imported into electronic notes or printed to complement text-based learning.
Article Sharing: Markdown files can be used directly for blog posting or imported into tools such as Notion to organize study notes.

Cline Mission Cues

Task1

用 Python 写一个 cet4 单词助记工具，对单词进行词义词根分析、例举例句、并提供一些高效的记忆技巧和窍门。 详细需求如下：
1. 单词已经按照字母归类存储在data/cet4/目录下，分别为： A.json B.json ... Z.json
2. 读取每一个 data/cet4/目录下 每个JSON文件中的所有单词，对每个单词调用OpenAI的接口生成该单词的词义、词根、例句、记忆技巧信息。
3. 生成的单词信息保存到 result/cet4/目录下，分别为： A.json B.json ... Z.json

Task2

用 Python3.8 写一个单词助记图片生成工具gen_words_img.py， 详细需求如下：
1. 读取每一个 result/cet4/目录下 每个JSON文件中的所有单词信息，每个单词信息包括word、analysis、draw_explain、draw_prompt 4个字段。
2. 对每个单词调用replicate的接口(接口具体实现在provider_replicate.py:replicate_run)生成该单词的图片。
3. 生成的图片文件保存到 result/cet4_imgs/目录下，文件名称格式为：{first_letter_of_word}/{word}.jpg。如果对应图片文件已存在，则跳过本图片文件的生成。
4. 假设所有依赖库已经安装。

Task3

用 Python3.8 写一个文章生成工具gen_articles.py， 为26个英文字母各生成一个文件，共26个文件，文件名格式为：2025-02-11-cet4-{letter}.md, 每个文件的内容组成如下：
"""
---
layout:     post
title:      "四级词汇-{letter}开头单词"
subtitle:   "四级词汇-{letter}开头单词"
date:       2025-02-11
author:     "vxiaozhi"
catalog: true
tags:
- english
- cet4
---

{{ for all word begin with letter}}
## word
{word.analysis}
{{end}}
"""
其中 word.analysis 通过读取 result/cet4/{letter}.json 获得，result/cet4/{letter}.json存储了{letter}开头的全部单词的信息，如果result/cet4/{letter}.json 不存在，则跳过该letter对应文件的生成。
更多约束如下：
1、2025-02-11-cet4-{letter}.md 保存到 result/cet4_articles 目录下。
2、Python 使用 3.8 版本。
3、假设所有Python依赖库已经安装。