General Introduction
"Vocabulary Book by DeepSeek" is an open source project built on DeepSeek's large language model that aims to help English learners efficiently master the vocabulary of the College English Test Band 4 (CET-4). The project is hosted on GitHub and was created by developer vxiaozhi. It combines Python scripts with DeepSeek's language generation capabilities to automatically produce vocabulary learning materials containing word meanings, roots, example sentences, and memorization techniques. The tool organizes words alphabetically, stores its output as JSON files in a clear format, and is suitable for students, teachers, and self-learners. The code is open, and 80% or more of it was generated automatically by DeepSeek, an example of applying AI in education. Whether you are preparing for CET-4 or expanding your vocabulary, the tool provides convenient learning support.
Function List
- Automatic generation of CET-4 vocabulary study materials: Calls the DeepSeek API to generate word meanings, root analyses, example sentences, and memorization tips.
- Alphabetical storage: CET-4 words are categorized into JSON files from A to Z according to their initial letters, making them easy to find and manage.
- Mnemonic image generation: Generates word-related mnemonic images through a script to reinforce memorization.
- Article generation: Generates vocabulary learning articles in Markdown format, one per starting letter, suitable for blogging or note organization.
- Open Source Support: Full Python code is provided and users are free to modify or extend the functionality.
Usage Guide
Installation process
"Vocabulary Book by DeepSeek" is a Python-based tool that requires a Python environment to run. The installation and usage steps are detailed below:
1. Environment setup
- Install Python: Make sure Python 3.8 or later is installed on your system. It can be downloaded from the official Python website.
- Clone the project: Open a terminal or command line and run the following commands to download the project locally:
git clone https://github.com/vxiaozhi/vocabulary-book-by-deepseek.git
cd vocabulary-book-by-deepseek
- Install dependencies: The project relies on several Python libraries. Run the following command to install them:
pip install -r requirements.txt
If there is no requirements.txt, the core libraries can be installed manually:
pip install requests openai pillow
- Configure the DeepSeek API: A DeepSeek API key is required. After signing up for a DeepSeek account, obtain a key on the DeepSeek platform and supply it in the project's configuration file or in the API call section of the code.
2. Using the main functions
The project's two core scripts are the word helper (cet4_word_helper.py) and the mnemonic image generator (gen_words_img.py); an article generator (gen_articles.py) is also included. The detailed workflow is as follows:
(1) Generate word study materials
- Prepare word data: By default, the project provides JSON files categorized by letter (A-Z) in the data/cet4/ directory (e.g. A.json, B.json). Each file contains a list of words beginning with the corresponding letter.
- Run the script:
  - Open a terminal and go to the project directory.
  - Execute the following command to generate the word analyses:
  python cet4_word_helper.py
  - The script reads the words in data/cet4/, generates word meanings, roots, example sentences, and memorization techniques through the DeepSeek API, and saves the results to JSON files in the result/cet4/ directory (e.g. A.json).
- View results: Example of the structure of an entry in the generated JSON file:
{
  "word": "abandon",
  "meaning": "to abandon; to give up",
  "root": "a-(strengthen) + bandon(control)",
  "example": "He had to abandon his car in the snow.",
  "memory_tip": "Imagine a man abandoning control of his car in the snow."
}
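The read, analyze, and save flow described above can be sketched roughly as follows. This is a minimal sketch, not the project's actual code: `process_letter_file`, `fake_analyze`, and the `analyze` callback are hypothetical names, the input file is assumed to hold a plain JSON array of word strings, and the DeepSeek call is left as an injected function so the file handling can be shown without network access.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def process_letter_file(
    src: Path,
    dst: Path,
    analyze: Callable[[str], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Read a word list from src, analyze each word, and write the results to dst."""
    # Assumption: the input file is a JSON array of word strings.
    words = json.loads(src.read_text(encoding="utf-8"))
    entries = [analyze(word) for word in words]
    # Create the output directory if it is missing.
    dst.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps any non-ASCII explanations readable in the file.
    dst.write_text(
        json.dumps(entries, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return entries


def fake_analyze(word: str) -> Dict[str, str]:
    """Hypothetical stand-in for the DeepSeek call; returns empty analysis fields."""
    return {"word": word, "meaning": "", "root": "", "example": "", "memory_tip": ""}
```

In a real run, `fake_analyze` would be replaced by a function that sends the word to the DeepSeek API and parses the reply into the same fields.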
(2) Generate mnemonic images
- Run the image generation script:
  - Execute the following command:
  python gen_words_img.py
  - The script generates mnemonic images from the word data in result/cet4/ and saves them to the output directory (you need to configure the output path in the script).
- Custom settings: Modify the parameters in gen_words_img.py, such as image size, resolution, or style, so that the generated images meet your requirements.
(3) Generate vocabulary articles
- Run the article generation script:
  - Execute the following command:
  python gen_articles.py
  - The script reads the JSON files in result/cet4/ and generates 26 Markdown files (such as 2025-02-11-cet4-A.md), which are saved to the result/cet4_articles/ directory.
- Output format: Each file contains the analyses of the words beginning with one letter, suitable for study or sharing. Example:
---
title: "CET-4 Vocabulary - Words Starting with A"
date: 2025-02-11
---
## abandon
Meaning: to abandon; to give up
Root: a-(strengthen) + bandon(control)
Example: He had to abandon his car in the snow.
Memory Tip: Imagine a man abandoning control of his car in the snow.
3. Operational considerations
- API key security: Do not hard-code the DeepSeek API key in the scripts. Storing it in an environment variable is recommended:
export DEEPSEEK_API_KEY='Your key'
- Network connection: Make sure the network connection is working when running the scripts, since they rely on the DeepSeek API.
- File path check: If the data/cet4/ or result/cet4/ directory is missing, create it manually or adjust the path configuration in the script.
- Extended functionality: The code can be modified as required, for example to add CET-6 word support or to adjust the output format.
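The API key and file path advice above could be handled at script start-up along these lines. This is only a sketch under assumptions: `load_api_key` and `ensure_dirs` are hypothetical helper names, and whether the project's own scripts read `DEEPSEEK_API_KEY` is not confirmed by the source; the variable name is taken from the export command shown above.

```python
import os
from pathlib import Path
from typing import Iterable, List


def load_api_key(var_name: str = "DEEPSEEK_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before running.")
    return key


def ensure_dirs(paths: Iterable[str]) -> List[Path]:
    """Create each directory if it does not exist yet and return the paths."""
    created = []
    for p in paths:
        path = Path(p)
        # parents=True builds intermediate folders; exist_ok=True makes this idempotent.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `ensure_dirs(["data/cet4", "result/cet4"])` before any file access avoids a missing-directory error, and failing early on an empty key gives a clearer message than a rejected API request.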
4. Using special features
- Batch processing: Running the main script once generates the word analyses for all letters.
- Image-assisted memorization: The generated mnemonic images can be imported into digital notes or printed to complement text-based study.
- Article sharing: The Markdown files can be posted directly to a blog or imported into tools such as Notion to organize study notes.
Cline Task Prompts
Task1
Write a cet4 word helper in Python, which analyzes word meanings and roots, gives example sentences, and provides some efficient memorization tips and tricks. The detailed requirements are as follows:
1. The words have been categorized alphabetically and stored in the data/cet4/ directory as follows: A.json B.json ... Z.json
2. Read all the words in each JSON file in the data/cet4/ directory, and call OpenAI's interface for each word to generate the word's meaning, root, example sentences, and memorization techniques.
3. The generated word information is saved in the result/cet4/ directory, as follows: A.json B.json ... Z.json
Task2
Write a word mnemonic image generator gen_words_img.py in Python 3.8, the detailed requirements are as follows:
1. Read all the word information in each JSON file in the result/cet4/ directory. Each word's information includes 4 fields: word, analysis, draw_explain, draw_prompt.
2. For each word, call the replicate interface (implemented in provider_replicate.py:replicate_run) to generate the word's image.
3. The generated image file is saved to the result/cet4_imgs/ directory with the file name format: {first_letter_of_word}/{word}.jpg. If the corresponding image file already exists, the generation of this image file is skipped.
4. Assume that all dependent libraries have been installed.
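Requirement 3 above (the output path format and skipping existing images) might look like the following sketch. It is not the project's gen_words_img.py: `image_path` and `generate_if_missing` are hypothetical names, the uppercase letter folder is an assumption, and `generate` is an injected stand-in because the real signature of `replicate_run` is not given in the source.

```python
from pathlib import Path
from typing import Callable, Optional

DEFAULT_OUT_DIR = Path("result/cet4_imgs")


def image_path(word: str, out_dir: Path = DEFAULT_OUT_DIR) -> Path:
    """Build {first_letter_of_word}/{word}.jpg under out_dir."""
    # Assumption: the letter folder is uppercase, matching A.json ... Z.json.
    return out_dir / word[0].upper() / f"{word}.jpg"


def generate_if_missing(
    word: str,
    prompt: str,
    generate: Callable[[str], bytes],
    out_dir: Path = DEFAULT_OUT_DIR,
) -> Optional[Path]:
    """Write the image for word unless it already exists; return the path or None."""
    target = image_path(word, out_dir)
    if target.exists():
        return None  # image already generated, skip the (costly) API call
    target.parent.mkdir(parents=True, exist_ok=True)
    # generate is a hypothetical callable returning raw image bytes.
    target.write_bytes(generate(prompt))
    return target
```

Checking for the file before calling the generator makes the script safe to re-run after an interruption, since only the missing images are requested again.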
Task3
Use Python 3.8 to write an article generator, gen_articles.py, that generates one file for each of the 26 letters of the alphabet, for a total of 26 files, with file names in the format 2025-02-11-cet4-{letter}.md, and the content of each file is composed as follows:
"""
---
layout: post
title: "CET-4 Vocabulary - Words starting with {letter}"
subtitle: "CET-4 Vocabulary - Words starting with {letter}"
date: 2025-02-11
author: "vxiaozhi"
catalog: true
tags:
- English
- cet4
- cet4
---
{{ for all word begin with letter}}
## word
{{word.analysis}}
{{end}}
"""
where word.analysis is obtained by reading result/cet4/{letter}.json, which stores information about all words starting with {letter}. If result/cet4/{letter}.json doesn't exist, generation of the file for that letter is skipped.
More constraints are as follows:
1. 2025-02-11-cet4-{letter}.md is saved to the result/cet4_articles directory.
2. Python uses version 3.8.
3. Assume all Python dependency libraries have been installed.
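A script satisfying this prompt might be sketched as below. This is an illustration rather than the generated gen_articles.py: `render_article` and `generate_articles` are hypothetical names, the front matter is abbreviated, and each result file is assumed to be a JSON array of objects with `word` and `analysis` keys.

```python
import json
from pathlib import Path
from string import ascii_uppercase
from typing import List

# Abbreviated front matter; the full template in the prompt has more fields.
FRONT_MATTER = """---
layout: post
title: "CET-4 Vocabulary - Words starting with {letter}"
date: 2025-02-11
author: "vxiaozhi"
catalog: true
---
"""


def render_article(letter: str, entries: List[dict]) -> str:
    """Render the front matter followed by one '## word' section per entry."""
    parts = [FRONT_MATTER.format(letter=letter)]
    for entry in entries:
        parts.append(f"## {entry['word']}\n\n{entry['analysis']}\n")
    return "\n".join(parts)


def generate_articles(src_dir: Path, out_dir: Path) -> List[Path]:
    """Write one Markdown file per letter that has a result file; return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for letter in ascii_uppercase:
        src = src_dir / f"{letter}.json"
        if not src.exists():
            continue  # no result file for this letter, so no article
        entries = json.loads(src.read_text(encoding="utf-8"))
        target = out_dir / f"2025-02-11-cet4-{letter}.md"
        target.write_text(render_article(letter, entries), encoding="utf-8")
        written.append(target)
    return written
```

A typical call would be `generate_articles(Path("result/cet4"), Path("result/cet4_articles"))`, which produces up to 26 files depending on which letters have results.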