General Introduction
MiniMind is an open-source project created by the developer jingyaogong. Its core goal is to let ordinary people quickly train their own AI models. MiniMind's headline feature is that a 26M-parameter GPT model can be trained from scratch in about 2 hours on a single NVIDIA 3090 graphics card, at a cost of only about 3 yuan. The project provides full-pipeline code from pre-training to fine-tuning, covering dataset cleaning, pre-training, instruction fine-tuning, LoRA, DPO, and model distillation, and also offers the visual multimodal extension MiniMind-V. All code has been rewritten from scratch in native PyTorch, with no reliance on third-party abstraction layers. As of February 2025, MiniMind has been released in multiple versions, with the smallest model at 25.8M parameters, and the community response has been overwhelmingly positive.
Feature List
- Trains a 26M-parameter GPT model from scratch in about 2 hours on a single 3090 graphics card.
- Provides full-pipeline code for pre-training, instruction fine-tuning, LoRA, DPO, and model distillation.
- Includes the visual multimodal extension MiniMind-V for joint image and text processing.
- Supports single-GPU and multi-GPU training, and is compatible with DeepSpeed and wandb visualization.
- Provides an OpenAI-API-compatible server for easy integration with third-party chat front-ends.
- Open-sources high-quality datasets and model weights for direct download or secondary development.
- Supports tokenizer training and custom vocabularies, so the model structure can be adjusted flexibly.
Usage Guide
Using MiniMind involves three steps: installation, training, and inference. The detailed guide below helps users get started quickly.
Installation Process
- Environment preparation
  - Requires Python 3.10 or later.
  - To check whether your graphics card supports CUDA, run the following code:
    import torch
    print(torch.cuda.is_available())
    If it prints True, CUDA is available; otherwise, install the matching CUDA-enabled PyTorch build.
  - Install Git for cloning the code.
- Clone the project
  Enter the following in the terminal:
  git clone https://github.com/jingyaogong/minimind.git
  cd minimind
- Install dependencies
  Use the Tsinghua mirror to speed up installation:
  pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
  If you run into problems, install torch or flash_attn manually.
- Download the dataset
  - Download the files from the GitHub README or https://www.modelscope.cn/datasets/gongjy/minimind_dataset/files.
  - Create a ./dataset folder and extract the files into this directory.
  - Recommended downloads: pretrain_hq.jsonl (1.6GB) and sft_mini_512.jsonl (1.2GB). (A quick check of these files is sketched after this list.)
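Before training, it can help to confirm the downloaded files ended up in the right place and are readable. The sketch below only assumes the files are standard JSON Lines (one JSON object per line); it does not assume any particular field names, it just prints the keys of the first record in each file.

```python
import json
from pathlib import Path

# Files recommended above; adjust the paths if your layout differs.
dataset_dir = Path("./dataset")
for name in ["pretrain_hq.jsonl", "sft_mini_512.jsonl"]:
    path = dataset_dir / name
    if not path.exists():
        print(f"missing: {path}; download and extract it first")
        continue
    with path.open("r", encoding="utf-8") as f:
        first = json.loads(f.readline())  # one JSON object per line
    print(f"{name}: first record keys = {list(first.keys())}")
```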
Training the Model
- Pre-training
  - Run the script to start pre-training:
    python train_pretrain.py
  - By default it uses pretrain_hq.jsonl, and the output weights are saved as pretrain_*.pth.
  - Multi-GPU acceleration:
    torchrun --nproc_per_node 2 train_pretrain.py
- Instruction fine-tuning
  - Run the fine-tuning script:
    python train_full_sft.py
  - By default it uses sft_mini_512.jsonl, and the output weights are saved as full_sft_*.pth.
  - Multi-GPU training is supported in the same way as pre-training.
- LoRA fine-tuning
  - Prepare domain data (e.g. lora_medical.jsonl), then run:
    python train_lora.py
  - The output weights are saved as lora_xxx_*.pth.
- DPO reinforcement learning
  - Use the dpo.jsonl data and run:
    python train_dpo.py
  - The output weights are saved as rlhf_*.pth.
- Training visualization
  - Add the --use_wandb flag, e.g.:
    python train_pretrain.py --use_wandb
  - View the training curves on the official wandb website. (A minimal logging sketch follows this list.)
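For readers unfamiliar with wandb, the snippet below is a minimal sketch of the kind of logging the --use_wandb flag enables. The project name, run name, and metric names here are illustrative assumptions, not MiniMind's actual implementation.

```python
import wandb

# Hypothetical project/run names and a stand-in loss value, for illustration only.
wandb.init(project="minimind-pretrain", name="demo-run")
for step in range(100):
    loss = 1.0 / (step + 1)  # placeholder for the real training loss
    wandb.log({"loss": loss}, step=step)
wandb.finish()
```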
Inference with the Model
- Command-line inference
  - Download the model weights (e.g. MiniMind2):
    git clone https://huggingface.co/jingyaogong/MiniMind2
  - Run inference:
    python eval_model.py --load 1 --model_mode 2
  - Parameter description: --load 1 loads weights in transformers format, and --model_mode 2 selects MiniMind2.
- Web chat
  - Install Streamlit:
    pip install streamlit
  - Start the interface:
    cd scripts
    streamlit run web_demo.py
  - Open localhost:8501 in a browser to start a conversation.
- API service
  - Start the server:
    python serve_openai_api.py
  - Test the interface:
    curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "MiniMind2", "messages": [{"role": "user", "content": "你好"}], "max_tokens": 512}'
  - A Python equivalent of this request is sketched after this list.
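The same test can be run from Python. The sketch below simply mirrors the curl call above using the requests library; the URL, model name, and request fields come from that example, and the response parsing assumes the server returns the standard OpenAI chat-completions shape (choices[0].message.content).

```python
import requests

# Mirrors the curl test above; assumes the server is running locally on port 8000.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "MiniMind2",
        "messages": [{"role": "user", "content": "你好"}],
        "max_tokens": 512,
    },
    timeout=60,
)
resp.raise_for_status()
data = resp.json()
# Assumes an OpenAI-style response body.
print(data["choices"][0]["message"]["content"])
```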
Featured Functions
- Visual multimodality (MiniMind-V)
  - Download the MiniMind-V model:
    git clone https://huggingface.co/jingyaogong/MiniMind2-V
  - Download the CLIP vision model into ./model/vision_model:
    git clone https://huggingface.co/openai/clip-vit-base-patch16
  - Run:
    python eval_vlm.py --load 1
  - Provide text and an image, and the model generates a description.
- Custom training
  - Organize your data in .jsonl format and place it under ./dataset.
  - Modify the parameters in ./model/LMConfig.py (e.g. d_model or n_layers); a rough config sketch follows this list.
  - Train by following the steps above.
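To illustrate what that configuration step looks like, the dataclass below sketches how parameters such as d_model and n_layers might be grouped; the class name, field set, and default values are assumptions made for this example, not the actual contents of LMConfig.py.

```python
from dataclasses import dataclass

@dataclass
class TinyLMConfig:
    # Illustrative defaults only; the real LMConfig.py defines its own fields.
    d_model: int = 512      # hidden size; smaller values cut memory and parameter count
    n_layers: int = 8       # transformer depth; fewer layers train faster
    max_seq_len: int = 512  # context length used during training

# Example: a smaller variant for a quick custom run.
cfg = TinyLMConfig(d_model=256, n_layers=4)
print(cfg)
```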
Notes
- If video memory runs out, reduce batch_size or increase accumulation_steps (see the sketch after this list).
- When the dataset is large, process it in batches to avoid memory overflow.
- For extra-long contexts, adjust the RoPE parameters to support up to 2048 tokens.
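To make the batch_size versus accumulation_steps trade-off concrete, the loop below is a generic gradient-accumulation sketch in plain PyTorch. The model, optimizer, and data are placeholders, and the pattern shown is standard practice rather than MiniMind's exact training loop.

```python
import torch

# Placeholders: substitute the real model, optimizer, and dataloader.
model = torch.nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
batches = [(torch.randn(2, 16), torch.randn(2, 16)) for _ in range(8)]

accumulation_steps = 4  # effective batch size = batch_size * accumulation_steps

optimizer.zero_grad()
for step, (x, y) in enumerate(batches):
    loss = torch.nn.functional.mse_loss(model(x), y)
    (loss / accumulation_steps).backward()  # scale so accumulated gradients average out
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()       # one weight update per accumulated "large" batch
        optimizer.zero_grad()
```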
Application Scenarios
- AI learning: MiniMind provides the full code and data needed for beginners to study the large-model training pipeline.
- Domain customization: train models on private data, such as medical Q&A or customer-service conversations.
- Low-cost deployment: the 26M-parameter model is suitable for embedded devices such as smart-home hardware.
- Teaching demonstrations: teachers can use it to demonstrate the AI training process, and students can practice hands-on.
FAQ
- How much hardware does MiniMind need?
  A single NVIDIA 3090 graphics card is sufficient for training; a CPU can run it, but slowly.
- Is the 2-hour training claim reliable?
  Yes. In single-card tests on a 3090, the 26M-parameter model takes only about 2 hours to train from scratch.
- Can it be used commercially?
  Yes. The project is licensed under the Apache 2.0 license, which allows free use and modification.
- How can the context length be extended?
  Adjust the RoPE parameters or fine-tune with longer data; up to 2048 tokens are supported (see the sketch below).
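As background for that last answer, the snippet below is a generic sketch of how rotary position embedding (RoPE) angle tables are computed and why supporting a longer context mainly means precomputing more positions. The head dimension and the base value of 10000 are standard defaults used here as assumptions, not values taken from MiniMind's code.

```python
import torch

def rope_angles(head_dim: int, max_seq_len: int, base: float = 10000.0) -> torch.Tensor:
    """Precompute rotary-embedding angles for positions 0..max_seq_len-1."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(max_seq_len).float()
    return torch.outer(positions, inv_freq)  # shape: (max_seq_len, head_dim // 2)

# Extending the context from 512 to 2048 just enlarges the precomputed table;
# raising the base (or other scaling tricks) stretches the same angles over longer spans.
print(rope_angles(head_dim=64, max_seq_len=512).shape)
print(rope_angles(head_dim=64, max_seq_len=2048).shape)
```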