General Introduction
MiniMind is an open-source project created by the developer jingyaogong. Its core goal is to let ordinary people quickly train their own AI models. MiniMind's headline feature is that a 26M-parameter GPT model can be trained from scratch in about 2 hours on a single NVIDIA 3090 graphics card, at a cost of only about 3 yuan. The project provides full-pipeline code from pre-training to fine-tuning, covering dataset cleaning, pre-training, instruction fine-tuning, LoRA, DPO, and model distillation, and also offers the visual multimodal extension MiniMind-V. All code has been rewritten from scratch in native PyTorch, with no reliance on third-party abstraction layers. As of February 2025, MiniMind has been released in multiple versions, with the smallest model at 25.8M parameters, and the community response has been overwhelmingly positive.
Feature List
- Trains a 26M-parameter GPT model from scratch in about 2 hours on a single 3090 graphics card.
- Provides full-pipeline code for pre-training, instruction fine-tuning, LoRA, DPO, and model distillation.
- Includes the visual multimodal extension MiniMind-V for joint image and text processing.
- Supports single-GPU and multi-GPU training, and is compatible with DeepSpeed and wandb visualization.
- Provides an OpenAI-API-compatible server for easy integration with third-party chat front-ends.
- Open-sources high-quality datasets and model weights for direct download or secondary development.
- Supports tokenizer training and custom vocabularies, so the model structure can be adjusted flexibly.
Usage Guide
Using MiniMind involves three steps: installation, training, and inference. The detailed guide below helps users get started quickly.
Installation Process
- Environment preparation
  - Requires Python 3.10 or later.
  - To check whether your graphics card supports CUDA, run the following code:
    import torch
    print(torch.cuda.is_available())
    If it prints True, CUDA is available; otherwise, install the matching CUDA-enabled PyTorch build.
  - Install Git for cloning the code.
- Clone the project
  Enter the following in the terminal:
  git clone https://github.com/jingyaogong/minimind.git
  cd minimind
- Install dependencies
  Use the Tsinghua mirror to speed up installation:
  pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
  If you run into problems, install torch or flash_attn manually.
- Download the dataset
  - Download the files from the GitHub README or https://www.modelscope.cn/datasets/gongjy/minimind_dataset/files.
  - Create a ./dataset folder and extract the files into this directory.
  - Recommended downloads: pretrain_hq.jsonl (1.6GB) and sft_mini_512.jsonl (1.2GB). (A quick check of these files is sketched after this list.)
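Before training, it can help to confirm the downloaded files ended up in the right place and are readable. The sketch below only assumes the files are standard JSON Lines (one JSON object per line); it does not assume any particular field names, it just prints the keys of the first record in each file.

```python
import json
from pathlib import Path

# Files recommended above; adjust the paths if your layout differs.
dataset_dir = Path("./dataset")
for name in ["pretrain_hq.jsonl", "sft_mini_512.jsonl"]:
    path = dataset_dir / name
    if not path.exists():
        print(f"missing: {path}; download and extract it first")
        continue
    with path.open("r", encoding="utf-8") as f:
        first = json.loads(f.readline())  # one JSON object per line
    print(f"{name}: first record keys = {list(first.keys())}")
```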
Training the Model
- Pre-training
  - Run the script to start pre-training:
    python train_pretrain.py
  - By default it uses pretrain_hq.jsonl, and the output weights are saved as pretrain_*.pth.
  - Multi-GPU acceleration:
    torchrun --nproc_per_node 2 train_pretrain.py
- Instruction fine-tuning
  - Run the fine-tuning script:
    python train_full_sft.py
  - By default it uses sft_mini_512.jsonl, and the output weights are saved as full_sft_*.pth.
  - Multi-GPU training is supported in the same way as pre-training.
- LoRA fine-tuning
  - Prepare domain data (e.g. lora_medical.jsonl), then run:
    python train_lora.py
  - The output weights are saved as lora_xxx_*.pth.
- DPO reinforcement learning
  - Use the dpo.jsonl data and run:
    python train_dpo.py
  - The output weights are saved as rlhf_*.pth.
- Training visualization
  - Add the --use_wandb flag, e.g.:
    python train_pretrain.py --use_wandb
  - View the training curves on the official wandb website. (A minimal logging sketch follows this list.)
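For readers unfamiliar with wandb, the snippet below is a minimal sketch of the kind of logging the --use_wandb flag enables. The project name, run name, and metric names here are illustrative assumptions, not MiniMind's actual implementation.

```python
import wandb

# Hypothetical project/run names and a stand-in loss value, for illustration only.
wandb.init(project="minimind-pretrain", name="demo-run")
for step in range(100):
    loss = 1.0 / (step + 1)  # placeholder for the real training loss
    wandb.log({"loss": loss}, step=step)
wandb.finish()
```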
Inference with the Model
- Command-line inference
  - Download the model weights (e.g. MiniMind2):
    git clone https://huggingface.co/jingyaogong/MiniMind2
  - Run inference:
    python eval_model.py --load 1 --model_mode 2
  - Parameter description: --load 1 loads weights in transformers format, and --model_mode 2 selects MiniMind2.
- Web chat
  - Install Streamlit:
    pip install streamlit
  - Start the interface:
    cd scripts
    streamlit run web_demo.py
  - Open localhost:8501 in a browser to start a conversation.
- API service
  - Start the server:
    python serve_openai_api.py
  - Test the interface:
    curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "MiniMind2", "messages": [{"role": "user", "content": "你好"}], "max_tokens": 512}'
  - A Python equivalent of this request is sketched after this list.
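The same test can be run from Python. The sketch below simply mirrors the curl call above using the requests library; the URL, model name, and request fields come from that example, and the response parsing assumes the server returns the standard OpenAI chat-completions shape (choices[0].message.content).

```python
import requests

# Mirrors the curl test above; assumes the server is running locally on port 8000.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "MiniMind2",
        "messages": [{"role": "user", "content": "你好"}],
        "max_tokens": 512,
    },
    timeout=60,
)
resp.raise_for_status()
data = resp.json()
# Assumes an OpenAI-style response body.
print(data["choices"][0]["message"]["content"])
```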
Featured Functions
- Visual multimodality (MiniMind-V)
  - Download the MiniMind-V model:
    git clone https://huggingface.co/jingyaogong/MiniMind2-V
  - Download the CLIP vision model into ./model/vision_model:
    git clone https://huggingface.co/openai/clip-vit-base-patch16
  - Run:
    python eval_vlm.py --load 1
  - Provide text and an image, and the model generates a description.
- Custom training
  - Organize your data in .jsonl format and place it under ./dataset.
  - Modify the parameters in ./model/LMConfig.py (e.g. d_model or n_layers); a rough config sketch follows this list.
  - Train by following the steps above.
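To illustrate what that configuration step looks like, the dataclass below sketches how parameters such as d_model and n_layers might be grouped; the class name, field set, and default values are assumptions made for this example, not the actual contents of LMConfig.py.

```python
from dataclasses import dataclass

@dataclass
class TinyLMConfig:
    # Illustrative defaults only; the real LMConfig.py defines its own fields.
    d_model: int = 512      # hidden size; smaller values cut memory and parameter count
    n_layers: int = 8       # transformer depth; fewer layers train faster
    max_seq_len: int = 512  # context length used during training

# Example: a smaller variant for a quick custom run.
cfg = TinyLMConfig(d_model=256, n_layers=4)
print(cfg)
```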
Notes
- If video memory runs out, reduce batch_size or increase accumulation_steps (see the sketch after this list).
- When the dataset is large, process it in batches to avoid memory overflow.
- For extra-long contexts, adjust the RoPE parameters to support up to 2048 tokens.
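To make the batch_size versus accumulation_steps trade-off concrete, the loop below is a generic gradient-accumulation sketch in plain PyTorch. The model, optimizer, and data are placeholders, and the pattern shown is standard practice rather than MiniMind's exact training loop.

```python
import torch

# Placeholders: substitute the real model, optimizer, and dataloader.
model = torch.nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
batches = [(torch.randn(2, 16), torch.randn(2, 16)) for _ in range(8)]

accumulation_steps = 4  # effective batch size = batch_size * accumulation_steps

optimizer.zero_grad()
for step, (x, y) in enumerate(batches):
    loss = torch.nn.functional.mse_loss(model(x), y)
    (loss / accumulation_steps).backward()  # scale so accumulated gradients average out
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()       # one weight update per accumulated "large" batch
        optimizer.zero_grad()
```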
Application Scenarios
- AI learning: MiniMind provides the full code and data needed for beginners to study the large-model training pipeline.
- Domain customization: train models on private data, such as medical Q&A or customer-service conversations.
- Low-cost deployment: the 26M-parameter model is suitable for embedded devices such as smart-home hardware.
- Teaching demonstrations: teachers can use it to demonstrate the AI training process, and students can practice hands-on.
FAQ
- How much hardware does MiniMind need?
  A single NVIDIA 3090 graphics card is sufficient for training; a CPU can run it, but slowly.
- Is the 2-hour training claim reliable?
  Yes. In single-card tests on a 3090, the 26M-parameter model takes only about 2 hours to train from scratch.
- Can it be used commercially?
  Yes. The project is licensed under the Apache 2.0 license, which allows free use and modification.
- How can the context length be extended?
  Adjust the RoPE parameters or fine-tune with longer data; up to 2048 tokens are supported (see the sketch below).
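As background for that last answer, the snippet below is a generic sketch of how rotary position embedding (RoPE) angle tables are computed and why supporting a longer context mainly means precomputing more positions. The head dimension and the base value of 10000 are standard defaults used here as assumptions, not values taken from MiniMind's code.

```python
import torch

def rope_angles(head_dim: int, max_seq_len: int, base: float = 10000.0) -> torch.Tensor:
    """Precompute rotary-embedding angles for positions 0..max_seq_len-1."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(max_seq_len).float()
    return torch.outer(positions, inv_freq)  # shape: (max_seq_len, head_dim // 2)

# Extending the context from 512 to 2048 just enlarges the precomputed table;
# raising the base (or other scaling tricks) stretches the same angles over longer spans.
print(rope_angles(head_dim=64, max_seq_len=512).shape)
print(rope_angles(head_dim=64, max_seq_len=2048).shape)
```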