GitHub – Jiayi-Pan / TinyZero

TinyZero is a reproduction of DeepSeek R1 Zero. We built it on top of veRL.
Through RL, the 3B base LM develops self-verification and search abilities all on its own.
You can experience the "Aha moment" yourself for less than $30.
Twitter Thread: https://x.com/jiayi_pirate/status/1882839370505621655
Full Experiment log: https://wandb.ai/jiayipan/tinyzero
Installation
conda create -n zero python=3.9
conda activate zero
# install torch (or skip this step and let vllm install the correct version for you)
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
# install vllm
pip3 install vllm==0.6.3 # or 0.5.4, 0.4.2, or 0.3.1
pip3 install ray
# verl (editable install of this repo; run from the repo root)
pip install -e .
# flash attention 2
pip3 install flash-attn --no-build-isolation
# quality of life
pip install wandb IPython matplotlib
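Before moving on, it can help to confirm that the packages installed above actually import inside the `zero` environment. The helper below is a hypothetical sketch, not part of TinyZero; the module names in the usage line (`torch`, `vllm`, `ray`, `verl`, `flash_attn`, `wandb`) are our assumption about the import names these installs provide.

```shell
# Hypothetical helper (not part of TinyZero): report which Python modules
# fail to import in the currently active environment.
check_imports() {
  local py="${PYTHON:-python3}"
  local missing=0
  local mod
  for mod in "$@"; do
    if "$py" -c "import ${mod}" >/dev/null 2>&1; then
      echo "ok: ${mod}"
    else
      echo "missing: ${mod}" >&2
      missing=$((missing + 1))
    fi
  done
  # Exit status is non-zero if any module failed to import
  [ "$missing" -eq 0 ]
}

# Usage inside the activated "zero" env (module names are assumptions):
# check_imports torch vllm ray verl flash_attn wandb
```

Each module is imported in a separate interpreter so one failure does not hide the others.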
Data Preparation
python ./examples/data_preprocess/countdown.py --local_dir {path_to_your_dataset}
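A quick way to confirm the preprocessing step succeeded is to check its output directory. This hypothetical sketch assumes `countdown.py` writes `train.parquet` and `test.parquet` into `--local_dir`; adjust the file names if your checkout produces something else.

```shell
# Hypothetical check: confirm the preprocessing output exists and is non-empty.
# ASSUMPTION: countdown.py writes train.parquet and test.parquet into --local_dir.
check_dataset_dir() {
  local dir="$1"
  local f
  if [ -z "$dir" ] || [ ! -d "$dir" ]; then
    echo "not a directory: ${dir}" >&2
    return 1
  fi
  for f in train.parquet test.parquet; do
    # -s is true only for files that exist and have a size greater than zero
    if [ ! -s "${dir}/${f}" ]; then
      echo "missing or empty: ${dir}/${f}" >&2
      return 1
    fi
  done
  echo "dataset looks ready: ${dir}"
}

# Usage: check_dataset_dir "$DATA_DIR"
```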
Single GPU
Works for models <= 1.5B. For the Qwen2.5-0.5B base model, we know it fails to learn reasoning.
export N_GPUS=1
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=1
export EXPERIMENT_NAME=countdown-qwen2.5-0.5b
export VLLM_ATTENTION_BACKEND=XFORMERS
bash ./scripts/train_tiny_zero.sh
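Because `train_tiny_zero.sh` is launched as a child `bash` process, it only sees exported variables. The hypothetical helper below checks that each variable exported above is present in the environment and that no `{placeholder}` from this README was left unreplaced. The variable list mirrors the exports shown here; the script itself may read others.

```shell
# Hypothetical pre-flight check (not part of TinyZero).
check_train_env() {
  local var val
  for var in N_GPUS BASE_MODEL DATA_DIR ROLLOUT_TP_SIZE EXPERIMENT_NAME VLLM_ATTENTION_BACKEND; do
    # printenv only reports exported variables, i.e. what a child script sees
    val="$(printenv "$var")"
    if [ -z "$val" ]; then
      echo "unset or not exported: ${var}" >&2
      return 1
    fi
    # Catch README placeholders such as {path_to_your_model}
    case "$val" in
      *"{"*"}"*)
        echo "placeholder not replaced: ${var}=${val}" >&2
        return 1
        ;;
    esac
  done
  echo "environment ok"
}

# Usage: check_train_env && bash ./scripts/train_tiny_zero.sh
```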
3B+ models
In this case, the base model is able to develop sophisticated reasoning skills.
export N_GPUS=2
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=2
export EXPERIMENT_NAME=countdown-qwen2.5-3b
export VLLM_ATTENTION_BACKEND=XFORMERS
bash ./scripts/train_tiny_zero.sh
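With more than one GPU, `ROLLOUT_TP_SIZE` appears to set the tensor-parallel size used for rollouts. The sketch below assumes, without confirmation from this README, that `N_GPUS` must be a positive multiple of `ROLLOUT_TP_SIZE` (as in the 2-GPU, TP 2 example above); treat a failure as a prompt to double-check the settings rather than as a hard rule.

```shell
# Hypothetical sanity check for the GPU / tensor-parallel combination.
# ASSUMPTION: N_GPUS must be a positive multiple of ROLLOUT_TP_SIZE.
is_pos_int() {
  case "$1" in
    ''|*[!0-9]*|0*) return 1 ;;  # empty, non-digit, zero, or leading zero
  esac
  return 0
}

check_tp_size() {
  local gpus="$1" tp="$2"
  if ! is_pos_int "$gpus" || ! is_pos_int "$tp"; then
    echo "N_GPUS and ROLLOUT_TP_SIZE must be positive integers" >&2
    return 1
  fi
  # Also rejects tp > gpus, since then gpus % tp == gpus != 0
  if [ $((gpus % tp)) -ne 0 ]; then
    echo "ROLLOUT_TP_SIZE=${tp} does not divide N_GPUS=${gpus}" >&2
    return 1
  fi
  echo "ok: ${gpus} GPU(s), tensor parallel size ${tp}"
}

# Usage: check_tp_size "$N_GPUS" "$ROLLOUT_TP_SIZE"
```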
We also experimented with Qwen2.5-3B-Instruct.
Data Preparation
To follow the chat template, we need to reprocess the data:
conda activate zero
python examples/data_preprocess/countdown.py --template_type=qwen-instruct --local_dir={path_to_your_dataset}
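Since the instruct run needs differently formatted data, one option is to write each template's output to its own directory so the base-model data from the earlier step is not overwritten. The helper and its `countdown-<template>` directory naming are hypothetical; only the `countdown.py` path and flags come from the command above. Set `DRY_RUN=1` to print the command without running it.

```shell
# Hypothetical wrapper (not part of TinyZero): one output directory per template.
preprocess_countdown() {
  local template="$1" root="$2"
  # Strip a trailing slash so the joined path has no "//"
  local out="${root%/}/countdown-${template}"
  set -- python examples/data_preprocess/countdown.py \
    --template_type="${template}" --local_dir="${out}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage (run from the repo root):
# preprocess_countdown qwen-instruct "$HOME/data"
# export DATA_DIR="$HOME/data/countdown-qwen-instruct"
```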
Training
export N_GPUS=2
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=2
export EXPERIMENT_NAME=countdown-qwen2.5-3b-instruct
export VLLM_ATTENTION_BACKEND=XFORMERS
bash ./scripts/train_tiny_zero.sh
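The three training blocks above differ only in a few values. A hypothetical helper like the one below exports the same six variables in one call; it sets `ROLLOUT_TP_SIZE` equal to `N_GPUS`, which matches every example in this README but may not suit other setups. The paths in the usage comment are placeholders.

```shell
# Hypothetical helper (not part of TinyZero): export the variables used by
# scripts/train_tiny_zero.sh in the examples above.
set_countdown_env() {
  if [ "$#" -ne 4 ]; then
    echo "usage: set_countdown_env N_GPUS BASE_MODEL DATA_DIR EXPERIMENT_NAME" >&2
    return 1
  fi
  export N_GPUS="$1"
  export BASE_MODEL="$2"
  export DATA_DIR="$3"
  export EXPERIMENT_NAME="$4"
  # Every example in this README uses a tensor-parallel size equal to the GPU count
  export ROLLOUT_TP_SIZE="$1"
  export VLLM_ATTENTION_BACKEND=XFORMERS
}

# Usage (source the file defining this function so the exports reach your shell):
# set_countdown_env 2 /path/to/Qwen2.5-3B-Instruct /path/to/dataset countdown-qwen2.5-3b-instruct
# bash ./scripts/train_tiny_zero.sh
```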
Citation
@misc{tinyzero,
author = {Jiayi Pan and Junjie Zhang and Xingyao Wang and Lifan Yuan},
title = {TinyZero},
howpublished = {https://github.com/Jiayi-Pan/TinyZero},
note = {Accessed: 2025-01-24},
year = {2025}
}
2025-01-25 06:38:00