GitHub – Jiayi-Pan / TinyZero

TinyZero is a reproduction of DeepSeek R1-Zero. We built it on top of veRL.

Through RL, the 3B base LM develops self-verification and search abilities all on its own.

You can experience the "Aha moment" yourself for under $30.

Twitter Thread: https://x.com/jiayi_pirate/status/1882839370505621655

Full Experiment log: https://wandb.ai/jiayipan/tinyzero

Installation

conda create -n zero python=3.9
conda activate zero
# install torch (or skip this step and let vllm install the correct version for you)
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
# install vllm
pip3 install vllm==0.6.3 # or you can install 0.5.4, 0.4.2, or 0.3.1
pip3 install ray

# verl (run from the repository root)
pip install -e .

# flash attention 2
pip3 install flash-attn --no-build-isolation
# quality of life
pip install wandb IPython matplotlib

Data Preparation

python ./examples/data_preprocess/countdown.py --local_dir {path_to_your_dataset}
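Before training, it can help to confirm that preprocessing actually produced output. The sketch below assumes `countdown.py` writes `train.parquet` and `test.parquet` into `--local_dir`; that file layout is an assumption, so check the script if your output is named differently. `check_dataset` is a hypothetical helper, not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that a preprocessed dataset directory
# contains non-empty data files.
# ASSUMPTION: countdown.py writes train.parquet and test.parquet.
check_dataset() {
  local dir="$1"
  local missing=0
  for f in train.parquet test.parquet; do
    # -s is true only if the file exists and has a size greater than zero
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_dataset "$DATA_DIR" && echo "dataset looks ready"` run after the preprocessing command.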

Single GPU
Works for models <= 1.5B. For the Qwen2.5-0.5B base model, we found that it fails to learn reasoning.

export N_GPUS=1
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=1
export EXPERIMENT_NAME=countdown-qwen2.5-0.5b
export VLLM_ATTENTION_BACKEND=XFORMERS

bash ./scripts/train_tiny_zero.sh
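A run can fail late if one of the exported variables is missing or inconsistent. The following is a minimal pre-flight sketch, not part of the repository: `check_train_env` is a hypothetical name, and the rule that `ROLLOUT_TP_SIZE` must evenly divide `N_GPUS` is an assumption that holds for every configuration in this README.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the variables exported above.
# ASSUMPTION: the rollout tensor-parallel size must evenly divide N_GPUS.
check_train_env() {
  for var in N_GPUS BASE_MODEL DATA_DIR ROLLOUT_TP_SIZE EXPERIMENT_NAME; do
    # eval is safe here: the variable names are fixed literals above
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "error: $var is not set" >&2
      return 1
    fi
  done
  # Reject anything that is not a plain non-negative integer
  case "$N_GPUS$ROLLOUT_TP_SIZE" in
    *[!0-9]*)
      echo "error: N_GPUS and ROLLOUT_TP_SIZE must be integers" >&2
      return 1 ;;
  esac
  # The short-circuit avoids a modulo by zero when ROLLOUT_TP_SIZE is 0
  if [ "$N_GPUS" -lt 1 ] || [ "$ROLLOUT_TP_SIZE" -lt 1 ] ||
     [ $((N_GPUS % ROLLOUT_TP_SIZE)) -ne 0 ]; then
    echo "error: ROLLOUT_TP_SIZE ($ROLLOUT_TP_SIZE) must divide N_GPUS ($N_GPUS)" >&2
    return 1
  fi
}
```

Used as `check_train_env && bash ./scripts/train_tiny_zero.sh`, the training script only starts once the check passes. `VLLM_ATTENTION_BACKEND` is left out of the check because it selects a backend rather than describing the run.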

3B+ model
In this case, the base model is able to develop sophisticated reasoning skills.

export N_GPUS=2
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=2
export EXPERIMENT_NAME=countdown-qwen2.5-3b
export VLLM_ATTENTION_BACKEND=XFORMERS

bash ./scripts/train_tiny_zero.sh

We also experimented with Qwen2.5-3B-Instruct.
Data Preparation
To follow the chat template, we need to reprocess the data:

conda activate zero
python examples/data_preprocess/countdown.py --template_type=qwen-instruct --local_dir={path_to_your_dataset}

Training

export N_GPUS=2
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=2
export EXPERIMENT_NAME=countdown-qwen2.5-3b-instruct
export VLLM_ATTENTION_BACKEND=XFORMERS

bash ./scripts/train_tiny_zero.sh
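The three runs above differ only in `N_GPUS`, `ROLLOUT_TP_SIZE`, and `EXPERIMENT_NAME`. The sketch below collects those values, copied from the commands above, into one hypothetical helper, `use_tinyzero_config`. It is a convenience, not part of the repository, and `BASE_MODEL` and `DATA_DIR` still have to be exported by you.

```shell
#!/usr/bin/env bash
# Hypothetical helper: export the per-run settings used in the three
# configurations above. BASE_MODEL and DATA_DIR are left to the caller.
use_tinyzero_config() {
  case "$1" in
    qwen2.5-0.5b)
      export N_GPUS=1 ROLLOUT_TP_SIZE=1 ;;
    qwen2.5-3b|qwen2.5-3b-instruct)
      export N_GPUS=2 ROLLOUT_TP_SIZE=2 ;;
    *)
      echo "unknown config: $1" >&2
      return 1 ;;
  esac
  # Matches the experiment names used in the commands above
  export EXPERIMENT_NAME="countdown-$1"
  export VLLM_ATTENTION_BACKEND=XFORMERS
}
```

For example, `use_tinyzero_config qwen2.5-3b-instruct && bash ./scripts/train_tiny_zero.sh`. For the instruct run, `DATA_DIR` must point to the data reprocessed with `--template_type=qwen-instruct`.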

Acknowledgements
  • We ran our experiments on veRL.
  • We use base models from the Qwen2.5 series.

Citation
@misc{tinyzero,
author       = {Jiayi Pan and Junjie Zhang and Xingyao Wang and Lifan Yuan},
title        = {TinyZero},
howpublished = {\url{https://github.com/Jiayi-Pan/TinyZero}},
note         = {Accessed: 2025-01-24},
year         = {2025}
}

2025-01-25 06:38:00