GitHub – Jiayi-Pan / TinyZero

Aljwadh, January 25, 2025

TinyZero is a reproduction of DeepSeek R1 Zero. We build on top of veRL.

Through RL, the 3B base LM develops self-verification and search abilities all on its own.

You can experience the "Aha" moment yourself for under $30.

Twitter Thread: https://x.com/jiayi_pirate/status/1882839370505621655

Full Experiment log: https://wandb.ai/jiayipan/tinyzero

Installation

conda create -n zero python=3.9
# install torch (or you can skip this step and let vllm install the correct version for you)
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
# install vllm
pip3 install vllm==0.6.3 # or you can install 0.5.4, 0.4.2 and 0.3.1
pip3 install ray

# verl
pip install -e .

# flash attention 2
pip3 install flash-attn --no-build-isolation
# quality of life
pip install wandb IPython matplotlib
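
After running the install commands, it can help to confirm which of the packages actually landed in the environment. The following is a minimal sketch, not part of the TinyZero repository: the helper name `installed_versions` is hypothetical, and the distribution names are simply the ones used in the pip commands above.

```python
from importlib import metadata

# Distribution names taken from the pip commands above.
REQUIRED = ["torch", "vllm", "ray", "flash-attn", "wandb"]


def installed_versions(names=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run it inside the `zero` conda environment; any line reporting NOT INSTALLED points to an install step that needs to be repeated.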

Data Preparation

python ./examples/data_preprocess/countdown.py --local_dir {path_to_your_dataset}
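
For orientation, a Countdown example gives the model a few numbers and a target, and asks for an arithmetic expression over those numbers that equals the target. The sketch below shows what checking such an answer could look like. It is an illustration only, not the repository's reward function: the name `check_countdown` is hypothetical, and the rule that every number must be used exactly once is an assumption.

```python
import ast
import operator
from collections import Counter
from fractions import Fraction

# Only the four basic arithmetic operators are accepted.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node, used):
    """Evaluate a parsed expression exactly, recording every integer literal."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def check_countdown(expression, numbers, target):
    """Return True if the expression uses each number once and equals target."""
    used = []
    try:
        tree = ast.parse(expression, mode="eval")
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return False
    return Counter(used) == Counter(numbers) and value == target
```

For example, `check_countdown("(25 - 5) * 4", [25, 5, 4], 80)` accepts the answer, while `check_countdown("25 * 4", [25, 5, 4], 100)` rejects it because 5 is unused. Parsing with `ast` rather than calling `eval` keeps arbitrary model output from being executed.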

Single GPU
Works for models <= 1.5B. For the Qwen2.5-0.5B base model, we know it fails to learn reasoning.

export N_GPUS=1
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=1
export EXPERIMENT_NAME=countdown-qwen2.5-0.5b
export VLLM_ATTENTION_BACKEND=XFORMERS

bash ./scripts/train_tiny_zero.sh
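
Because the exports above contain `{path_to_your_model}`-style placeholders, it is easy to launch training with one of them still unfilled. The following is a small pre-flight sketch, assuming only the variable names shown in the export lines; the helper `find_config_problems` is hypothetical and not part of the repository.

```python
import os

# Variable names copied from the export lines above.
REQUIRED_VARS = [
    "N_GPUS",
    "BASE_MODEL",
    "DATA_DIR",
    "ROLLOUT_TP_SIZE",
    "EXPERIMENT_NAME",
    "VLLM_ATTENTION_BACKEND",
]
INTEGER_VARS = ("N_GPUS", "ROLLOUT_TP_SIZE")


def find_config_problems(env=None):
    """Return a list of human-readable problems found in the environment."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif "{" in value or "}" in value:
            problems.append(f"{name} still contains a placeholder: {value}")
    for name in INTEGER_VARS:
        value = env.get(name, "")
        if value and not value.isdigit():
            problems.append(f"{name} should be a positive integer, got: {value}")
    return problems


if __name__ == "__main__":
    for problem in find_config_problems():
        print(problem)
```

An empty result means every variable is set and free of placeholders; it does not check that the paths exist or that the model fits on the available GPUs.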

3B+ model
In this case, the base model is able to develop sophisticated reasoning skills.

export N_GPUS=2
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=2
export EXPERIMENT_NAME=countdown-qwen2.5-3b
export VLLM_ATTENTION_BACKEND=XFORMERS

bash ./scripts/train_tiny_zero.sh

We also experimented with Qwen2.5-3B Instruct.

Data Preparation
To follow the chat template, we need to reprocess the data:

conda activate zero
python examples/data_preprocess/countdown.py --template_type=qwen-instruct --local_dir={path_to_your_dataset}

Training

export N_GPUS=2
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=2
export EXPERIMENT_NAME=countdown-qwen2.5-3b-instruct
export VLLM_ATTENTION_BACKEND=XFORMERS

bash ./scripts/train_tiny_zero.sh

We run our experiments based on veRL.
We use base models from the Qwen2.5 series.

@misc{tinyzero,
author       = {Jiayi Pan and Junjie Zhang and Xingyao Wang and Lifan Yuan},
title        = {TinyZero},
howpublished = {https://github.com/Jiayi-Pan/TinyZero},
note         = {Accessed: 2025-01-24},
year         = {2025}
}

2025-01-25 06:38:00
