hao-ai-lab/FastVideo: FastVideo is an open-source framework for accelerating large-scale video diffusion models.

FastVideo is a lightweight framework for accelerating large-scale video diffusion models.
(Demo video: FastMochi-Demo.mp4)
🤗 FastMochi | 🤗 FastHunyuan | 🔍 Discord
FastVideo currently offers (with more to come):
- FastHunyuan and FastMochi: consistency-distilled video diffusion models that deliver an 8x inference speedup.
- The first open distillation recipes for video DiTs, based on PCM.
- Support for distilling, finetuning, and running inference on state-of-the-art open video DiTs: Mochi and Hunyuan.
- Scalable training with FSDP, sequence parallelism, and selective activation checkpointing, with near-linear scaling up to 64 GPUs (a generic sketch of the checkpointing pattern follows this list).
- Memory-efficient finetuning with LoRA, precomputed latents, and precomputed text embeddings.
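As a taste of the training-side machinery, the snippet below shows the generic PyTorch pattern behind selective activation checkpointing: recompute selected transformer blocks during the backward pass instead of storing their activations. This is a sketch of the general technique, not FastVideo's code, and the every-other-block granularity is an assumption.

# Generic selective-activation-checkpointing pattern (not FastVideo's code):
# wrap only some blocks in torch.utils.checkpoint so their activations are
# recomputed on backward instead of stored, trading compute for memory.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
    def forward(self, x):
        return x + self.net(x)

class TinyDiT(nn.Module):
    def __init__(self, depth: int = 8, checkpoint_every: int = 2):
        super().__init__()
        self.blocks = nn.ModuleList(Block() for _ in range(depth))
        self.checkpoint_every = checkpoint_every  # assumption: checkpoint every other block
    def forward(self, x):
        for i, block in enumerate(self.blocks):
            if self.training and i % self.checkpoint_every == 0:
                x = checkpoint(block, x, use_reentrant=False)  # recomputed on backward
            else:
                x = block(x)
        return x

model = TinyDiT().train()
model(torch.randn(4, 256)).mean().backward()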
Development is ongoing, and parts of the codebase are still experimental.
FastHunyuan compared with the original Hunyuan: FastHunyuan achieves an 8x diffusion speedup in the FastVideo framework.
(Demo video: FastHunyuan-Demo.mp4)
A comparison between OpenAI Sora, the original Hunyuan, and FastHunyuan:
(Demo video: sora-verse-fasthunyuan.mp4)
2024/12/17: FastVideo v0.1 is released.
The code has been tested with Python 3.10.0, CUDA 12.1, and H100 GPUs.
We recommend using a GPU with 80 GB of memory. To run FastHunyuan inference, use the following commands:
# Download the model weights
python scripts/huggingface/download_hf.py --repo_id=FastVideo/FastHunyuan --local_dir=data/FastHunyuan --repo_type=model
# CLI inference
sh scripts/inference/inference_hunyuan.sh
FastHunyuan is also available in the official Hunyuan GitHub repository. To run FastMochi inference, use the following commands:
# Download the model weights
python scripts/huggingface/download_hf.py --repo_id=FastVideo/FastMochi-diffusers --local_dir=data/FastMochi-diffusers --repo_type=model
# CLI inference
bash scripts/inference/inference_mochi_sp.sh
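Because the FastMochi checkpoint is published in diffusers format, it should also load through the generic diffusers MochiPipeline if you prefer a Python entry point. The snippet below is a minimal sketch rather than the official pipeline; the local path follows the download command above, and the step count and guidance scale are assumptions based on the consistency-distillation claim, not documented values.

# Hedged sketch: load the diffusers-format FastMochi checkpoint with the
# generic MochiPipeline from diffusers. Step count and guidance scale are
# assumptions; consult the repo's inference script for official settings.
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

pipe = MochiPipeline.from_pretrained("data/FastMochi-diffusers", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # lowers peak GPU memory at some speed cost

video = pipe(
    prompt="A serene mountain lake at sunrise, mist drifting over the water",
    num_inference_steps=8,  # assumption: distilled models need far fewer steps
    guidance_scale=1.5,     # assumption: distilled models typically use weak CFG
).frames[0]
export_to_video(video, "fastmochi_sample.mp4", fps=30)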
Our distillation recipe is based on the Phased Consistency Model (PCM). We found no significant improvement from multi-phase distillation, so we kept a single-phase setup identical to the original latent consistency model recipe. We use the MixKit dataset for distillation. To avoid running the text encoder and VAE during training, we preprocess all data to generate text embeddings and VAE latents; preprocessing instructions can be found in data_preprocess.md. For convenience, we also provide preprocessed data that can be downloaded directly using the following command:
python scripts/huggingface/download_hf.py --repo_id=FastVideo/HD-Mixkit-Finetune-Hunyuan --local_dir=data/HD-Mixkit-Finetune-Hunyuan --repo_type=dataset
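Conceptually, preprocessing runs the frozen VAE and text encoder exactly once and caches their outputs, so the training loop never has to keep either model in memory. The sketch below illustrates that idea with stand-in encoders; it is not the repo's preprocessing code (see data_preprocess.md for the real procedure).

# Illustrative latent/text-embedding precomputation with stand-in encoders.
# nn.Linear stands in for the video VAE and text encoder so the example runs
# anywhere; the real pipeline uses the actual frozen models.
import torch
import torch.nn as nn

vae_encoder = nn.Linear(3 * 64 * 64, 128)  # stand-in for the video VAE encoder
text_encoder = nn.Linear(77, 128)          # stand-in for the text encoder

@torch.no_grad()
def preprocess(frames: torch.Tensor, token_ids: torch.Tensor, out_path: str):
    latents = vae_encoder(frames.flatten(1))    # VAE latents, computed once
    text_emb = text_encoder(token_ids.float())  # text embeddings, computed once
    torch.save({"latents": latents, "text_emb": text_emb}, out_path)

# Training then loads only the cached tensors from disk.
preprocess(torch.rand(4, 3, 64, 64), torch.randint(0, 100, (4, 77)), "sample.pt")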
Next, download the original model weights with:
python scripts/huggingface/download_hf.py --repo_id=FastVideo/hunyuan --local_dir=data/hunyuan --repo_type=model
To launch the distillation process, use the following commands:
bash scripts/distill/distill_mochi.sh # for mochi
bash scripts/distill/distill_hunyuan.sh # for hunyuan
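For readers new to consistency distillation: the core of the PCM/LCM-style objective is to train the student so that its predictions at different points along the teacher's ODE trajectory agree with each other, using an EMA copy of the student as the target network. The toy step below shows only the shape of that update with linear stand-ins; it is not FastVideo's loss, and every component here is a drastic simplification.

# Toy consistency-distillation step with linear stand-ins (illustrative only).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Linear(16, 16)
ema_student = copy.deepcopy(student).requires_grad_(False)  # EMA target network
opt = torch.optim.AdamW(student.parameters(), lr=1e-4)

def teacher_ode_step(z_t: torch.Tensor) -> torch.Tensor:
    # Stand-in for one teacher solver step from time t toward the data.
    return 0.9 * z_t

for _ in range(10):
    z_t = torch.randn(8, 16)                 # noisy latent at time t
    z_s = teacher_ode_step(z_t)              # teacher moves it one step earlier
    target = ema_student(z_s).detach()       # self-consistency target at time s
    loss = F.mse_loss(student(z_t), target)  # predictions along the trajectory must agree
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():                    # EMA update of the target network
        for p_ema, p in zip(ema_student.parameters(), student.parameters()):
            p_ema.lerp_(p, 0.05)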
We also provide an optional script for distillation with an adversarial loss, located at fastvideo/distill_adv.py. Although we tested the adversarial loss, we did not observe significant improvements.
Ensure that your data is prepared and preprocessed in the format specified in data_preprocess.md. For convenience, we also provide preprocessed Black Myth Wukong data for Mochi that can be downloaded directly:
python scripts/huggingface/download_hf.py --repo_id=FastVideo/Mochi-Black-Myth --local_dir=data/Mochi-Black-Myth --repo_type=dataset
Download the original model weights with:
python scripts/huggingface/download_hf.py --repo_id=genmo/mochi-1-preview --local_dir=data/mochi --repo_type=model # for mochi
python scripts/huggingface/download_hf.py --repo_id=FastVideo/hunyuan --local_dir=data/hunyuan --repo_type=model # for hunyuan
Then you can run finetuning with:
bash scripts/finetune/finetune_mochi.sh # for mochi
Note that for finetuning, we did not tune the hyperparameters in the provided script.
Currently, we only provide LoRA finetuning for the Mochi model; the command for LoRA finetuning is:
bash scripts/finetune/finetune_mochi_lora.sh
Memory requirements:
- 40 GB of GPU memory each for 2 GPUs with LoRA
- 30 GB of GPU memory each for 2 GPUs with CPU offload and LoRA
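For intuition on why the memory drops so much: LoRA freezes the base weights and trains only a low-rank update on top of them. Below is a self-contained sketch of that mechanism; the rank, scaling, and layer size are illustrative assumptions, and FastVideo's actual LoRA integration lives in the finetuning scripts.

# Minimal LoRA linear layer (illustrative): the frozen base weight W gets a
# trainable low-rank update scale * B @ A, so only rank * (in + out)
# parameters receive gradients.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base.requires_grad_(False)  # frozen pretrained weights
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no update at start
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(1024, 1024))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params: {trainable}")  # 32,768 vs ~1,049,600 for full finetuning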
Our codebase also supports finetuning with a mixture of images and videos.
bash scripts/finetune/finetune_hunyuan.sh # for hunyuan
bash scripts/finetune/finetune_mochi_lora_mix.sh # for mochi LoRA with image-video mixture
For image-video mixture finetuning, make sure to enable the --group_frame option in your script.
We learned from and reused code from the following projects: PCM, diffusers, OpenSoraPlan, and xDiT.
We thank MBZUAI and Anyscale for their support throughout the project.