hao-ai-lab/FastVideo: FastVideo is an open-source framework for accelerating large-scale video diffusion models.

FastVideo is a lightweight framework for accelerating large-scale video diffusion models.
(Demo video: FastMochi-Demo.mp4)
🤗 FastMochi | 🤗 FastHunyuan | 🔍 Discord
FastVideo currently offers (with more to come):
- FastHunyuan and FastMochi: consistency-distilled video diffusion models that deliver an 8x inference speedup.
- The first open distillation recipes for video DiTs, based on PCM.
- Support for distilling, finetuning, and running inference on state-of-the-art open video DiTs: Mochi and Hunyuan.
- Scalable training with FSDP, sequence parallelism, and selective activation checkpointing, with near-linear scaling up to 64 GPUs (a generic sketch of the checkpointing pattern follows this list).
- Memory-efficient finetuning with LoRA, precomputed latents, and precomputed text embeddings.
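As a taste of the training-side machinery, the snippet below shows the generic PyTorch pattern behind selective activation checkpointing: recompute selected transformer blocks during the backward pass instead of storing their activations. This is a sketch of the general technique, not FastVideo's code, and the every-other-block granularity is an assumption.

# Generic selective-activation-checkpointing pattern (not FastVideo's code):
# wrap only some blocks in torch.utils.checkpoint so their activations are
# recomputed on backward instead of stored, trading compute for memory.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
    def forward(self, x):
        return x + self.net(x)

class TinyDiT(nn.Module):
    def __init__(self, depth: int = 8, checkpoint_every: int = 2):
        super().__init__()
        self.blocks = nn.ModuleList(Block() for _ in range(depth))
        self.checkpoint_every = checkpoint_every  # assumption: checkpoint every other block
    def forward(self, x):
        for i, block in enumerate(self.blocks):
            if self.training and i % self.checkpoint_every == 0:
                x = checkpoint(block, x, use_reentrant=False)  # recomputed on backward
            else:
                x = block(x)
        return x

model = TinyDiT().train()
model(torch.randn(4, 256)).mean().backward()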
Development is ongoing, and parts of the codebase are still experimental.
FastHunyuan compared with the original Hunyuan: FastHunyuan achieves an 8x diffusion speedup in the FastVideo framework.
(Demo video: FastHunyuan-Demo.mp4)
A comparison between OpenAI Sora, the original Hunyuan, and FastHunyuan:
(Demo video: sora-verse-fasthunyuan.mp4)
2024/12/17: FastVideo v0.1 is released.
The code has been tested with Python 3.10.0, CUDA 12.1, and H100 GPUs.
We recommend using a GPU with 80 GB of memory. To run FastHunyuan inference, use the following commands:
# Download the model weights
python scripts/huggingface/download_hf.py --repo_id=FastVideo/FastHunyuan --local_dir=data/FastHunyuan --repo_type=model
# CLI inference
sh scripts/inference/inference_hunyuan.sh
FastHunyuan is also available in the official Hunyuan GitHub repository. To run FastMochi inference, use the following commands:
# Download the model weights
python scripts/huggingface/download_hf.py --repo_id=FastVideo/FastMochi-diffusers --local_dir=data/FastMochi-diffusers --repo_type=model
# CLI inference
bash scripts/inference/inference_mochi_sp.sh
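Because the FastMochi checkpoint is published in diffusers format, it should also load through the generic diffusers MochiPipeline if you prefer a Python entry point. The snippet below is a minimal sketch rather than the official pipeline; the local path follows the download command above, and the step count and guidance scale are assumptions based on the consistency-distillation claim, not documented values.

# Hedged sketch: load the diffusers-format FastMochi checkpoint with the
# generic MochiPipeline from diffusers. Step count and guidance scale are
# assumptions; consult the repo's inference script for official settings.
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

pipe = MochiPipeline.from_pretrained("data/FastMochi-diffusers", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # lowers peak GPU memory at some speed cost

video = pipe(
    prompt="A serene mountain lake at sunrise, mist drifting over the water",
    num_inference_steps=8,  # assumption: distilled models need far fewer steps
    guidance_scale=1.5,     # assumption: distilled models typically use weak CFG
).frames[0]
export_to_video(video, "fastmochi_sample.mp4", fps=30)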
Our distillation recipe is based on the Phased Consistency Model (PCM). We found no significant improvement from multi-phase distillation, so we kept a single-phase setup identical to the original latent consistency model recipe. We use the MixKit dataset for distillation. To avoid running the text encoder and VAE during training, we preprocess all data to generate text embeddings and VAE latents; preprocessing instructions can be found in data_preprocess.md. For convenience, we also provide preprocessed data that can be downloaded directly using the following command:
python scripts/huggingface/download_hf.py --repo_id=FastVideo/HD-Mixkit-Finetune-Hunyuan --local_dir=data/HD-Mixkit-Finetune-Hunyuan --repo_type=dataset
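Conceptually, preprocessing runs the frozen VAE and text encoder exactly once and caches their outputs, so the training loop never has to keep either model in memory. The sketch below illustrates that idea with stand-in encoders; it is not the repo's preprocessing code (see data_preprocess.md for the real procedure).

# Illustrative latent/text-embedding precomputation with stand-in encoders.
# nn.Linear stands in for the video VAE and text encoder so the example runs
# anywhere; the real pipeline uses the actual frozen models.
import torch
import torch.nn as nn

vae_encoder = nn.Linear(3 * 64 * 64, 128)  # stand-in for the video VAE encoder
text_encoder = nn.Linear(77, 128)          # stand-in for the text encoder

@torch.no_grad()
def preprocess(frames: torch.Tensor, token_ids: torch.Tensor, out_path: str):
    latents = vae_encoder(frames.flatten(1))    # VAE latents, computed once
    text_emb = text_encoder(token_ids.float())  # text embeddings, computed once
    torch.save({"latents": latents, "text_emb": text_emb}, out_path)

# Training then loads only the cached tensors from disk.
preprocess(torch.rand(4, 3, 64, 64), torch.randint(0, 100, (4, 77)), "sample.pt")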
Next, download the original model weights with:
python scripts/huggingface/download_hf.py --repo_id=FastVideo/hunyuan --local_dir=data/hunyuan --repo_type=model
To launch the distillation process, use the following commands:
bash scripts/distill/distill_mochi.sh # for mochi
bash scripts/distill/distill_hunyuan.sh # for hunyuan
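For readers new to consistency distillation: the core of the PCM/LCM-style objective is to train the student so that its predictions at different points along the teacher's ODE trajectory agree with each other, using an EMA copy of the student as the target network. The toy step below shows only the shape of that update with linear stand-ins; it is not FastVideo's loss, and every component here is a drastic simplification.

# Toy consistency-distillation step with linear stand-ins (illustrative only).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Linear(16, 16)
ema_student = copy.deepcopy(student).requires_grad_(False)  # EMA target network
opt = torch.optim.AdamW(student.parameters(), lr=1e-4)

def teacher_ode_step(z_t: torch.Tensor) -> torch.Tensor:
    # Stand-in for one teacher solver step from time t toward the data.
    return 0.9 * z_t

for _ in range(10):
    z_t = torch.randn(8, 16)                 # noisy latent at time t
    z_s = teacher_ode_step(z_t)              # teacher moves it one step earlier
    target = ema_student(z_s).detach()       # self-consistency target at time s
    loss = F.mse_loss(student(z_t), target)  # predictions along the trajectory must agree
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():                    # EMA update of the target network
        for p_ema, p in zip(ema_student.parameters(), student.parameters()):
            p_ema.lerp_(p, 0.05)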
We also provide an optional script for distillation with an adversarial loss, located at fastvideo/distill_adv.py. Although we tested the adversarial loss, we did not observe significant improvements.
Ensure that your data is prepared and preprocessed in the format specified in data_preprocess.md. For convenience, we also provide preprocessed Black Myth Wukong data for Mochi that can be downloaded directly:
python scripts/huggingface/download_hf.py --repo_id=FastVideo/Mochi-Black-Myth --local_dir=data/Mochi-Black-Myth --repo_type=dataset
Download the original model weights with:
python scripts/huggingface/download_hf.py --repo_id=genmo/mochi-1-preview --local_dir=data/mochi --repo_type=model # for mochi
python scripts/huggingface/download_hf.py --repo_id=FastVideo/hunyuan --local_dir=data/hunyuan --repo_type=model # for hunyuan
Then you can run finetuning with:
bash scripts/finetune/finetune_mochi.sh # for mochi
Note that for finetuning, we did not tune the hyperparameters in the provided script.
Currently, we only provide LoRA finetuning for the Mochi model; the command for LoRA finetuning is:
bash scripts/finetune/finetune_mochi_lora.sh
Memory requirements:
- 40 GB of GPU memory each for 2 GPUs with LoRA
- 30 GB of GPU memory each for 2 GPUs with CPU offload and LoRA
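For intuition on why the memory drops so much: LoRA freezes the base weights and trains only a low-rank update on top of them. Below is a self-contained sketch of that mechanism; the rank, scaling, and layer size are illustrative assumptions, and FastVideo's actual LoRA integration lives in the finetuning scripts.

# Minimal LoRA linear layer (illustrative): the frozen base weight W gets a
# trainable low-rank update scale * B @ A, so only rank * (in + out)
# parameters receive gradients.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base.requires_grad_(False)  # frozen pretrained weights
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no update at start
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(1024, 1024))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params: {trainable}")  # 32,768 vs ~1,049,600 for full finetuning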
Our codebase also supports finetuning with a mixture of images and videos.
bash scripts/finetune/finetune_hunyuan.sh # for hunyuan
bash scripts/finetune/finetune_mochi_lora_mix.sh # for mochi LoRA with image-video mixture
For image-video mixture finetuning, make sure to enable the --group_frame option in your script.
We learned from and reused code from the following projects: PCM, diffusers, OpenSoraPlan, and xDiT.
We thank MBZUAI and Anyscale for their support throughout the project.