---
license: apache-2.0
base_model:
- THUDM/CogVideoX-5b-I2V
pipeline_tag: image-to-video
---
# SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers
Github · Playground · Discord
This repo contains Diffusers-style model weights for the SkyReels-A1 models. You can find the inference code in the SkyReels-A1 repository.
## News
- Mar 4, 2025: We release the audio-driven portrait image animation pipeline of SkyReels-A1.
- Feb 18, 2025: We release the inference code and model weights of SkyReels-A1.
- Feb 18, 2025: We have open-sourced the I2V video generation model SkyReels-V1, the first and most advanced open-source human-centric video foundation model.
Overview of the SkyReels-A1 framework. Given an input video sequence and a reference portrait image, we extract facial expression-aware landmarks from the video, which serve as motion descriptors for transferring expressions onto the portrait. Using a conditional video generation framework based on DiT, our approach directly integrates these facial expression-aware landmarks into the input latent space. In alignment with prior research, we employ a pose guidance mechanism constructed within a VAE architecture. This component encodes facial expression-aware landmarks as conditional input for the DiT framework, enabling the model to capture essential low-dimensional visual attributes while preserving the semantic integrity of facial features.
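The conditioning scheme described above can be sketched as a toy example. This is an illustrative assumption, not the actual SkyReels-A1 implementation: all shapes are made up, and the random linear projection stands in for the VAE-based pose guidance encoder. It only shows the general idea of mapping landmark maps into the latent channel dimension and feeding them to the DiT alongside the video latents:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_landmarks(landmarks, out_channels=4):
    """Stand-in for the pose guidance encoder: projects per-frame
    landmark maps into the latent channel dimension (illustrative)."""
    f, c, h, w = landmarks.shape
    # Fixed random linear projection over the channel axis.
    proj = rng.standard_normal((out_channels, c)) / np.sqrt(c)
    return np.einsum("oc,fchw->fohw", proj, landmarks)

# Video latents from a VAE encoder: (frames, channels, height, width).
video_latents = rng.standard_normal((13, 4, 30, 45))
# Facial-expression-aware landmark maps rendered per frame.
landmark_maps = rng.standard_normal((13, 3, 30, 45))

landmark_latents = encode_landmarks(landmark_maps)
# Concatenate along the channel axis so the diffusion transformer
# receives both the video latents and the motion descriptors.
dit_input = np.concatenate([video_latents, landmark_latents], axis=1)
print(dit_input.shape)  # (13, 8, 30, 45)
```

In the real pipeline the landmark encoder is learned and the combined tensor is consumed by the DiT backbone; the channel-wise concatenation here merely illustrates how the conditioning enters the input latent space.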
Some generated results:
## Citation
If you find SkyReels-A1 useful for your research, please cite our work using the following BibTeX:
```bibtex
@article{qiu2025skyreels,
  title={SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers},
  author={Qiu, Di and Fei, Zhengcong and Wang, Rui and Bai, Jialin and Yu, Changqian and Fan, Mingyuan and Chen, Guibin and Wen, Xiang},
  journal={arXiv preprint arXiv:2502.10841},
  year={2025}
}
```