# [SV3D-diffusers](https://github.com/chenguolin/sv3d-diffusers)

This repo provides:

1. A spatio-temporal UNet (`SV3DUNetSpatioTemporalConditionModel`) and pipeline (`StableVideo3DDiffusionPipeline`), modified from [SVD](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_video_diffusion/pipeline_stable_video_diffusion.py) for [SV3D](https://sv3d.github.io) and following the [diffusers](https://github.com/huggingface/diffusers) convention.

2. Code for converting [Stability-AI](https://github.com/Stability-AI/generative-models)'s [SV3D-p UNet checkpoint](https://huggingface.co/stabilityai/sv3d) to the [diffusers](https://github.com/huggingface/diffusers) convention.

3. Code for running inference with the `SV3D-p` model through the [diffusers](https://github.com/huggingface/diffusers) library, synthesizing a 21-frame orbital video around a 3D object from a single-view image (preprocessed beforehand by removing the background and centering the object).

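
The 21-frame orbit in item 3 can be pictured as a ring of camera poses around the object. The sketch below is illustrative only. It assumes the frames are evenly spaced in azimuth over one full turn at a fixed elevation, with the last frame returning to the input view. The helper name `orbit_camera_angles` is hypothetical and not part of this repo, and the conditioning convention actually used by the pipeline may differ.

```python
def orbit_camera_angles(num_frames=21, elevation_deg=10.0):
    """Return one (elevation, azimuth) pair in degrees per frame of a full orbit.

    Assumption: azimuths are evenly spaced and the last frame returns to the
    input view (azimuth 0). This layout is illustrative and is not taken from
    this repo's code.
    """
    angles = []
    for i in range(1, num_frames + 1):
        # Frame i sits i/num_frames of a full turn away from the input view.
        azimuth_deg = (i * 360.0 / num_frames) % 360.0
        angles.append((elevation_deg, azimuth_deg))
    return angles


if __name__ == "__main__":
    for elevation, azimuth in orbit_camera_angles():
        print(f"elevation={elevation:5.1f}  azimuth={azimuth:6.2f}")
```

With the defaults, consecutive frames are about 17.14 degrees apart (360 / 21).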

Converted SV3D-p checkpoints are available on HuggingFace🤗 at [chenguolin/sv3d-diffusers](https://huggingface.co/chenguolin/sv3d-diffusers).

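
If you want to fetch the converted checkpoints ahead of time, a minimal sketch using the `huggingface_hub` library's `snapshot_download` could look like the following. The destination folder `checkpoints/sv3d-diffusers` and the helper name `download_checkpoints` are assumptions made for illustration. They are not part of this repo.

```python
import os

REPO_ID = "chenguolin/sv3d-diffusers"  # Hub repo named in this README


def download_checkpoints(local_dir="checkpoints/sv3d-diffusers", endpoint=None):
    """Download every file of the converted checkpoint repo and return the local path.

    `local_dir` is a hypothetical destination. `endpoint` optionally sets
    HF_ENDPOINT (for example, a mirror) before huggingface_hub is imported.
    """
    if endpoint is not None:
        # huggingface_hub reads HF_ENDPOINT when it is first imported, so set it first.
        os.environ["HF_ENDPOINT"] = endpoint
    # Imported lazily; requires `pip install huggingface_hub`.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_checkpoints()` returns the folder that holds the downloaded files. This sketch does not cover how `infer.py` itself locates the checkpoints.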

## 🚀 Usage

```bash
git clone https://github.com/chenguolin/sv3d-diffusers.git
cd sv3d-diffusers

# Please install PyTorch first according to your CUDA version
pip3 install -r requirements.txt

# If you cannot access HuggingFace🤗, try:
# export HF_ENDPOINT=https://hf-mirror.com

python3 infer.py --output_dir out/ --image_path assets/images/sculpture.png --elevation 10 --half_precision --seed -1
```

The synthesized video is saved to `out/` as a `.gif` file.
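
To run the same command for several settings, the invocation can be assembled programmatically. The sketch below only reuses the flags shown in the usage command above. The helper name `build_infer_command`, the elevation values in the sweep, and the per-run output folders are assumptions for illustration, and `--half_precision` is assumed to be a value-less switch, as it appears in that command.

```python
import shlex


def build_infer_command(image_path, output_dir="out/", elevation=10,
                        half_precision=True, seed=-1):
    """Build the argv list for one `infer.py` run, using the flags from the usage command."""
    cmd = [
        "python3", "infer.py",
        "--output_dir", str(output_dir),
        "--image_path", str(image_path),
        "--elevation", str(elevation),
    ]
    if half_precision:
        # Assumed to be a value-less switch, as in the usage command.
        cmd.append("--half_precision")
    cmd += ["--seed", str(seed)]
    return cmd


if __name__ == "__main__":
    # Illustrative elevation sweep; values and output folders are assumptions.
    for elevation in (0, 10, 20):
        cmd = build_infer_command(
            "assets/images/sculpture.png",
            output_dir=f"out/elevation_{elevation}/",
            elevation=elevation,
        )
        print(shlex.join(cmd))
        # To actually run it from the repo root: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems when paths contain spaces.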

## 📸 Results

> The implementations differ in image preprocessing and random seed, so the results are presented for reference only.

| Implementation | sculpture | bag | kunkun |
| :------------- | :------: | :----: | :----: |
| **SV3D-diffusers (Ours)** | (image) | (image) | (image) |
| **Official SV3D** | (image) | (image) | (image) |


## 📚 Citation

If you find this repo helpful, please consider giving it a star 🌟 and citing the original SV3D paper.

```
@inproceedings{voleti2024sv3d,
  author={Voleti, Vikram and Yao, Chun-Han and Boss, Mark and Letts, Adam and Pankratz, David and Tochilkin, Dmitrii and Laforte, Christian and Rombach, Robin and Jampani, Varun},
  title={{SV3D}: Novel Multi-view Synthesis and {3D} Generation from a Single Image using Latent Video Diffusion},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2024},
}
```