dixuan.lin committed on
Commit 4e5867a · 1 Parent(s): 84b211e
LICENSE.txt ADDED
@@ -0,0 +1,38 @@
+ ---
+ language:
+ - en
+ - zh
+ license: other
+ tasks:
+ - text-generation
+
+ ---
+
+ <!-- markdownlint-disable first-line-h1 -->
+ <!-- markdownlint-disable html -->
+
+ # <span id="Terms">Terms and Conditions</span>
+
+ ## Declaration
+
+ We hereby declare that the Skywork model must not be used for any activity that threatens national or societal security or is otherwise unlawful. We also ask users not to deploy the Skywork model in internet services that have not undergone appropriate security review and filing. We hope all users will abide by this principle, so that technological progress takes place in a regulated and lawful environment.
+
+ We have done our utmost to ensure the compliance of the data used during the model's training. However, despite these extensive efforts, the complexity of the model and data means that unpredictable risks and issues may still arise. Therefore, we assume no responsibility for any problems that result from using the Skywork open-source model, including but not limited to data security issues, public opinion risks, or any risks and problems arising from the model being misled, abused, disseminated, or otherwise improperly used.
+
+ ## License Agreement
+
+ Community use of the Skywork model is governed by the [Skywork Community License](https://github.com/SkyworkAI/Skywork/blob/main/Skywork%20Community%20License.pdf) (Chinese version: [《Skywork 模型社区许可协议》](https://github.com/SkyworkAI/Skywork/blob/main/Skywork%20模型社区许可协议.pdf)). The Skywork model supports commercial use: if you plan to use the Skywork model or its derivatives for commercial purposes, no separate application is required, but please read the license carefully and strictly comply with its terms.
README.md ADDED
@@ -0,0 +1,68 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ base_model:
+ - tencent/HunyuanVideo
+ pipeline_tag: text-to-video
+ ---
+
+ # SkyReels V1: Human-Centric Video Foundation Model
+ <p align="center">
+ <img src="assets/logo2.png" alt="SkyReels Logo" width="60%">
+ </p>
+
+ <p align="center">
+ <a href="https://github.com/SkyworkAI/SkyReels-V1" target="_blank">🌐 GitHub</a> · <a href="https://www.skyreels.ai/" target="_blank">👋 Playground</a>
+ </p>
+
+ ---
+ This repo contains Diffusers-format model weights for the SkyReels V1 text-to-video model. You can find the inference code in our GitHub repository, [SkyReels-V1](https://github.com/SkyworkAI/SkyReels-V1).
+
+ ## Introduction
+
+ SkyReels V1 is the first and most advanced open-source human-centric video foundation model. By fine-tuning <a href="https://huggingface.co/tencent/HunyuanVideo">HunyuanVideo</a> on O(10M) high-quality film and television clips, SkyReels V1 offers three key advantages:
+
+ 1. **Open-Source Leadership**: Our text-to-video model achieves state-of-the-art (SOTA) performance among open-source models, comparable to proprietary models such as Kling and Hailuo.
+ 2. **Advanced Facial Animation**: Captures 33 distinct facial expressions with over 400 natural movement combinations, accurately reflecting human emotions.
+ 3. **Cinematic Lighting and Aesthetics**: Trained on high-quality Hollywood-level film and television data, so each generated frame exhibits cinematic quality in composition, actor positioning, and camera angles.
+
+ ## 🔑 Key Features
+
+ ### 1. Self-Developed Data Cleaning and Annotation Pipeline
+
+ Our model is built on a self-developed data cleaning and annotation pipeline, creating a vast dataset of high-quality film, television, and documentary content.
+
+ - **Expression Classification**: Categorizes human facial expressions into 33 distinct types.
+ - **Character Spatial Awareness**: Utilizes 3D human reconstruction technology to understand spatial relationships between multiple people in a video, enabling film-level character positioning.
+ - **Action Recognition**: Constructs over 400 action semantic units to achieve a precise understanding of human actions.
+ - **Scene Understanding**: Conducts cross-modal correlation analysis of clothing, scenes, and plots.
+
+ ### 2. Multi-Stage Image-to-Video Pretraining
+
+ Our multi-stage pretraining pipeline, inspired by the <a href="https://huggingface.co/tencent/HunyuanVideo">HunyuanVideo</a> design, consists of the following stages:
+
+ - **Stage 1: Model Domain Transfer Pretraining**: We use a large dataset (O(10M) film and television clips) to adapt the text-to-video model to the human-centric video domain.
+ - **Stage 2: Image-to-Video Model Pretraining**: We convert the text-to-video model from Stage 1 into an image-to-video model by adjusting the conv-in parameters (see the sketch after this list). This new model is then pretrained on the same dataset used in Stage 1.
+ - **Stage 3: High-Quality Fine-Tuning**: We fine-tune the image-to-video model on a high-quality subset of the original dataset, ensuring superior performance and quality.
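+
+ The conv-in adjustment in Stage 2 amounts to widening the model's input projection so it accepts image-condition latents alongside the video latents. Below is a minimal illustrative sketch of that idea, assuming the image latent is concatenated channel-wise (doubling `in_channels` from 16 to 32); the kernel and hidden sizes follow this repo's `config.json`, but the exact conversion used in training is in the SkyReels-V1 codebase.
+
+ ```python
+ import torch
+ import torch.nn as nn
+
+ # T2V patch embedding: 16 latent channels -> hidden size 3072,
+ # patch size (t, h, w) = (1, 2, 2) as in config.json.
+ old = nn.Conv3d(16, 3072, kernel_size=(1, 2, 2), stride=(1, 2, 2))
+
+ # I2V variant: 16 extra channels for the concatenated image latent (assumed).
+ new = nn.Conv3d(32, 3072, kernel_size=(1, 2, 2), stride=(1, 2, 2))
+
+ with torch.no_grad():
+     new.weight.zero_()               # extra channels start as a no-op
+     new.weight[:, :16] = old.weight  # reuse pretrained video-latent weights
+     new.bias.copy_(old.bias)
+ ```
+
+ Zero-initializing the new channels keeps the converted model's output identical to the pretrained text-to-video model at the start of Stage 2 pretraining.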
+
+ ## Model Introduction
+ | Model Name | Resolution | Video Length (frames) | FPS | Download Link |
+ |-----------------|------------|--------------|-----|---------------|
+ | SkyReels-V1-Hunyuan-I2V | 544 × 960 | 97 | 24 | 🤗 [Download](https://huggingface.co/Skywork/SkyReels-V1-Hunyuan-I2V) |
+ | SkyReels-V1-Hunyuan-T2V (Current) | 544 × 960 | 97 | 24 | 🤗 [Download](https://huggingface.co/Skywork/SkyReels-V1-Hunyuan-T2V) |
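+
+ To fetch this repo's weights locally, the `huggingface_hub` snapshot API is a convenient option. A minimal sketch (the local directory name is an arbitrary choice):
+
+ ```python
+ from huggingface_hub import snapshot_download
+
+ # Downloads config.json and all safetensors shards for this repo.
+ local_dir = snapshot_download(
+     repo_id="Skywork/SkyReels-V1-Hunyuan-T2V",
+     local_dir="./SkyReels-V1-Hunyuan-T2V",  # hypothetical destination
+ )
+ print(local_dir)
+ ```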
+
+ ## Usage
+ **See the [Guide](https://github.com com/SkyworkAI/SkyReels-V1) for details.**
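+
+ For a quick start with plain Diffusers (the official inference code in the guide above is the reference), a minimal sketch is shown below. It assumes diffusers >= 0.32.1 and pairs this repo's transformer weights with the non-transformer components of the community Diffusers conversion of the base model, `hunyuanvideo-community/HunyuanVideo`; resolution and frame count follow the table above.
+
+ ```python
+ import torch
+ from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
+ from diffusers.utils import export_to_video
+
+ # This repo contains the transformer weights only (Diffusers format).
+ transformer = HunyuanVideoTransformer3DModel.from_pretrained(
+     "Skywork/SkyReels-V1-Hunyuan-T2V", torch_dtype=torch.bfloat16
+ )
+ pipe = HunyuanVideoPipeline.from_pretrained(
+     "hunyuanvideo-community/HunyuanVideo",
+     transformer=transformer,
+     torch_dtype=torch.bfloat16,
+ )
+ pipe.vae.enable_tiling()  # reduces VAE memory use for 97-frame decodes
+ pipe.to("cuda")
+
+ video = pipe(
+     prompt="FPS-24, a woman smiles warmly in golden-hour light, close-up",
+     height=544,
+     width=960,
+     num_frames=97,
+ ).frames[0]
+ export_to_video(video, "output.mp4", fps=24)
+ ```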
+
+ ## Citation
+ ```bibtex
+ @misc{SkyReelsV1,
+ author = {SkyReels-AI},
+ title = {SkyReels V1: Human-Centric Video Foundation Model},
+ year = {2025},
+ publisher = {Hugging Face},
+ journal = {Hugging Face repository},
+ howpublished = {\url{https://huggingface.co/Skywork/SkyReels-V1-Hunyuan-T2V}}
+ }
+ ```
assets/logo.jpg ADDED
assets/logo2.png ADDED
config.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "_class_name": "HunyuanVideoTransformer3DModel",
+ "_diffusers_version": "0.32.1",
+ "attention_head_dim": 128,
+ "guidance_embeds": true,
+ "in_channels": 16,
+ "mlp_ratio": 4.0,
+ "num_attention_heads": 24,
+ "num_layers": 20,
+ "num_refiner_layers": 2,
+ "num_single_layers": 40,
+ "out_channels": 16,
+ "patch_size": 2,
+ "patch_size_t": 1,
+ "pooled_projection_dim": 768,
+ "qk_norm": "rms_norm",
+ "rope_axes_dim": [
+ 16,
+ 56,
+ 56
+ ],
+ "rope_theta": 256.0,
+ "text_embed_dim": 4096
+ }
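
This config instantiates the Diffusers `HunyuanVideoTransformer3DModel` class named in `_class_name`: 16 latent input/output channels, 20 double-stream plus 40 single-stream blocks, and 24 attention heads of dimension 128 (hidden size 3072). A minimal sketch of loading it directly, assuming diffusers >= 0.32.1 and that bfloat16 is an acceptable dtype:

```python
import torch
from diffusers import HunyuanVideoTransformer3DModel

# config.json and the safetensors shards live at the repo root,
# so the transformer can be loaded from the repo id directly.
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    "Skywork/SkyReels-V1-Hunyuan-T2V", torch_dtype=torch.bfloat16
)
print(transformer.config.num_layers, transformer.config.num_single_layers)
```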
diffusion_pytorch_model-00001-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:65195ae2dacc7f28491bbac8ce3d0d69d01bbe1ed9ee2de4b4879797f8da45d9
+ size 4987847512
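
These three-line entries are Git LFS pointer files: the actual shard is stored out of band and identified by its SHA-256 hash (`oid`) and byte size. A minimal sketch for verifying a downloaded shard against the advertised hash (the local filename is assumed to match the repo's):

```python
import hashlib

EXPECTED = "65195ae2dacc7f28491bbac8ce3d0d69d01bbe1ed9ee2de4b4879797f8da45d9"

h = hashlib.sha256()
with open("diffusion_pytorch_model-00001-of-00006.safetensors", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # stream 1 MiB at a time
        h.update(chunk)
assert h.hexdigest() == EXPECTED, "checksum mismatch"
```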
diffusion_pytorch_model-00002-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:95dd704a6df7ae59dd3d74500c41621aeae2eb38022f58f2090d0fed328f0c72
+ size 4984233368
diffusion_pytorch_model-00003-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e0cf0433a0b9ba49dce397db77e97f7ded04542b09f80e2f1fe8aba6dfbf87a2
+ size 4984183472
diffusion_pytorch_model-00004-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:83fe6be9b8c8d5834bbb7015171eef61c93c5945ac6629f0a033a7cc7f80d0a7
+ size 4984051232
diffusion_pytorch_model-00005-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ee74d80e05e81f2c8da73a3d377268998f6c403fb0966d8a6026390cdc203792
+ size 4927434136
diffusion_pytorch_model-00006-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:db5894ff1f52ae7c4f6a95b22b0f023e88b3dc20c52ab15d3e12f3effaf1e1dc
+ size 774425808
diffusion_pytorch_model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff