thanks to Iceclear ❤
- .gitattributes +0 -1
- README.md +65 -0
- ldmsr4x_finetune_119.ckpt +3 -0
- stablesr_000117.ckpt +3 -0
- stablesr_768v_000139.ckpt +3 -0
- vqgan_cfw_00011.ckpt +3 -0
- webui_512v_models.zip +3 -0
- webui_768v_139.ckpt +3 -0
.gitattributes
CHANGED
@@ -25,7 +25,6 @@
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
-*.tar filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
README.md
ADDED
@@ -0,0 +1,65 @@
+---
+license: other
+pipeline_tag: image-to-image
+---
+# StableSR Model Card
+This model card focuses on the models associated with StableSR, available [here](https://github.com/IceClear/StableSR).
+
+## Model Details
+- **Developed by:** Jianyi Wang
+- **Model type:** Diffusion-based image super-resolution model
+- **License:** [S-Lab License 1.0](https://github.com/IceClear/StableSR/blob/main/LICENSE.txt)
+- **Model Description:** This is the model used in [Paper](https://arxiv.org/abs/2305.07015).
+- **Resources for more information:** [GitHub Repository](https://github.com/IceClear/StableSR).
+- **Cite as:**
+
+@InProceedings{wang2023exploiting,
+author = {Wang, Jianyi and Yue, Zongsheng and Zhou, Shangchen and Chan, Kelvin CK and Loy, Chen Change},
+title = {Exploiting Diffusion Prior for Real-World Image Super-Resolution},
+booktitle = {arXiv preprint arXiv:2305.07015},
+year = {2023},
+}
+
+# Uses
+Please refer to [S-Lab License 1.0](https://github.com/IceClear/StableSR/blob/main/LICENSE.txt)
+
+## Limitations and Bias
+
+### Limitations
+
+- StableSR still requires multiple sampling steps to generate an image, which makes it much slower than GAN-based approaches, especially for large images beyond 512 or 768 pixels.
+- StableSR sometimes cannot fully preserve fidelity to the input due to its generative nature.
+- StableSR sometimes cannot generate perfect details under complex real-world scenarios.
+
+### Bias
+Although our model is based on a pre-trained Stable Diffusion model, we currently do not observe obvious bias in the generated results.
+We conjecture the main reason is that our model does not rely on text prompts but on low-resolution images.
+Such strong conditions make our model less likely to be affected.
+
+
+## Training
+
+**Training Data**
+The model developer used the following datasets for training the model:
+
+- Our diffusion model is finetuned on DF2K (DIV2K and Flickr2K) + OST datasets, available [here](https://github.com/xinntao/Real-ESRGAN/blob/master/docs/Training.md).
+- We further generate 100k synthetic LR-HR pairs on DF2K_OST using the finetuned diffusion model for training the CFW module.
+
+**Training Procedure**
+StableSR is an image super-resolution model finetuned on [Stable Diffusion](https://github.com/Stability-AI/stablediffusion), further equipped with a time-aware encoder and a controllable feature wrapping (CFW) module.
+
+- Following Stable Diffusion, images are encoded through the fixed autoencoder, which turns images into latent representations. The autoencoder uses a relative downsampling factor of f = 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4.
+- The latent representations are fed to the time-aware encoder as guidance.
+- The loss is the same as that of Stable Diffusion.
+- After finetuning the diffusion model, we further train the CFW module using the data generated by the finetuned diffusion model.
+- The autoencoder model is fixed and only CFW is trainable.
+- The loss is similar to that used for training an autoencoder, except that we use a fixed adversarial loss weight of 0.025 rather than a self-adjustable one.
+
+We currently provide the following checkpoints:
+
+- [stablesr_000117.ckpt](https://huggingface.co/Iceclear/StableSR/resolve/main/stablesr_000117.ckpt): Diffusion model finetuned on [SD2.1-512base](https://huggingface.co/stabilityai/stable-diffusion-2-1-base) with DF2K_OST dataset for 117 epochs.
+- [vqgan_cfw_00011.ckpt](https://huggingface.co/Iceclear/StableSR/resolve/main/vqgan_cfw_00011.ckpt): CFW module with fixed autoencoder trained on synthetic paired data for 11 epochs.
+- [stablesr_768v_000139.ckpt](https://huggingface.co/Iceclear/StableSR/blob/main/stablesr_768v_000139.ckpt): Diffusion model finetuned on [SD2.1-768v](https://huggingface.co/stabilityai/stable-diffusion-2-1) with DF2K_OST dataset for 139 epochs.
+
+## Evaluation Results
+See [Paper](https://arxiv.org/abs/2305.07015) for details.
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a4977665bc20d5976c6cfafbb914aca162578f4012252ef2df4839a718be12da
+size 2039892291
stablesr_000117.ckpt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b8862bf3fd11c5b8fe82fb8a4618a1c74a29e0301a190bd6c2e84d68986ef9cb
+size 6481647231
stablesr_768v_000139.ckpt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d5f83b544035b4bf24ab1d7aa86e0f83328e9ce121efb2c850f833178be3d10b
+size 6481647231
vqgan_cfw_00011.ckpt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5b20ed9a80e9bdbf9e76ff1642a6a5428ca3427b08b779248a88bfbba2e74e8e
+size 959719471
webui_512v_models.zip
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f9484ce3614c7964e8dd0ab9b053e7e728f7ea458d264c3aebb2175ffbf9c4f0
+size 1273589575
webui_768v_139.ckpt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4c6b969948fe692998b33433c0f554506aaf8a39cbd2b36a0db5a72c5ecaa4df
+size 422185645
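Each checkpoint added in this commit is stored as a Git LFS pointer with three fields: `version`, `oid` (the SHA-256 of the real file) and `size` in bytes. A downloaded file could be checked against its pointer roughly as follows. This is a minimal sketch using only the Python standard library; the function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local path in the usage note is a placeholder.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the `key value` lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<algorithm>:<hex digest>", e.g. "sha256:5b20...".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and digest in `pointer`."""
    file_path = Path(path)
    if file_path.stat().st_size != pointer["size"]:
        return False  # cheap check first; avoids hashing a truncated download
    hasher = hashlib.new(pointer["algo"])
    with file_path.open("rb") as handle:
        # Read in chunks so multi-gigabyte checkpoints are not loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents copied from the vqgan_cfw_00011.ckpt entry in this commit.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:5b20ed9a80e9bdbf9e76ff1642a6a5428ca3427b08b779248a88bfbba2e74e8e
size 959719471
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])  # sha256 959719471
```

With a local copy of the checkpoint, `matches_pointer("vqgan_cfw_00011.ckpt", pointer)` would report whether the download is complete and uncorrupted.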