Quantization settings

  • vae.: torch.bfloat16. No quantization.
  • text_encoder.layers.:
    • Int8 with Optimum Quanto
    • Target layers: ["q_proj", "k_proj", "v_proj", "o_proj", "mlp.down_proj", "mlp.gate_up_proj"]
  • diffusion_model.:
    • Int8 with Optimum Quanto
    • Target layers: ["to_q", "to_k", "to_v", "to_out.0", "ff.net.0.proj", "ff.net.2"]
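
A minimal sketch of how these settings could be applied with Optimum Quanto's quantize/freeze API. The pipeline class and attribute names (CogView4Pipeline, text_encoder, transformer) follow diffusers' CogView4 integration; the include patterns are assumptions derived from the target-layer lists above.

```python
import torch
from diffusers import CogView4Pipeline
from optimum.quanto import freeze, qint8, quantize

# Load the base pipeline in bfloat16; the VAE stays unquantized.
pipe = CogView4Pipeline.from_pretrained(
    "THUDM/CogView4-6B", torch_dtype=torch.bfloat16
)

# Int8-quantize only the listed projections of the GLM text encoder.
quantize(
    pipe.text_encoder,
    weights=qint8,
    include=[
        "*q_proj", "*k_proj", "*v_proj", "*o_proj",
        "*mlp.down_proj", "*mlp.gate_up_proj",
    ],
)
freeze(pipe.text_encoder)

# Int8-quantize the attention and feed-forward projections of the denoiser.
quantize(
    pipe.transformer,
    weights=qint8,
    include=[
        "*to_q", "*to_k", "*to_v", "*to_out.0",
        "*ff.net.0.proj", "*ff.net.2",
    ],
)
freeze(pipe.transformer)
```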

VRAM consumption

  • Text encoder (text_encoder.): about 11 GB
  • Denoiser (diffusion_model.): about 10 GB
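
Figures like these can be checked with PyTorch's CUDA allocator statistics; a minimal sketch (which component you run between the reset and the read is up to you):

```python
import torch

def peak_vram_gb(device: int = 0) -> float:
    """Peak VRAM allocated on the given CUDA device, in GB."""
    return torch.cuda.max_memory_allocated(device) / 1024**3

torch.cuda.reset_peak_memory_stats()
# ... run the component to measure here, e.g. the text encoder ...
print(f"peak VRAM: {peak_vram_gb():.1f} GB")
```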

Samples

Side-by-side sample images for each precision setting:

  • torch.bfloat16: VRAM 40 GB (without offloading)
  • Quanto Int8: VRAM 28 GB (without offloading)
Generation parameters
  • prompt: """A photo of a nendoroid figure of hatsune miku holding a sign that says "CogView4" """
  • negative_prompt: "blurry, low quality, horror"
  • height: 1152
  • width: 1152
  • cfg_scale: 3.5
  • num_inference_steps: 20
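
A sketch of generating with these parameters via diffusers; cfg_scale corresponds to the pipeline's guidance_scale argument. The base checkpoint is loaded here for a self-contained example; substitute the quantized pipeline from the sketch above to reproduce the Int8 column.

```python
import torch
from diffusers import CogView4Pipeline

pipe = CogView4Pipeline.from_pretrained(
    "THUDM/CogView4-6B", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    prompt='A photo of a nendoroid figure of hatsune miku '
           'holding a sign that says "CogView4"',
    negative_prompt="blurry, low quality, horror",
    height=1152,
    width=1152,
    guidance_scale=3.5,  # listed as cfg_scale above
    num_inference_steps=20,
).images[0]
image.save("cogview4_sample.png")
```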

Model tree for p1atdev/CogView4-6B-quanto_int8

  • Base model: THUDM/glm-4-9b
  • Finetuned from: THUDM/CogView4-6B
  • Quantized: this model