How much VRAM?
How much VRAM is meant to be used? I am seeing close to 10GB!
Any optimization recommendations?
I had a look at https://huggingface.co/blog/sd3#using-sd3-with-diffusers but I can't make it work. My GPU is a T4 (16GB).
This is what my pipeline looks like:
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
text_encoder = T5EncoderModel.from_pretrained(
    model_name,
    subfolder="text_encoder_3",
    quantization_config=quantization_config,
)
self.pipeline = StableDiffusion3Pipeline.from_pretrained(
    model_name,
    cache_dir="./cache",
    text_encoder_3=text_encoder,
    device_map="balanced",
    torch_dtype=torch.float16,
)
self.pipeline.enable_model_cpu_offload()
Any suggestions?
The error I get is, e.g.: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 252.00 MiB. GPU
Works fine on my old GTX 1080 with 8 GB VRAM.
pipeline = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", text_encoder_3=None, tokenizer_3=None, torch_dtype=torch.float16).to('cuda')
@KernelDebugger - that works! Genius! I wonder what the text_encoder and tokenizer do if they can be set to 'None'?
@KernelDebugger "Dropping the T5 Text Encoder" alone didn't work for me. When I combine it with "Model Offloading", it works on an 8GB GTX 1070 Ti.
pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", text_encoder_3=None, tokenizer_3=None, torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()
text_encoder_3 and tokenizer_3 are optional. The model also ships with more lightweight text encoders, and the pipeline falls back to using them when text_encoder_3=None.
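To see why dropping the T5 encoder frees so much VRAM, here is a rough back-of-the-envelope sketch: weight memory is roughly parameter count times bytes per parameter. The parameter counts below are approximations I am assuming (about 4.7B for the T5-XXL encoder, about 2B for the SD3 Medium transformer), and the helper name weight_gib is made up for illustration. Activations, the VAE, the lighter encoders, and CUDA overhead are not counted, so real usage will be higher.

```python
# Rough estimate of the memory needed just to hold model weights.
# ASSUMPTION: parameter counts are approximate, commonly cited figures.
# Not included: activations, the VAE, the lighter text encoders, CUDA overhead.

GIB = 1024**3  # bytes in one GiB


def weight_gib(num_params, bytes_per_param):
    """Return the size of num_params weights in GiB (hypothetical helper)."""
    return num_params * bytes_per_param / GIB


T5_XXL_ENCODER_PARAMS = 4.7e9          # assumed, approximate
SD3_MEDIUM_TRANSFORMER_PARAMS = 2.0e9  # assumed, approximate

t5_fp16 = weight_gib(T5_XXL_ENCODER_PARAMS, 2)   # float16 = 2 bytes per weight
t5_int8 = weight_gib(T5_XXL_ENCODER_PARAMS, 1)   # 8-bit = 1 byte per weight
transformer_fp16 = weight_gib(SD3_MEDIUM_TRANSFORMER_PARAMS, 2)

print(f"T5-XXL encoder, fp16:  {t5_fp16:.1f} GiB")           # 8.8 GiB
print(f"T5-XXL encoder, 8-bit: {t5_int8:.1f} GiB")           # 4.4 GiB
print(f"SD3 transformer, fp16: {transformer_fp16:.1f} GiB")  # 3.7 GiB
```

Under these assumptions the T5 encoder alone accounts for most of the ~10GB reported in the first post, which would explain why setting text_encoder_3=None lets the pipeline fit on 8GB cards.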