Question about the number of parameters in the vision encoder

#6
by weidu - opened

The vision encoder of siglip2-base-patch16-224 is said to have 86M parameters, but when I count them with the code below, I get 92M. Why is that?

from transformers import pipeline

def count_parameters(model):
    # Total number of elements across all parameter tensors of the module
    return sum(p.numel() for p in model.parameters())

image_classifier = pipeline(task="zero-shot-image-classification", model="google/siglip2-base-patch16-224")
vision_model = image_classifier.model.vision_model
n_params = count_parameters(vision_model)
print(f"Parameters in vision encoder: {n_params:,}")

TL;DR: the difference is due to the MAP (attention pooling) head, which is part of vision_model and therefore included in your count. More details here: https://github.com/google-research/big_vision/issues/159
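As a rough sanity check on the size of that gap, here is a back-of-the-envelope sketch of how many parameters such a pooling head would hold. It assumes the head consists of one learned probe vector, one multi-head attention layer with biases, one LayerNorm, and a two-layer MLP, and that the base model uses hidden size 768 and MLP width 3072 (the usual ViT-B values). The function name and this layout are illustrative assumptions, not something confirmed in this thread.

```python
def map_head_param_count(hidden: int, mlp_dim: int) -> int:
    """Estimate parameters of an attention pooling head (assumed layout)."""
    probe = hidden                          # learned query vector, shape (1, 1, hidden)
    qkv = 3 * hidden * hidden + 3 * hidden  # fused q/k/v projection, weights + biases
    out_proj = hidden * hidden + hidden     # attention output projection
    layernorm = 2 * hidden                  # scale and shift
    mlp = (hidden * mlp_dim + mlp_dim) + (mlp_dim * hidden + hidden)  # two linear layers
    return probe + qkv + out_proj + layernorm + mlp


# Assumed ViT-B dimensions: hidden size 768, MLP width 3072
head_params = map_head_param_count(768, 3072)
print(f"Estimated pooling head parameters: {head_params:,}")  # 7,087,104
```

Under these assumptions the head accounts for about 7M parameters, which is roughly the size of the gap between the two figures. To check this against the actual checkpoint, one option is to sum p.numel() over vision_model.named_parameters() while skipping the names that belong to the pooling head; the exact prefix depends on the transformers version, so inspect the parameter names first.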
