Question about the number of parameters in the vision encoder

by weidu - opened 13 days ago

13 days ago

The vision encoder of siglip2-base-patch16-224 is said to have 86M parameters, but when I count them with the code below I get 92M. Why is that?

from transformers import pipeline

def count_parameters(model):
    # Sum the number of elements in every parameter tensor of the module
    return sum(p.numel() for p in model.parameters())

image_classifier = pipeline(task="zero-shot-image-classification", model="google/siglip2-base-patch16-224")
# Vision tower of the underlying SigLIP model
vision_model = image_classifier.model.vision_model
n_params = count_parameters(vision_model)
print(f"Parameters in vision encoder: {n_params:,}")

mitsch

7 days ago

TL;DR: the difference comes from the MAP (multihead attention pooling) head, which the 86M figure does not include. More details here: https://github.com/google-research/big_vision/issues/159
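To see roughly where the extra parameters come from, here is a back-of-the-envelope count. It is a sketch under assumed standard ViT-B/16 hyperparameters (width 768, 12 layers, MLP width 3072, 224x224 input, learned position embeddings, no CLS token), not numbers read from the checkpoint. The MAP head is modeled as one learned probe token followed by attention, a LayerNorm and an MLP, mirroring big_vision's MAP head; the helper function names are hypothetical.

```python
# Rough parameter count for a ViT-B/16 vision tower at 224x224, split into
# the transformer encoder and the MAP (multihead attention pooling) head.
# All hyperparameters below are ASSUMED ViT-B values, not read from the model.

HIDDEN = 768      # embedding width (assumed)
LAYERS = 12       # transformer blocks (assumed)
MLP_DIM = 3072    # MLP hidden width (assumed)
PATCH = 16
IMAGE = 224
CHANNELS = 3


def linear(n_in: int, n_out: int) -> int:
    """Weights plus bias of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(dim: int) -> int:
    """Scale and shift vectors."""
    return 2 * dim


def attention(dim: int) -> int:
    """Q, K, V and output projections, each with a bias."""
    return 4 * linear(dim, dim)


def mlp(dim: int, hidden: int) -> int:
    """Two dense layers: dim -> hidden -> dim."""
    return linear(dim, hidden) + linear(hidden, dim)


def encoder_params() -> int:
    num_patches = (IMAGE // PATCH) ** 2  # 196 patches
    # A 16x16 stride-16 patch conv has the same count as this linear map
    patch_embed = linear(CHANNELS * PATCH * PATCH, HIDDEN)
    pos_embed = num_patches * HIDDEN     # learned, no CLS token (assumed)
    block = 2 * layer_norm(HIDDEN) + attention(HIDDEN) + mlp(HIDDEN, MLP_DIM)
    final_norm = layer_norm(HIDDEN)
    return patch_embed + pos_embed + LAYERS * block + final_norm


def map_head_params() -> int:
    probe = HIDDEN                       # one learned query token
    return probe + attention(HIDDEN) + layer_norm(HIDDEN) + mlp(HIDDEN, MLP_DIM)


if __name__ == "__main__":
    enc = encoder_params()
    head = map_head_params()
    print(f"encoder:  {enc:,}")
    print(f"MAP head: {head:,}")
    print(f"total:    {enc + head:,}")
```

Under these assumptions the sketch prints 85,797,120 for the encoder and 7,087,104 for the MAP head (92,884,224 in total), which lines up with the 86M and 92M figures. With the model from the question loaded, `count_parameters(vision_model.head)` should report the head's share directly, assuming the checkpoint loads as the `transformers` SigLIP vision transformer that exposes its pooling head as the `head` attribute.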
