
Very flat output? "Probabilities" all close to zero.

#3
by Moghrua - opened

Using the sample code, the results look a bit strange - the "probabilities" come out almost perfectly zero. The scoring function looks like a good match for the original - could there be an issue with the tokenizer somehow?

PyTorch Image Models org

@Moghrua this behaves very differently from softmax, where the outputs are forced to sum to 1. With sigmoid, each text is scored independently, so you can end up with a lot of low scores if none of the texts is a great match. I've definitely been able to get scores from .5 all the way up to .97. Sometimes .1-.2 is a pretty good match.

If you cut and paste the provided beignet example it will output:
Label probabilities: [('a Dog.', 0.0), ('a cat', 0.0), ('a donut', 0.0), ('A Beignet.', 0.517)]

PyTorch Image Models org

If you suspect any tokenizer issues, you can double-check by comparing with https://colab.research.google.com/github/google-research/big_vision/blob/main/big_vision/configs/proj/image_text/SigLIP_demo.ipynb ... I have done some testing and the results seemed to compare well, but there could be texts that don't tokenize the same way...

I also have a similar concern. I used this image
08710255134840A001.jpeg
with the tags ["a dog", "a cat", "a bird", "a fish"], and got:
Label probabilities: [('a dog', 1e-06), ('a cat', 4.4e-05), ('a bird', 0.0), ('a fish', 5e-06)]
Is this model really expected to give such low probabilities?

PyTorch Image Models org

@talrejanikhil I've observed it can be exceedingly fussy / specific as to what's going to yield a high prob, e.g. if you twiddle your prompts a bit:

[('a dog', 0.0), ('a cat on a catfood box', 0.024), ('a catfood box', 0.351), ('a beignet', 0.0)]

So yeah, I think this is the usual behaviour. It also seems a bit sensitive to preprocessing / weight translation, especially when the model is unsure: the swings in prob relative to the output of the reference JAX version can be a bit higher than I'd expect. So you could try similar prompts in their notebook...

For example, if you do use softmax, it will obviously push the probs up so that they sum to 1.0:
Label probabilities: [('a dog', 0.02), ('a cat', 0.98), ('a beignet', 0.0)]
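To make the contrast concrete, here is a minimal sketch in plain Python. The logits are made-up values chosen for illustration (not real model output), and the `sigmoid` and `softmax` helpers are written out by hand rather than taken from timm or any other library:

```python
import math

def sigmoid(x):
    # Scores one logit on its own; other labels have no influence.
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    # Normalizes across all labels, so the outputs always sum to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical image-text logits (scale and bias already applied).
labels = ["a dog", "a cat", "a beignet"]
logits = [-7.8, -3.9, -12.0]

sigmoid_probs = [sigmoid(x) for x in logits]
softmax_probs = softmax(logits)

for name, s, p in zip(labels, sigmoid_probs, softmax_probs):
    print(f"{name}: sigmoid={s:.4f} softmax={p:.4f}")
```

With logits like these, every sigmoid score stays small because no text is a strong match on its own, while softmax hands almost all of the mass to whichever label is least bad.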

Yes, that's true. I actually do miss the high probs that the CLIP model outputs.

PyTorch Image Models org

@Moghrua @talrejanikhil I have observed the same behavior, so I had to normalize the outputs here: https://huggingface.co/spaces/merve/multilingual-zero-shot-image-clf I guess that since the zero-shot accuracy is still better than other models' (as claimed by the paper), you just need to stretch the outputs to actually see that?

@merve do you have code to show how you normalized the outputs?

PyTorch Image Models org
edited Jan 25, 2024

@merve @talrejanikhil FYI, up to some numerical differences, sigmoid + normalizing like this is essentially softmax.

It looks/feels nicer in that everything adds up to 1 (so it "must be" a probability), but it's pretty obvious there's little to no calibration there. In either case, the raw sigmoid output is probably more closely calibrated with respect to what was seen in the training distribution...
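A quick way to see why: for very negative logits, sigmoid(x) is approximately exp(x), so dividing the sigmoid outputs by their sum gives nearly the same numbers as softmax. A minimal sketch of that check, assuming "normalizing" means dividing by the sum (the exact normalization used in the Space is not shown in this thread) and using made-up logits:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits in the very negative range where sigmoid(x) ~ exp(x).
logits = [-7.8, -3.9, -12.0]

raw = [sigmoid(x) for x in logits]
normalized_sigmoid = [r / sum(raw) for r in raw]  # assumed normalization: divide by the sum
reference_softmax = softmax(logits)

# Largest per-label gap between the two approaches.
max_gap = max(abs(a - b) for a, b in zip(normalized_sigmoid, reference_softmax))
print(normalized_sigmoid, reference_softmax, max_gap)
```

The gap grows as logits approach zero or turn positive, since sigmoid saturates at 1 there while exp keeps growing.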

Hi guys, I haven't read all of this, but the model being generally "more conservative" is totally expected. As Ross says, the model is not calibrated, because it's a "raw" model. What calibration makes most sense depends on your data/task. I guess we should explain this more somewhere at some point.

The good news is that calibrating it is very easy. If you have a dataset representative of your task, you can simply adjust the bias value (a single scalar!) by hand or grid-search, so that the probabilities look like you prefer them. I've done this many times on many tasks, and it works flawlessly. Actually, our official SigLIP colab even contains an interactive demo that shows this:
image.png
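As a rough sketch of what such a grid search could look like: the example below assumes you have, for a handful of image-text pairs from your task, the scaled similarity before the bias is added plus a 0/1 label saying whether the pair matches. The data, the candidate range, and the choice of log loss as the criterion are all illustrative assumptions, not something taken from the colab.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_loss(scaled_sims, labels, bias):
    # Mean binary cross-entropy of sigmoid(sim + bias) against the 0/1 labels.
    eps = 1e-12
    total = 0.0
    for s, y in zip(scaled_sims, labels):
        p = sigmoid(s + bias)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(labels)

# Hypothetical task data: scaled similarities (no bias) and match labels.
scaled_sims = [14.0, 12.5, 11.0, 6.0, 4.0, 2.5]
labels = [1, 1, 1, 0, 0, 0]

# Candidate bias values from -16.0 to -4.0 in steps of 0.5.
candidate_biases = [-16.0 + 0.5 * i for i in range(25)]
best_bias = min(candidate_biases, key=lambda b: log_loss(scaled_sims, labels, b))

print("best bias:", best_bias)
print("recalibrated probs:", [round(sigmoid(s + best_bias), 3) for s in scaled_sims])
```

Because only a single scalar is tuned, the ranking of the texts for a given image does not change; only where the scores sit between 0 and 1 does.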

My question would still be: how could we do this in Hugging Face? Is there a way to set the bias parameter?

Actually I figured this out myself. You can do something like this:

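import torch
from torch import nn
from transformers import AutoModel, AutoProcessor

model_name = 'google/siglip-so400m-patch14-384'
model = AutoModel.from_pretrained(model_name)
# Override the learned logit bias (a single scalar) with your own value:
model.logit_bias = nn.Parameter(torch.tensor([-10.0]))
processor = AutoProcessor.from_pretrained(model_name)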

This significantly increased the probability values for the example I posted above

PyTorch Image Models org

FWIW the same applies to the OpenCLIP variant of the model: once the model is created, model.logit_bias = nn.Parameter(torch.tensor([-10.0])) will be equivalent.
