GIT-base fine-tuned for Image Captioning on High-Level descriptions of Actions
GIT base fine-tuned on the HL dataset to generate action descriptions of images
Model fine-tuning
- Trained for 10 epochs
- lr: 5e-5
- Adam optimizer
- half-precision (fp16)
Test set metrics
| CIDEr | SacreBLEU | ROUGE-L |
|--------|------------|--------|
| 110.63 | 15.21 | 30.45 |
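For readers unfamiliar with ROUGE-L, the following is a minimal sketch of a sentence-level score based on the longest common subsequence (LCS) between a generated caption and a reference. It is an illustration only, not the evaluation script behind the table above: the function names, the whitespace tokenization, the lowercasing, and the default `beta=1.0` are all assumptions, and the result is a fraction in [0, 1] rather than a percentage.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """LCS-based F-score between two captions (hypothetical helper).

    Assumes whitespace tokenization and lowercasing; real evaluation
    toolkits may tokenize differently and use another beta.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


score = rouge_l("she is holding an umbrella", "she holds an umbrella")
print(round(score, 4))
```

Here the LCS is "she an umbrella" (3 tokens), giving precision 3/5 and recall 3/4. A corpus-level figure like the one in the table would typically aggregate such scores over all test captions and their references.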
Model in Action
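import requests
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

# Use a GPU when one is available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# When loading from the Hub, the repo ID may need its owner namespace prefixed
processor = AutoProcessor.from_pretrained("git-base-captioning-ft-hl-actions")
model = AutoModelForCausalLM.from_pretrained("git-base-captioning-ft-hl-actions").to(device)

img_url = 'https://datasets-server.huggingface.co/assets/michelecafagna26/hl/--/default/train/0/image/image.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# Pass the image by keyword so it is not mistaken for the text argument
inputs = processor(images=raw_image, return_tensors="pt").to(device)
pixel_values = inputs.pixel_values

# Sample a single caption with top-k and nucleus (top-p) sampling
generated_ids = model.generate(pixel_values=pixel_values, max_length=50,
                               do_sample=True,
                               top_k=120,
                               top_p=0.9,
                               early_stopping=True,
                               num_return_sequences=1)

# batch_decode returns a list with one string per generated sequence
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(caption)
# Example output (sampling is stochastic, so captions vary between runs):
# she is holding an umbrella.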
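The `top_k=120` and `top_p=0.9` arguments above restrict which tokens can be sampled at each decoding step. The sketch below illustrates the idea on a toy distribution in plain Python. It is a simplified illustration, not the `transformers` implementation: the function name `top_k_top_p_filter` and the toy vocabulary are hypothetical, and it works on probabilities rather than logits.

```python
def top_k_top_p_filter(probs, top_k=120, top_p=0.9):
    """Keep the top_k most likely tokens, then the smallest prefix of them
    whose cumulative probability reaches top_p, and renormalize.

    `probs` maps token -> probability. Hypothetical helper for illustration.
    """
    # Step 1: top-k. Keep only the k most probable tokens and renormalize.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(p for _, p in ranked)
    ranked = [(token, p / total) for token, p in ranked]

    # Step 2: nucleus (top-p). Walk down the ranking until the cumulative
    # probability mass reaches top_p.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize the surviving tokens so they form a distribution again.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Toy next-token distribution (made up for this example)
toy_probs = {"umbrella": 0.5, "bag": 0.3, "phone": 0.15, "dog": 0.05}
filtered = top_k_top_p_filter(toy_probs, top_k=4, top_p=0.9)
print(sorted(filtered))
```

With these toy numbers, the three most likely tokens already cover 95% of the probability mass, so "dog" is dropped and the next token would be sampled from the remaining three. Lower `top_p` or `top_k` values make captions more conservative, and higher values make them more varied.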
BibTeX and citation info
@inproceedings{cafagna2023hl,
  title={{HL} {D}ataset: {V}isually-grounded {D}escription of {S}cenes, {A}ctions and {R}ationales},
  author={Cafagna, Michele and van Deemter, Kees and Gatt, Albert},
  booktitle={Proceedings of the 16th International Natural Language Generation Conference (INLG'23)},
  address={Prague, Czech Republic},
  year={2023}
}