microsoft/Phi-4-multimodal-instruct · Getting Bounding Boxes for Vision

generate_output = model.generate(
    **inputs,
    max_new_tokens=1000,
    generation_config=generation_config,
    output_scores=True,
    return_dict_in_generate=True,
)

After generating the output, I tried to fetch the bounding boxes like this:
bounding_boxes = getattr(generate_output, "box_coordinates", None)
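
To illustrate why that lookup comes back empty, here is a minimal sketch. `StandInGenerateOutput` is a hypothetical stand-in I made up so the snippet runs without loading the model. The real object returned by `generate` with `return_dict_in_generate=True` exposes fields such as `sequences` and `scores`, and as far as I can tell nothing box-related.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple


@dataclass
class StandInGenerateOutput:
    """Hypothetical stand-in for the object returned by model.generate()."""

    sequences: list                 # generated token ids
    scores: Optional[Tuple] = None  # per-step scores when output_scores=True


def available_fields(output) -> List[str]:
    """List the field names that the output object actually carries."""
    return [f.name for f in fields(output)]


generate_output = StandInGenerateOutput(sequences=[[1, 2, 3]], scores=())

# getattr with a default never raises, so a missing attribute silently yields None.
bounding_boxes = getattr(generate_output, "box_coordinates", None)
field_names = available_fields(generate_output)

print(bounding_boxes)
print(field_names)
```

So the `None` here only tells me the attribute does not exist. It says nothing about whether the model could localize text.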

I am fairly sure that Phi-4-multimodal-instruct does not provide bounding boxes the way Florence-2 does.
However, it would be great if it did, since the model already performs optical character recognition.

Any idea how to get bounding boxes from the model's vision capability would be a great help.
Or am I missing something?
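
The only workaround I can think of is asking the model to write coordinates into its text answer and parsing them out afterwards. Below is a minimal sketch of the parsing half. The line format `some text [x1, y1, x2, y2]` and the names `TextBox` and `parse_boxes` are my own assumptions for illustration. I have no evidence that Phi-4-multimodal-instruct emits coordinates in this format, or that any coordinates it does emit are accurate.

```python
import re
from typing import List, NamedTuple, Tuple


class TextBox(NamedTuple):
    text: str
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in whatever units the prompt asked for


# Assumed (hypothetical) line format: recognized text followed by "[x1, y1, x2, y2]".
_LINE = re.compile(
    r"^(?P<text>.*?)\s*\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]\s*$"
)


def parse_boxes(decoded: str) -> List[TextBox]:
    """Extract (text, box) pairs from decoded model output, skipping malformed lines."""
    results = []
    for line in decoded.splitlines():
        match = _LINE.match(line.strip())
        if match is None:
            continue  # line carries no box in the assumed format
        x1, y1, x2, y2 = (int(match.group(i)) for i in range(2, 6))
        if x2 <= x1 or y2 <= y1:
            continue  # degenerate or inverted box, likely a hallucinated value
        results.append(TextBox(match.group("text"), (x1, y1, x2, y2)))
    return results


sample = "Invoice [10, 20, 110, 40]\nno box here\nTotal: 42 [5, 50, 90, 70]"
for item in parse_boxes(sample):
    print(item.text, item.box)
```

Even if the parsing works, the boxes would still need to be checked against the image, since a language model can produce plausible-looking numbers that do not match the actual text positions.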

Any suggestions or ideas would be highly appreciated.
Regards.