Demo example in the paper
Great work!
May I ask how to get the results in Figure 4 and Figure 5 of the paper? I.e., retrieving the specific frames corresponding to the prompts.
Many thanks!
To get more detailed video understanding than plain conversation, you need to load the third-party modules from TPO; see https://huggingface.co/OpenGVLab/VideoChat-TPO/tree/main
It seems the third-party modules in TPO are cgdetr and sam2. How should I proceed after loading these two modules?
Once the corresponding task decoder is loaded, the model will decide whether the decoder needs to be called and use it to produce the corresponding response.
Could you give example code for doing this? I just want to get the specific frame numbers or the specific time span corresponding to a prompt, for example: "In this video, in which frames does a man appear?" or "In this video, from which second to which second does a man appear?" Currently, the demo cannot output the right frames/seconds.
You can try this:
Based on the video content, determine the start and end times of **various activity events** in the video, accompanied by descriptions.
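If the model's reply comes back as free text (e.g. "the man appears from 2 to 5 seconds"), the interval still has to be pulled out of the string. This is a minimal sketch of such a parser, assuming that kind of phrasing; it is not part of the TPO codebase, and the example replies are hypothetical:

```python
import re

def parse_time_span(answer: str):
    """Extract a (start, end) pair in seconds from a free-text answer.

    Returns None when no "X to Y second(s)" pattern is found.
    """
    m = re.search(r"(\d+(?:\.\d+)?)\s*(?:to|-)\s*(\d+(?:\.\d+)?)\s*seconds?", answer)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2))

# Hypothetical model replies, for illustration only
print(parse_time_span("The man appears from 2 to 5 seconds."))  # (2.0, 5.0)
print(parse_time_span("25 to 30 seconds"))                      # (25.0, 30.0)
```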
I have tried this, but it cannot output the right time. For a 6-second video, it outputs "25 to 30 seconds".
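One way to at least detect this kind of failure is to check the returned span against the clip's actual duration (which your pipeline already knows, e.g. from frame count / fps). A small sketch with hypothetical helper names, not tied to the TPO code:

```python
def span_is_plausible(start: float, end: float, duration: float) -> bool:
    """Reject spans that are inverted or fall outside the video."""
    return 0.0 <= start < end <= duration

def frame_range(start: float, end: float, fps: float):
    """Convert a time span in seconds to approximate frame indices."""
    return int(start * fps), int(end * fps)

# The reported failure case: a 6-second video answered with "25 to 30 seconds"
print(span_is_plausible(25.0, 30.0, 6.0))  # False -> the prediction cannot be right
print(span_is_plausible(2.0, 5.0, 6.0))    # True
print(frame_range(2.0, 5.0, 30.0))         # (60, 150) at 30 fps
```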