Trained Weights (Model Card) of GPT4Scene

🏠 Overview

This model card is for the GPT4Scene project. More information is given below.

🤗 Hugging Face

Function                 Hugging Face Link
Validation Dataset       alexzyqi/GPT4Scene-Val-Dataset
Validation Annotations   alexzyqi/GPT4Scene-Val-Annotation
Pretrained Model         Qwen/Qwen2-VL-7B-Instruct
Trained Weights          alexzyqi/GPT4Scene-qwen2vl_full_sft_mark_32_3D_img512
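For readers who want to fetch these resources programmatically, here is a minimal sketch using the `huggingface_hub` package. The repo IDs are copied from the table. Treating the two validation repos as Hugging Face dataset repositories and the other two as model repositories is an assumption, and the helper names `RESOURCES`, `hub_url`, and `fetch` are hypothetical, not part of GPT4Scene.

```python
"""Hypothetical helper for fetching the GPT4Scene resources listed above.

Assumption: the two validation repos are *dataset* repos on the Hub and the
other two are *model* repos; adjust the repo types if that does not hold.
"""
from typing import Dict, Optional, Tuple

# Key -> (repo_id, repo_type). Repo IDs come from the model card table;
# the repo types are an assumption.
RESOURCES: Dict[str, Tuple[str, str]] = {
    "val_dataset": ("alexzyqi/GPT4Scene-Val-Dataset", "dataset"),
    "val_annotations": ("alexzyqi/GPT4Scene-Val-Annotation", "dataset"),
    "pretrained_model": ("Qwen/Qwen2-VL-7B-Instruct", "model"),
    "trained_weights": ("alexzyqi/GPT4Scene-qwen2vl_full_sft_mark_32_3D_img512", "model"),
}


def hub_url(repo_id: str, repo_type: str) -> str:
    """Build the Hub web URL; dataset repos live under the /datasets/ prefix."""
    prefix = "datasets/" if repo_type == "dataset" else ""
    return f"https://huggingface.co/{prefix}{repo_id}"


def fetch(key: str, local_dir: Optional[str] = None) -> str:
    """Download one resource with huggingface_hub and return its local path."""
    # Imported lazily so the rest of this sketch works without the package.
    from huggingface_hub import snapshot_download

    repo_id, repo_type = RESOURCES[key]
    return snapshot_download(repo_id=repo_id, repo_type=repo_type, local_dir=local_dir)


if __name__ == "__main__":
    # Print the web page for each resource without downloading anything.
    for key, (repo_id, repo_type) in RESOURCES.items():
        print(f"{key}: {hub_url(repo_id, repo_type)}")
```

Calling `fetch("trained_weights")` would download the full checkpoint into the local Hugging Face cache unless `local_dir` is given, so expect a multi-gigabyte transfer.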

βš–οΈ License

This repository is licensed under the Apache License 2.0.

This repository benefits from LLaMA-Factory and Chat-Scene. We thank their authors for their excellent work.

🔗 Citation

If you find this work helpful, please cite it as:

@article{GPT4Scene,
  title={GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models},
  author={Zhangyang Qi and Zhixiong Zhang and Ye Fang and Jiaqi Wang and Hengshuang Zhao},
  journal={arXiv:2501.01428},
  year={2025}
}
Checkpoint format: Safetensors
Model size: 8.29B params
Tensor type: BF16
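As a rough guide to hardware requirements, the parameter count and tensor type above translate into raw weight storage with simple arithmetic: BF16 uses 2 bytes per parameter. The sketch below assumes exactly 8.29B parameters and counts weights only; activations, the KV cache, and framework overhead come on top. The `weight_bytes` helper is hypothetical.

```python
"""Back-of-the-envelope size of the GPT4Scene checkpoint weights.

Assumes 8.29B parameters (from the model card) stored in BF16 (2 bytes each).
Counts weights only: activations, KV cache, and framework overhead are extra.
"""

# Storage cost per parameter for common floating-point formats.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2}


def weight_bytes(num_params: float, dtype: str = "bf16") -> float:
    """Return the raw storage needed for the weights, in bytes."""
    return num_params * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    total = weight_bytes(8.29e9, "bf16")
    # Report both decimal gigabytes and binary gibibytes.
    print(f"{total / 1e9:.2f} GB ({total / 2**30:.2f} GiB)")
```

Running it prints `16.58 GB (15.44 GiB)`; loading the same weights in FP32 would double that.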