Commit 9ad9c5a (verified), committed by kyujinpy · 1 Parent(s): 1c766a4

Update README.md

  1. README.md +54 -0
README.md CHANGED
@@ -83,7 +83,61 @@ We utilized **gpt-4o-2024-08-06** in `K-LLAVA-W` evaluation.
 ### MM Benchmarks
 - Global MM Bench dataset: [OpenCompass MM leaderboard](https://rank.opencompass.org.cn/leaderboard-multimodal)
 - Korean MM Bench dataset: [NCSOFT](https://huggingface.co/NCSOFT).
+
+## Inference
+```python
+import torch
+from PIL import Image
+from transformers import AutoModelForCausalLM
+
+# import os
+# os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # optional GPU pinning; the variable name is case-sensitive
+
+# load model
+if __name__ == '__main__':
+    # HumanF-MarkrAI/Gukbap-Qwen2-34B-VL
+    # AIDC-AI/Ovis2-34B
+    model = AutoModelForCausalLM.from_pretrained("HumanF-MarkrAI/Gukbap-Qwen2-34B-VL",
+                                                 torch_dtype=torch.bfloat16,
+                                                 multimodal_max_length=2048,
+                                                 cache_dir="/data/cache/",
+                                                 trust_remote_code=True).cuda()
+    text_tokenizer = model.get_text_tokenizer()
+    visual_tokenizer = model.get_visual_tokenizer()
 
+    # single-image input (K-LLAVA-W)
+    image_path = './images/ex_4.jpg'
+    images = [Image.open(image_path)]
+    max_partition = 9
+    text = '이미지에서 잘리지 않은 과일은 몇 개인가요?'  # "How many uncut fruits are in the image?"
+    query = f'<image>\n{text}'
+
+    # format conversation
+    prompt, input_ids, pixel_values = model.preprocess_inputs(query, images, max_partition=max_partition)
+    attention_mask = torch.ne(input_ids, text_tokenizer.pad_token_id)
+    input_ids = input_ids.unsqueeze(0).to(device=model.device)
+    attention_mask = attention_mask.unsqueeze(0).to(device=model.device)
+    if pixel_values is not None:
+        pixel_values = pixel_values.to(dtype=visual_tokenizer.dtype, device=visual_tokenizer.device)
+    pixel_values = [pixel_values]
+
+    # generate output (do_sample=False means greedy decoding, so the sampling knobs are set to None)
+    with torch.inference_mode():
+        gen_kwargs = dict(
+            max_new_tokens=2048,
+            do_sample=False,
+            top_p=None,
+            top_k=None,
+            temperature=None,
+            repetition_penalty=None,
+            eos_token_id=model.generation_config.eos_token_id,
+            pad_token_id=text_tokenizer.pad_token_id,
+            use_cache=True
+        )
+        output_ids = model.generate(input_ids, pixel_values=pixel_values, attention_mask=attention_mask, **gen_kwargs)[0]
+        output = text_tokenizer.decode(output_ids, skip_special_tokens=True)
+        print(f'Output:\n{output}')
+```
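Reviewer note: the snippet added in this commit builds the query inline as `f'<image>\n{text}'`. The sketch below factors that step into a small helper so the placeholder format lives in one place. `build_query` and `IMAGE_PLACEHOLDER` are hypothetical names, not part of the model's remote code. Treating a text-only query as the bare text is an assumption suggested by the `pixel_values is not None` check, not something the README states. The multi-image format is not shown in the README, so the helper refuses to guess it.

```python
IMAGE_PLACEHOLDER = "<image>"  # placeholder string used by the README snippet


def build_query(text: str, num_images: int = 1) -> str:
    """Build the query string passed to preprocess_inputs (hypothetical helper)."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    if num_images > 1:
        # The README only shows the single-image format; do not guess the rest.
        raise NotImplementedError("multi-image query format is not shown in the README")
    if num_images == 0:
        # Assumption: a text-only query is just the raw text.
        return text
    # Single image: placeholder on its own line, then the question.
    return f"{IMAGE_PLACEHOLDER}\n{text}"


if __name__ == "__main__":
    print(repr(build_query("How many uncut fruits are in the image?")))
```

With this helper, the README line `query = f'<image>\n{text}'` would become `query = build_query(text, num_images=len(images))`.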
 
 ## Chat Prompt😶‍🌫️
 ```yaml