RaushanTurganbay (HF staff) committed
Commit 1be2b8f · verified · 1 parent: d128dc6

Update pipeline example

Files changed (1)
  1. README.md +8 -20
README.md CHANGED
@@ -9,7 +9,6 @@ tags:
 datasets:
 - lmms-lab/LLaVA-OneVision-Data
 pipeline_tag: image-text-to-text
-inference: false
 arxiv: 2408.03326
 ---
 # LLaVA-Onevision Model Card
@@ -53,37 +52,26 @@ The model supports multi-image and multi-prompt generation. Meaning that you can
 ### Using `pipeline`:
 
 Below we used [`"llava-hf/llava-onevision-qwen2-72b-ov-hf"`](https://huggingface.co/llava-hf/llava-onevision-qwen2-72b-ov-hf) checkpoint.
-
 ```python
-from transformers import pipeline, AutoProcessor
-from PIL import Image
-import requests
-
-model_id = "llava-hf/llava-onevision-qwen2-72b-ov-hf"
-pipe = pipeline("image-to-text", model=model_id)
-processor = AutoProcessor.from_pretrained(model_id)
-url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"
-image = Image.open(requests.get(url, stream=True).raw)
+from transformers import pipeline
 
-# Define a chat history and use `apply_chat_template` to get correctly formatted prompt
-# Each value in "content" has to be a list of dicts with types ("text", "image")
-conversation = [
+pipe = pipeline("image-text-to-text", model="llava-hf/llava-onevision-qwen2-72b-ov-hf")
+messages = [
     {
-
         "role": "user",
         "content": [
+            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"},
             {"type": "text", "text": "What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud"},
-            {"type": "image"},
         ],
     },
 ]
-prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
 
-outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 200})
-print(outputs)
->>> {"generated_text": "user\n\nWhat does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud\nassistant\nLava"}
+out = pipe(text=messages, max_new_tokens=20)
+print(out)
+>>> [{'input_text': [{'role': 'user', 'content': [{'type': 'image', 'url': 'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg'}, {'type': 'text', 'text': 'What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud'}]}], 'generated_text': 'Lava'}]
 ```
 
+
 ### Using pure `transformers`:
 
 Below is an example script to run generation in `float16` precision on a GPU device:
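The `float16` script itself sits outside this hunk. As a rough sketch of what it looks like for this checkpoint (assuming the standard `LlavaOnevisionForConditionalGeneration` and `AutoProcessor` APIs from a recent `transformers` release, not the README's exact text):

```python
# Minimal sketch (assumed standard API, not the README's exact script):
# load the 72B OneVision checkpoint in float16 on GPU and ask the same AI2D question.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

model_id = "llava-hf/llava-onevision-qwen2-72b-ov-hf"
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# Build the chat-formatted prompt; the image itself is passed to the processor below.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud"},
        ],
    },
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)

output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(processor.decode(output[0], skip_special_tokens=True))
```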