Update readme
README.md
CHANGED
@@ -46,6 +46,7 @@ widget:
   - role: user
     content: Can you provide ways to eat combinations of bananas and dragonfruits?
 library_name: transformers
+paper: arxiv.org/abs/2503.01743
 ---
 
 ## Model Summary
@@ -407,8 +408,9 @@ model = AutoModelForCausalLM.from_pretrained(
     model_path,
     device_map="cuda",
     torch_dtype="auto",
-    trust_remote_code=True,
-    _attn_implementation='flash_attention_2',
+    trust_remote_code=True,
+    # if you do not have Ampere or later GPUs, change attention to "eager"
+    _attn_implementation='flash_attention_2',
 ).cuda()
 
 # Load generation config
@@ -466,6 +468,8 @@ response = processor.batch_decode(
 print(f'>>> Response\n{response}')
 ```
 
+**Notes**:
+
 ## Responsible AI Considerations
 
 Like other language models, the Phi family of models can potentially behave in ways that are unfair, unreliable, or offensive. Some of the limiting behaviors to be aware of include:
@@ -561,7 +565,7 @@ Note that by default, the Phi-4-multimodal-instruct model uses flash attention,
 * NVIDIA H100
 
 If you want to run the model on:
-* NVIDIA V100 or earlier generation GPUs: call AutoModelForCausalLM.from_pretrained() with
+* NVIDIA V100 or earlier generation GPUs: call AutoModelForCausalLM.from_pretrained() with _attn_implementation="eager"
 
 ## License
The model is licensed under the [MIT license](./LICENSE).
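The attention-backend switch described in the diff (FlashAttention 2 on Ampere-or-later GPUs, "eager" on V100 and earlier) can be sketched as a small helper. This is only an illustration: the function name `pick_attn_implementation` is a hypothetical addition, not part of the model card; the threshold assumes FlashAttention 2 requires NVIDIA compute capability 8.0 (Ampere) or newer.

```python
def pick_attn_implementation(compute_capability):
    """Choose an attention backend string for from_pretrained().

    compute_capability: a (major, minor) tuple, as returned by
    torch.cuda.get_device_capability(). FlashAttention 2 assumes an
    Ampere-or-newer GPU (compute capability >= 8.0); older cards such
    as the V100 (7.0) fall back to "eager" attention.
    """
    major, _minor = compute_capability
    return "flash_attention_2" if major >= 8 else "eager"


# V100 (7.0) falls back to eager; H100 (9.0) keeps flash attention.
print(pick_attn_implementation((7, 0)))  # -> eager
print(pick_attn_implementation((9, 0)))  # -> flash_attention_2
```

The returned string would be passed as the `_attn_implementation` argument of the `AutoModelForCausalLM.from_pretrained()` call shown in the diff.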