Update Readme
README.md CHANGED
@@ -409,7 +409,7 @@ model = AutoModelForCausalLM.from_pretrained(
     device_map="cuda",
     torch_dtype="auto",
     trust_remote_code=True,
-    # if you do not Ampere or later GPUs, change attention to "eager"
+    # if you do not use Ampere or later GPUs, change attention to "eager"
     _attn_implementation='flash_attention_2',
 ).cuda()
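
As context for the corrected comment above: FlashAttention 2 only runs on Ampere-or-newer GPUs, so on older cards the attention implementation should be switched to "eager". Below is a minimal, editor-added sketch of selecting the kwarg from the GPU's compute capability; the checkpoint id is an assumption for illustration and is not part of this diff.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch only: pick the attention implementation from the GPU's compute
# capability (Ampere and newer report major version >= 8). Using
# "flash_attention_2" also requires the flash-attn package to be installed.
# The checkpoint id below is assumed for illustration.
model_id = "microsoft/Phi-3-mini-4k-instruct"

use_flash = torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
    _attn_implementation="flash_attention_2" if use_flash else "eager",
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
```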
@@ -573,8 +573,12 @@ The model is licensed under the [MIT license](./LICENSE).
 ## Trademarks
 This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

+
 ## Appendix A: Benchmark Methodology

+<details>
+<summary>Click to view detailed descriptions</summary>
+
 We include a brief word on methodology here - and in particular, how we think about optimizing prompts.
 In an ideal world, we would never change any prompts in our benchmarks to ensure it is always an apples-to-apples comparison when comparing different models. Indeed, this is our default approach, and is the case in the vast majority of models we have run to date.
 There are, however, some exceptions to this. In some cases, we see a model that performs worse than expected on a given eval due to a failure to respect the output format. For example:
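
The hunk ends before the README's own examples, so as a generic, editor-added illustration of this failure mode (not taken from the benchmark code): a strict answer extractor marks a correct but loosely formatted reply as wrong, which is the kind of mismatch that can motivate a prompt tweak.

```python
import re

# Hypothetical multiple-choice answer extractor, for illustration only.
# A strict harness accepts answers only in the exact form "Answer: B"; a model
# replying "I believe the correct option is (B)." is then scored as wrong even
# though its answer is correct.
STRICT = re.compile(r"^Answer:\s*([ABCD])\b", re.MULTILINE)
LENIENT = re.compile(r"\(?\b([ABCD])\b\)?", re.IGNORECASE)

def extract_choice(completion: str, strict: bool = True) -> str | None:
    """Return the extracted option letter, or None if no accepted format matched."""
    match = (STRICT if strict else LENIENT).search(completion)
    return match.group(1).upper() if match else None

print(extract_choice("Answer: B"))                                           # B
print(extract_choice("I believe the correct option is (B)."))                # None
print(extract_choice("I believe the correct option is (B).", strict=False))  # B
```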
@@ -650,3 +654,4 @@ The model was evaluated across a breadth of public and internal benchmarks to un
 + Toxigen: Toxigen is adversarial and hate speech detection
 + Red Team:
 + Responses to prompts provided by AI Red Team at Microsoft
+</details>