omrialmog committed
Commit 921f249 · verified · 1 Parent(s): 2325491

Update README.md

Files changed (1):
  1. README.md +24 -19
README.md CHANGED

@@ -1,6 +1,9 @@
 ---
 base_model:
 - meta-llama/Llama-3.1-405B-Instruct
+license: llama3.1
+pipeline_tag: text-generation
+library_name: transformers
 ---
 # Model Overview
 
@@ -77,39 +80,35 @@ python examples/llama/convert_checkpoint.py --model_dir Llama-3.1-405B-Instruct-
 trtllm-build --checkpoint_dir /ckpt --output_dir /engine
 ```
 
-* Accuracy evaluation:
-
-1) Prepare the MMLU dataset:
-```sh
-mkdir data; wget https://people.eecs.berkeley.edu/~hendrycks/data.tar -O data/mmlu.tar
-tar -xf data/mmlu.tar -C data && mv data/data data/mmlu
-```
-
-2) Measure MMLU:
-
-```sh
-python examples/mmlu.py --engine_dir ./engine --tokenizer_dir Llama-3.1-405B-Instruct-FP8/ --test_trt_llm --data_dir data/mmlu
-```
-
 * Throughputs evaluation:
 
 Please refer to the [TensorRT-LLM benchmarking documentation](https://github.com/NVIDIA/TensorRT-LLM/blob/main/benchmarks/Suite.md) for details.
 
 #### Evaluation
-The accuracy (MMLU, 5-shot) and throughputs (tokens per second, TPS) benchmark results are presented in the table below:
+
 <table>
 <tr>
 <td><strong>Precision</strong>
 </td>
 <td><strong>MMLU</strong>
 </td>
-<td><strong>TPS</strong>
+<td><strong>GSM8K (CoT)</strong>
+</td>
+<td><strong>ARC Challenge</strong>
+</td>
+<td><strong>IFEVAL</strong>
 </td>
 </tr>
 <tr>
-<td>FP16
+<td>BF16
+</td>
+<td>87.6
+</td>
+<td>96.3
 </td>
-<td>86.6
+<td>96.9
+</td>
+<td>90.3
 </td>
 <td>275.0
 </td>
@@ -117,7 +116,13 @@ The accuracy (MMLU, 5-shot) and throughputs (tokens per second, TPS) benchmark r
 <tr>
 <td>FP8
 </td>
-<td>86.2
+<td>87.4
+</td>
+<td>96.2
+</td>
+<td>96.4
+</td>
+<td>90.4
 </td>
 <td>469.78
 </td>
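For reference, the YAML front matter that results from the first hunk, reassembled directly from the context and added lines above:

```yaml
---
base_model:
- meta-llama/Llama-3.1-405B-Instruct
license: llama3.1
pipeline_tag: text-generation
library_name: transformers
---
```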
 
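The second hunk's header truncates the checkpoint-conversion command that precedes `trtllm-build`. A minimal sketch of the two-step flow, assuming the conversion's `--model_dir` is the same `Llama-3.1-405B-Instruct-FP8/` directory passed as `--tokenizer_dir` elsewhere in this README, and that its `--output_dir` is the `/ckpt` consumed by `trtllm-build`:

```sh
# Sketch only: the --model_dir and --output_dir values are inferred from
# other commands in this README, not taken from the truncated hunk header.
python examples/llama/convert_checkpoint.py \
    --model_dir Llama-3.1-405B-Instruct-FP8/ \
    --output_dir /ckpt

# Build the TensorRT-LLM engine from the converted checkpoint.
trtllm-build --checkpoint_dir /ckpt --output_dir /engine
```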
 
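Reading the updated table against the removed `TPS` header: the throughput cells are retained (275.0 for the BF16 row, 469.78 for FP8), so FP8 serves roughly 469.78 / 275.0 ≈ 1.71× the BF16 throughput, while the newly added accuracy columns shift by at most 0.5 points (MMLU 87.6 → 87.4, GSM8K 96.3 → 96.2, ARC Challenge 96.9 → 96.4, IFEVAL 90.3 → 90.4). Note that after this change the header row names five columns while each data row carries six cells, leaving the throughput column unlabeled.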