omrialmog committed
Commit 921f249 · verified · 1 Parent(s): 2325491

Update README.md

Files changed (1):
  1. README.md +24 -19
README.md CHANGED

@@ -1,6 +1,9 @@
 ---
 base_model:
 - meta-llama/Llama-3.1-405B-Instruct
+license: llama3.1
+pipeline_tag: text-generation
+library_name: transformers
 ---
 # Model Overview
 
@@ -77,39 +80,35 @@ python examples/llama/convert_checkpoint.py --model_dir Llama-3.1-405B-Instruct-
 trtllm-build --checkpoint_dir /ckpt --output_dir /engine
 ```
 
-* Accuracy evaluation:
-
-1) Prepare the MMLU dataset:
-```sh
-mkdir data; wget https://people.eecs.berkeley.edu/~hendrycks/data.tar -O data/mmlu.tar
-tar -xf data/mmlu.tar -C data && mv data/data data/mmlu
-```
-
-2) Measure MMLU:
-
-```sh
-python examples/mmlu.py --engine_dir ./engine --tokenizer_dir Llama-3.1-405B-Instruct-FP8/ --test_trt_llm --data_dir data/mmlu
-```
-
 * Throughputs evaluation:
 
 Please refer to the [TensorRT-LLM benchmarking documentation](https://github.com/NVIDIA/TensorRT-LLM/blob/main/benchmarks/Suite.md) for details.
 
 #### Evaluation
-The accuracy (MMLU, 5-shot) and throughputs (tokens per second, TPS) benchmark results are presented in the table below:
+
 <table>
 <tr>
 <td><strong>Precision</strong>
 </td>
 <td><strong>MMLU</strong>
 </td>
-<td><strong>TPS</strong>
+<td><strong>GSM8K (CoT)</strong>
+</td>
+<td><strong>ARC Challenge</strong>
+</td>
+<td><strong>IFEVAL</strong>
 </td>
 </tr>
 <tr>
-<td>FP16
+<td>BF16
+</td>
+<td>87.6
+</td>
+<td>96.3
 </td>
-<td>86.6
+<td>96.9
+</td>
+<td>90.3
 </td>
 <td>275.0
 </td>
@@ -117,7 +116,13 @@ The accuracy (MMLU, 5-shot) and throughputs (tokens per second, TPS) benchmark r
 <tr>
 <td>FP8
 </td>
-<td>86.2
+<td>87.4
+</td>
+<td>96.2
+</td>
+<td>96.4
+</td>
+<td>90.4
 </td>
 <td>469.78
 </td>
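For reference, the YAML front matter that results from the first hunk, reassembled directly from the context and added lines above:

```yaml
---
base_model:
- meta-llama/Llama-3.1-405B-Instruct
license: llama3.1
pipeline_tag: text-generation
library_name: transformers
---
```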
 
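The second hunk's header truncates the checkpoint-conversion command that precedes `trtllm-build`. A minimal sketch of the two-step flow, assuming the conversion's `--model_dir` is the same `Llama-3.1-405B-Instruct-FP8/` directory passed as `--tokenizer_dir` elsewhere in this README, and that its `--output_dir` is the `/ckpt` consumed by `trtllm-build`:

```sh
# Sketch only: the --model_dir and --output_dir values are inferred from
# other commands in this README, not taken from the truncated hunk header.
python examples/llama/convert_checkpoint.py \
    --model_dir Llama-3.1-405B-Instruct-FP8/ \
    --output_dir /ckpt

# Build the TensorRT-LLM engine from the converted checkpoint.
trtllm-build --checkpoint_dir /ckpt --output_dir /engine
```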
 
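Reading the updated table against the removed `TPS` header: the throughput cells are retained (275.0 for the BF16 row, 469.78 for FP8), so FP8 serves roughly 469.78 / 275.0 ≈ 1.71× the BF16 throughput, while the newly added accuracy columns shift by at most 0.5 points (MMLU 87.6 → 87.4, GSM8K 96.3 → 96.2, ARC Challenge 96.9 → 96.4, IFEVAL 90.3 → 90.4). Note that after this change the header row names five columns while each data row carries six cells, leaving the throughput column unlabeled.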