Dolphin3.0-R1-Mistral-24B-GGUF / scores / Dolphin3.0-R1-Mistral-24B-IQ3_S.log
Generate perplexity and KLD scores (commit 1f75c4c)
====== Perplexity statistics ======
Mean PPL(Q) : 26.143038 ± 0.254443
Mean PPL(base) : 23.352232 ± 0.220841
Cor(ln(PPL(Q)), ln(PPL(base))): 98.95%
Mean ln(PPL(Q)/PPL(base)) : 0.112890 ± 0.001420
Mean PPL(Q)/PPL(base) : 1.119509 ± 0.001590
Mean PPL(Q)-PPL(base) : 2.790806 ± 0.048102
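As a sanity check on the figures above (not part of the llama.cpp output): the reported Mean PPL(Q)/PPL(base) should agree with exp() of the mean log-ratio, and approximately with the ratio of the two mean perplexities. A minimal sketch, with variable names chosen here for illustration:

```python
import math

# Figures copied from the log above.
mean_ln_ratio = 0.112890   # Mean ln(PPL(Q)/PPL(base))
mean_ppl_q    = 26.143038  # Mean PPL(Q)
mean_ppl_base = 23.352232  # Mean PPL(base)

ratio_from_log = math.exp(mean_ln_ratio)       # ≈ 1.119509
ratio_of_means = mean_ppl_q / mean_ppl_base    # ≈ 1.119510

print(f"{ratio_from_log:.6f}")
print(f"{ratio_of_means:.6f}")
```

Both come out at ~1.1195, i.e. the IQ3_S quant costs roughly 12% in perplexity relative to the base model on this corpus.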
====== KL divergence statistics ======
Mean KLD: 0.082486 ± 0.000324
Maximum KLD: 8.223216
99.9% KLD: 1.163796
99.0% KLD: 0.554553
Median KLD: 0.040744
10.0% KLD: 0.000979
5.0% KLD: 0.000241
1.0% KLD: 0.000008
Minimum KLD: -0.000550
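For reference, the per-token KL divergence these statistics summarize compares the base model's token distribution against the quantized model's at each position. A minimal pure-Python sketch (illustrative only, not llama.cpp internals):

```python
import math

def softmax(logits):
    # Numerically stable softmax over one position's vocabulary logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def token_kld(logits_base, logits_q):
    # D_KL(base || quantized) for a single token position.
    p = softmax(logits_base)
    q = softmax(logits_q)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))

# Identical distributions give KLD ≈ 0; the tiny negative "Minimum KLD"
# in the log is just this kind of floating-point rounding noise.
print(token_kld([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

KL divergence is non-negative in exact arithmetic, which is why a Minimum KLD of -0.000550 indicates rounding error rather than a real negative divergence.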
====== Token probability statistics ======
Mean Δp: -0.290 ± 0.018 %
Maximum Δp: 83.668%
99.9% Δp: 39.492%
99.0% Δp: 22.349%
95.0% Δp: 10.000%
90.0% Δp: 4.974%
75.0% Δp: 0.538%
Median Δp: -0.001%
25.0% Δp: -0.805%
10.0% Δp: -5.962%
5.0% Δp: -11.458%
1.0% Δp: -25.491%
0.1% Δp: -47.606%
Minimum Δp: -94.438%
RMS Δp : 7.163 ± 0.036 %
Same top p: 87.635 ± 0.085 %
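To make the last two statistics concrete: Δp measures, per position, how much probability the quantized model gains or loses on the token the text actually continues with, and "Same top p" is the fraction of positions where both models agree on the most likely token. A toy sketch under those definitions (helper names are illustrative):

```python
def delta_p(probs_base, probs_q, target):
    # Δp in percent; negative means the quantized model is less
    # confident in the correct next token.
    return 100.0 * (probs_q[target] - probs_base[target])

def same_top(probs_base, probs_q):
    # True when both models rank the same token as most likely.
    argmax = lambda p: max(range(len(p)), key=p.__getitem__)
    return argmax(probs_base) == argmax(probs_q)

# Toy distributions over a 3-token vocabulary; token 0 is the target.
base  = [0.70, 0.20, 0.10]
quant = [0.60, 0.25, 0.15]
print(delta_p(base, quant, target=0))  # ≈ -10.0 percentage points
print(same_top(base, quant))           # True: both still pick token 0
```

So a mean Δp of -0.290% with 87.6% top-token agreement says the quant usually picks the same token, but is slightly less confident on average.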
llama_perf_context_print: load time = 80439.51 ms
llama_perf_context_print: prompt eval time = 1705180.19 ms / 304128 tokens ( 5.61 ms per token, 178.36 tokens per second)
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: total time = 1852832.45 ms / 304129 tokens
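The per-token and throughput figures in the prompt-eval line above can be cross-checked directly from the raw totals:

```python
# Raw totals copied from the log's prompt eval line.
prompt_eval_ms = 1705180.19
prompt_tokens  = 304128

ms_per_token      = prompt_eval_ms / prompt_tokens          # ≈ 5.61 ms
tokens_per_second = prompt_tokens / (prompt_eval_ms / 1e3)  # ≈ 178.36 t/s

print(f"{ms_per_token:.2f} ms per token")
print(f"{tokens_per_second:.2f} tokens per second")
```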