====== Perplexity statistics ======
Mean PPL(Q)                   :  23.519878 ±   0.223215
Mean PPL(base)                :  23.352232 ±   0.220841
Cor(ln(PPL(Q)), ln(PPL(base))):  99.96%
Mean ln(PPL(Q)/PPL(base))     :   0.007153 ±   0.000271
Mean PPL(Q)/PPL(base)         :   1.007179 ±   0.000273
Mean PPL(Q)-PPL(base)         :   0.167646 ±   0.006738

====== KL divergence statistics ======
Mean    KLD:   0.002318 ±   0.000013
Maximum KLD:   0.848010
99.9%   KLD:   0.043823
99.0%   KLD:   0.015801
Median  KLD:   0.001166
10.0%   KLD:   0.000025
 5.0%   KLD:  -0.000001
 1.0%   KLD:  -0.000063
Minimum KLD:  -0.000401

====== Token probability statistics ======
Mean    Δp:  0.006 ± 0.003 %
Maximum Δp: 26.906%
99.9%   Δp:  7.636%
99.0%   Δp:  4.056%
95.0%   Δp:  1.919%
90.0%   Δp:  0.994%
75.0%   Δp:  0.127%
Median  Δp: -0.000%
25.0%   Δp: -0.119%
10.0%   Δp: -0.958%
 5.0%   Δp: -1.871%
 1.0%   Δp: -4.090%
 0.1%   Δp: -7.817%
Minimum Δp: -20.640%
RMS Δp    :  1.266 ± 0.008 %
Same top p: 97.778 ± 0.038 %

llama_perf_context_print:        load time =    3238.55 ms
llama_perf_context_print: prompt eval time = 2842839.16 ms / 304128 tokens (    9.35 ms per token,   106.98 tokens per second)
llama_perf_context_print:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,     inf tokens per second)
llama_perf_context_print:       total time = 2975924.16 ms / 304129 tokens
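For reference, the per-token metrics above (KLD, Δp, "same top p") can be reproduced conceptually from the two models' softmax outputs. The sketch below is not llama.cpp's implementation; it uses a hypothetical 4-token vocabulary and made-up probabilities, and assumes the KLD is taken with the base model's distribution as the reference. (The slightly negative percentile values in the log are numerical noise in the tool's per-token estimates; a true KL divergence is nonnegative.)

```python
import math

def kld(p_base, p_q):
    """KL divergence D(base || quantized) over one position's softmax distribution."""
    return sum(pb * math.log(pb / pq) for pb, pq in zip(p_base, p_q) if pb > 0)

# Hypothetical softmax outputs at two positions over a 4-token vocabulary;
# a real run uses the full vocabulary at every evaluated position.
base  = [[0.70, 0.20, 0.05, 0.05], [0.10, 0.60, 0.25, 0.05]]
quant = [[0.68, 0.21, 0.06, 0.05], [0.12, 0.58, 0.25, 0.05]]
correct = [0, 1]  # index of the actual next token at each position

klds = [kld(b, q) for b, q in zip(base, quant)]
mean_kld = sum(klds) / len(klds)

# Δp: change in the probability assigned to the correct token, in percent
dps = [(q[t] - b[t]) * 100 for b, q, t in zip(base, quant, correct)]

# "Same top p": fraction of positions where both models agree on the argmax token
same_top = sum(b.index(max(b)) == q.index(max(q))
               for b, q in zip(base, quant)) / len(base)
```

Reporting the mean and percentiles of `klds` and `dps` over every token of the evaluation text yields tables like the ones above.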