====== Perplexity statistics ======
Mean PPL(Q)                   :  24.806925 ±   0.237610
Mean PPL(base)                :  23.352232 ±   0.220841
Cor(ln(PPL(Q)), ln(PPL(base))):  99.16%
Mean ln(PPL(Q)/PPL(base))     :   0.060430 ±   0.001239
Mean PPL(Q)/PPL(base)         :   1.062293 ±   0.001316
Mean PPL(Q)-PPL(base)         :   1.454692 ±   0.034094

====== KL divergence statistics ======
Mean    KLD:   0.063828 ±   0.000254
Maximum KLD:   4.249581
99.9%   KLD:   0.920882
99.0%   KLD:   0.436885
Median  KLD:   0.031026
10.0%   KLD:   0.000750
 5.0%   KLD:   0.000177
 1.0%   KLD:   0.000000
Minimum KLD:  -0.000455

====== Token probability statistics ======
Mean    Δp: -0.257 ± 0.016 %
Maximum Δp: 73.489%
99.9%   Δp: 36.078%
99.0%   Δp: 19.828%
95.0%   Δp:  8.789%
90.0%   Δp:  4.486%
75.0%   Δp:  0.496%
Median  Δp: -0.000%
25.0%   Δp: -0.687%
10.0%   Δp: -5.257%
 5.0%   Δp: -10.230%
 1.0%   Δp: -22.714%
 0.1%   Δp: -43.074%
Minimum Δp: -90.990%
RMS Δp    :  6.411 ± 0.033 %
Same top p: 88.802 ± 0.081 %

llama_perf_context_print:        load time =   87933.48 ms
llama_perf_context_print: prompt eval time = 1845252.27 ms / 304128 tokens (    6.07 ms per token,   164.82 tokens per second)
llama_perf_context_print:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:       total time = 1979075.22 ms / 304129 tokens
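For context, these statistics compare a quantized model Q against its base model token by token: KLD is the divergence between the two models' next-token distributions, and Δp is the change in probability assigned to the reference token. Below is a minimal, illustrative sketch of how such per-token metrics could be computed from two probability distributions; the function name and the toy 4-token vocabulary are assumptions for demonstration, not llama.cpp's actual implementation.

```python
import math

def token_metrics(p_base, p_q, ref_id):
    """Per-token stats: KL(base || Q), Δp on the reference token,
    and whether both models agree on the top-1 token.
    p_base, p_q: full next-token probability distributions (each sums to 1)."""
    # KL divergence of the quantized distribution from the base distribution
    kld = sum(b * math.log(b / q) for b, q in zip(p_base, p_q) if b > 0)
    # Δp in percent: positive means Q assigns more mass to the reference token
    delta_p = (p_q[ref_id] - p_base[ref_id]) * 100.0
    # "Same top p": do both models pick the same most-likely token?
    same_top = (max(range(len(p_base)), key=p_base.__getitem__)
                == max(range(len(p_q)), key=p_q.__getitem__))
    return kld, delta_p, same_top

# Toy example over a 4-token vocabulary
base  = [0.70, 0.20, 0.05, 0.05]
quant = [0.60, 0.28, 0.06, 0.06]
kld, dp, same = token_metrics(base, quant, ref_id=0)
print(f"KLD={kld:.4f}  Δp={dp:+.1f}%  same top-1: {same}")
```

Aggregating these per-token values over a corpus (here, 304128 tokens) yields the means, percentiles, and the "Same top p" agreement rate reported above; a small mean KLD and a "Same top p" near 100% indicate the quantization changed the model's behavior little.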