====== Perplexity statistics ======
Mean PPL(Q)                   : 39.396539 ± 0.409082
Mean PPL(base)                : 23.352232 ± 0.220841
Cor(ln(PPL(Q)), ln(PPL(base))): 94.50%
Mean ln(PPL(Q)/PPL(base))     :  0.522985 ± 0.003415
Mean PPL(Q)/PPL(base)         :  1.687057 ± 0.005762
Mean PPL(Q)-PPL(base)         : 16.044307 ± 0.213016

====== KL divergence statistics ======
Mean    KLD:  0.448856 ± 0.001660
Maximum KLD: 11.228906
99.9%   KLD:  5.432167
99.0%   KLD:  3.097294
Median  KLD:  0.213731
10.0%   KLD:  0.005922
 5.0%   KLD:  0.001612
 1.0%   KLD:  0.000166
Minimum KLD: -0.000425

====== Token probability statistics ======
Mean    Δp:  -2.648 ± 0.043 %
Maximum Δp:  96.523%
99.9%   Δp:  66.438%
99.0%   Δp:  40.723%
95.0%   Δp:  18.234%
90.0%   Δp:   8.706%
75.0%   Δp:   0.595%
Median  Δp:  -0.023%
25.0%   Δp:  -2.797%
10.0%   Δp: -17.572%
 5.0%   Δp: -33.515%
 1.0%   Δp: -71.845%
 0.1%   Δp: -94.533%
Minimum Δp: -99.575%
RMS Δp    :  16.773 ± 0.071 %
Same top p:  74.363 ± 0.112 %

llama_perf_context_print:        load time =   79800.46 ms
llama_perf_context_print: prompt eval time = 1798108.56 ms / 304128 tokens (    5.91 ms per token,   169.14 tokens per second)
llama_perf_context_print:        eval time =       0.00 ms /      1 runs   (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:       total time = 1960563.16 ms / 304129 tokens