====== Perplexity statistics ======
Mean PPL(Q)                   :  7.815683 ± 0.054370
Mean PPL(base)                :  7.669212 ± 0.052592
Cor(ln(PPL(Q)), ln(PPL(base))): 99.64%
Mean ln(PPL(Q)/PPL(base))     :  0.018918 ± 0.000594
Mean PPL(Q)/PPL(base)         :  1.019099 ± 0.000606
Mean PPL(Q)-PPL(base)         :  0.146471 ± 0.004873

====== KL divergence statistics ======
Mean    KLD:  0.015912 ± 0.000100
Maximum KLD:  7.292192
99.9%   KLD:  0.409283
99.0%   KLD:  0.134662
Median  KLD:  0.007713
10.0%   KLD:  0.000195
 5.0%   KLD:  0.000042
 1.0%   KLD:  0.000003
Minimum KLD: -0.000041

====== Token probability statistics ======
Mean    Δp:   0.068 ± 0.009 %
Maximum Δp:  66.453%
99.9%   Δp:  21.646%
99.0%   Δp:  10.534%
95.0%   Δp:   5.254%
90.0%   Δp:   3.187%
75.0%   Δp:   0.752%
Median  Δp:   0.001%
25.0%   Δp:  -0.518%
10.0%   Δp:  -2.750%
 5.0%   Δp:  -4.853%
 1.0%   Δp: -11.600%
 0.1%   Δp: -27.299%
Minimum Δp: -69.966%
RMS Δp    :   3.642 ± 0.025 %
Same top p:  94.325 ± 0.059 %
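
For readers unfamiliar with these quantities, the following is a minimal sketch of how statistics of this kind could be computed from per-token probability distributions. It assumes the conventional definitions: KLD is KL(base || Q) in nats, Δp is the change in probability assigned to the reference token, and "same top p" is the share of positions where both models rank the same token first. The tool that produced the output above may differ in details. The function names and the toy data are hypothetical and are not taken from that output.

```python
import math

def kl_divergence(p_base, p_q):
    # KL(base || Q) in nats for one token position.
    # Terms with p_base == 0 contribute nothing to the sum.
    return sum(p * math.log(p / q) for p, q in zip(p_base, p_q) if p > 0.0)

def argmax(dist):
    # Index of the most probable token in one distribution.
    return max(range(len(dist)), key=dist.__getitem__)

def compare(base_dists, q_dists, targets):
    # base_dists / q_dists: one probability distribution per position.
    # targets: index of the reference (correct) token at each position.
    n = len(targets)
    klds = [kl_divergence(b, q) for b, q in zip(base_dists, q_dists)]
    # Delta-p in percent: probability of the reference token, Q minus base.
    delta_p = [100.0 * (q[t] - b[t])
               for b, q, t in zip(base_dists, q_dists, targets)]
    same_top = sum(argmax(b) == argmax(q)
                   for b, q in zip(base_dists, q_dists))
    # Perplexity = exp(mean negative log-likelihood of the reference tokens).
    nll_base = -sum(math.log(b[t]) for b, t in zip(base_dists, targets)) / n
    nll_q = -sum(math.log(q[t]) for q, t in zip(q_dists, targets)) / n
    return {
        "mean_kld": sum(klds) / n,
        "mean_delta_p": sum(delta_p) / n,
        "rms_delta_p": math.sqrt(sum(d * d for d in delta_p) / n),
        "same_top_pct": 100.0 * same_top / n,
        "ppl_base": math.exp(nll_base),
        "ppl_q": math.exp(nll_q),
    }

# Toy data, made up for illustration: 2 positions, vocabulary of 3 tokens.
base = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]
quant = [[0.6, 0.3, 0.1], [0.2, 0.3, 0.5]]
stats = compare(base, quant, targets=[0, 1])
```

Under these definitions, a positive Δp means the quantized model gave the reference token more probability than the base model at that position, and a negative Δp means it gave less.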