====== Perplexity statistics ======
Mean PPL(Q)                   :   8.286801 ±   0.058783
Mean PPL(base)                :   7.669212 ±   0.052592
Cor(ln(PPL(Q)), ln(PPL(base))):  98.72%
Mean ln(PPL(Q)/PPL(base))     :   0.077450 ±   0.001141
Mean PPL(Q)/PPL(base)         :   1.080528 ±   0.001233
Mean PPL(Q)-PPL(base)         :   0.617589 ±   0.010843

====== KL divergence statistics ======
Mean    KLD:   0.061737 ±   0.000316
Maximum KLD:   6.420732
99.9%   KLD:   1.478346
99.0%   KLD:   0.532294
Median  KLD:   0.029140
10.0%   KLD:   0.000817
 5.0%   KLD:   0.000195
 1.0%   KLD:   0.000021
Minimum KLD:  -0.000001

====== Token probability statistics ======
Mean    Δp: -0.421 ± 0.018 %
Maximum Δp:  96.151%
99.9%   Δp:  38.523%
99.0%   Δp:  18.770%
95.0%   Δp:   8.780%
90.0%   Δp:   5.065%
75.0%   Δp:   0.992%
Median  Δp:  -0.007%
25.0%   Δp:  -1.428%
10.0%   Δp:  -6.125%
 5.0%   Δp: -10.610%
 1.0%   Δp: -24.391%
 0.1%   Δp: -54.028%
Minimum Δp: -92.165%
RMS Δp    :   6.903 ± 0.041 %
Same top p:  89.261 ± 0.080 %
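
The perplexity figures in the report can be cross-checked against each other with a few lines of arithmetic. The sketch below is illustrative only: the variable names are made up for this example, and it assumes that the reported mean log-ratio is the difference of the two mean log-perplexities, so that exponentiating it should reproduce the reported ratio. The `kl_divergence` helper is the textbook definition applied to a toy distribution, not the tool's own code, and the direction KL(base || Q) is an assumption.

```python
import math

# Values copied from the perplexity section of the report.
ppl_q = 8.286801           # Mean PPL(Q)
ppl_base = 7.669212        # Mean PPL(base)
mean_log_ratio = 0.077450  # Mean ln(PPL(Q)/PPL(base))

# Derived quantities that should match the remaining report lines.
ratio = ppl_q / ppl_base                 # compare with "Mean PPL(Q)/PPL(base)"
ratio_from_log = math.exp(mean_log_ratio)
diff = ppl_q - ppl_base                  # compare with "Mean PPL(Q)-PPL(base)"

print(f"PPL(Q)/PPL(base)   = {ratio:.6f}")
print(f"exp(mean ln ratio) = {ratio_from_log:.6f}")
print(f"PPL(Q)-PPL(base)   = {diff:.6f}")


def kl_divergence(p_base, p_q):
    """Textbook KL(base || Q) in nats for two discrete distributions.

    Assumes both sequences have the same length, sum to 1, and that
    p_q is non-zero wherever p_base is non-zero.
    """
    return sum(pb * math.log(pb / pq) for pb, pq in zip(p_base, p_q) if pb > 0)


# Toy next-token distributions over a 3-token vocabulary (hypothetical data).
toy_base = [0.7, 0.2, 0.1]
toy_q = [0.6, 0.3, 0.1]
print(f"toy KLD            = {kl_divergence(toy_base, toy_q):.6f}")
```

Under these assumptions, the ratio and the exponentiated log-ratio should both come out close to the reported 1.080528, and the difference close to 0.617589.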