====== Perplexity statistics ======
Mean PPL(Q) : 7.713033 ± 0.053233
Mean PPL(base) : 7.669212 ± 0.052592
Cor(ln(PPL(Q)), ln(PPL(base))): 99.84%
Mean ln(PPL(Q)/PPL(base)) : 0.005698 ± 0.000390
Mean PPL(Q)/PPL(base) : 1.005714 ± 0.000393
Mean PPL(Q)-PPL(base) : 0.043821 ± 0.003051

====== KL divergence statistics ======
Mean KLD: 0.006054 ± 0.000033
Maximum KLD: 1.568341
99.9% KLD: 0.140650
99.0% KLD: 0.049664
Median KLD: 0.003195
10.0% KLD: 0.000079
5.0% KLD: 0.000016
1.0% KLD: 0.000001
Minimum KLD: -0.000192

====== Token probability statistics ======
Mean Δp: 0.010 ± 0.006 %
Maximum Δp: 62.592%
99.9% Δp: 14.978%
99.0% Δp: 6.898%
95.0% Δp: 3.328%
90.0% Δp: 1.972%
75.0% Δp: 0.435%
Median Δp: 0.000%
25.0% Δp: -0.398%
10.0% Δp: -1.946%
5.0% Δp: -3.328%
1.0% Δp: -6.905%
0.1% Δp: -15.405%
Minimum Δp: -45.849%
RMS Δp : 2.314 ± 0.016 %
Same top p: 96.240 ± 0.049 %
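
The three derived perplexity figures above (log-ratio, ratio, difference) can be cross-checked against the two mean perplexities. The following is a minimal sketch of that arithmetic in Python, not the tool's own code; the variable names are illustrative, and the values are copied by hand from the perplexity statistics, so agreement is only expected up to the six printed decimals.

```python
import math

# Values copied from the perplexity statistics above (illustrative names).
ppl_q = 7.713033          # Mean PPL(Q)
ppl_base = 7.669212       # Mean PPL(base)
mean_ln_ratio = 0.005698  # Mean ln(PPL(Q)/PPL(base))

# Ratio and difference of the two mean perplexities.
ratio = ppl_q / ppl_base
diff = ppl_q - ppl_base

# Exponentiating the mean log-ratio should land on the same ratio.
ratio_from_ln = math.exp(mean_ln_ratio)

print(f"PPL(Q)/PPL(base)   = {ratio:.6f}")
print(f"exp(mean ln ratio) = {ratio_from_ln:.6f}")
print(f"PPL(Q)-PPL(base)   = {diff:.6f}")
```

Both routes to the ratio should agree with the reported 1.005714 to the printed precision, which suggests the reported figures are internally consistent; it does not say anything about how the tool computes its uncertainty estimates.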