====== Perplexity statistics ======
Mean PPL(Q)                   : 10.401451 ± 0.075250
Mean PPL(base)                : 10.036835 ± 0.072696
Cor(ln(PPL(Q)), ln(PPL(base))): 99.14%
Mean ln(PPL(Q)/PPL(base))     : 0.035684 ± 0.000949
Mean PPL(Q)/PPL(base)         : 1.036328 ± 0.000983
Mean PPL(Q)-PPL(base)         : 0.364616 ± 0.010024

====== KL divergence statistics ======
Mean    KLD: 0.050746 ± 0.000215
Maximum KLD: 3.334370
99.9%   KLD: 0.923916
99.0%   KLD: 0.376239
Median  KLD: 0.030081
10.0%   KLD: 0.001119
 5.0%   KLD: 0.000250
 1.0%   KLD: 0.000020
Minimum KLD: -0.000007

====== Token probability statistics ======
Mean    Δp: -0.975 ± 0.015 %
Maximum Δp: 72.807%
99.9%   Δp: 31.246%
99.0%   Δp: 14.309%
95.0%   Δp: 6.030%
90.0%   Δp: 3.075%
75.0%   Δp: 0.343%
Median  Δp: -0.055%
25.0%   Δp: -1.867%
10.0%   Δp: -6.234%
 5.0%   Δp: -10.091%
 1.0%   Δp: -21.519%
 0.1%   Δp: -44.651%
Minimum Δp: -88.497%
RMS Δp    : 5.862 ± 0.035 %
Same top p: 89.036 ± 0.081 %
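
The perplexity figures above can be cross-checked against each other with a few lines of arithmetic. The sketch below is only a sanity check on the reported numbers, not a description of how the tool computes them; the variable names are hypothetical and the values are copied from the output. It shows that the reported ratio and difference agree with the two mean perplexities, and that the reported ratio is also consistent with the exponential of the mean log ratio.

```python
import math

# Values copied from the perplexity statistics above.
ppl_q = 10.401451           # Mean PPL(Q)
ppl_base = 10.036835        # Mean PPL(base)
mean_log_ratio = 0.035684   # Mean ln(PPL(Q)/PPL(base))

# Ratio and difference recomputed directly from the two means.
ratio = ppl_q / ppl_base
diff = ppl_q - ppl_base

# Ratio recovered from the mean log ratio instead.
ratio_from_log = math.exp(mean_log_ratio)

print(f"PPL(Q)/PPL(base)       = {ratio:.6f}")
print(f"PPL(Q)-PPL(base)       = {diff:.6f}")
print(f"exp(mean ln ratio)     = {ratio_from_log:.6f}")
print(f"relative PPL increase  = {100 * (ratio - 1):.2f} %")
```

Both routes give a ratio of about 1.0363, which corresponds to a perplexity increase of roughly 3.6 % for Q relative to the base.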