====== Perplexity statistics ======
Mean PPL(Q)                   :   7.996693 ±   0.049993
Mean PPL(base)                :   6.554978 ±   0.040159
Cor(ln(PPL(Q)), ln(PPL(base))):  95.62%
Mean ln(PPL(Q)/PPL(base))     :   0.198803 ±   0.001836
Mean PPL(Q)/PPL(base)         :   1.219942 ±   0.002240
Mean PPL(Q)-PPL(base)         :   1.441714 ±   0.016513

====== KL divergence statistics ======
Mean    KLD:   0.192568 ±   0.000912
Maximum KLD:  12.200028
99.9%   KLD:   4.191668
99.0%   KLD:   1.582829
Median  KLD:   0.105043
10.0%   KLD:   0.007682
 5.0%   KLD:   0.002102
 1.0%   KLD:   0.000287
Minimum KLD:  -0.000001

====== Token probability statistics ======
Mean    Δp:  -4.848 ± 0.036 %
Maximum Δp:  85.850%
99.9%   Δp:  43.373%
99.0%   Δp:  21.875%
95.0%   Δp:   8.624%
90.0%   Δp:   3.933%
75.0%   Δp:   0.163%
Median  Δp:  -0.894%
25.0%   Δp:  -7.011%
10.0%   Δp: -18.152%
 5.0%   Δp: -29.707%
 1.0%   Δp: -62.716%
 0.1%   Δp: -91.337%
Minimum Δp: -99.809%
RMS Δp    :  14.337 ± 0.067 %
Same top p:  80.175 ± 0.105 %
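
The three comparison rows in the perplexity block appear to be consistent with the two mean perplexities. The following is a minimal sketch of that sanity check, not part of the tool that produced this output. The variable names are made up for illustration, and it assumes that the ratio and difference rows describe the same pair of means and that the log-ratio row is the natural log of that ratio.

```python
import math

# Values copied from the perplexity statistics above.
ppl_q = 7.996693            # Mean PPL(Q)
ppl_base = 6.554978         # Mean PPL(base)
mean_log_ratio = 0.198803   # Mean ln(PPL(Q)/PPL(base))
reported_ratio = 1.219942   # Mean PPL(Q)/PPL(base)
reported_diff = 1.441714    # Mean PPL(Q)-PPL(base)

# Recompute the derived quantities from the two means.
ratio_from_means = ppl_q / ppl_base
ratio_from_log = math.exp(mean_log_ratio)
diff_from_means = ppl_q - ppl_base

# Hypothetical tolerance: the log prints six decimals, so allow rounding slack.
TOL = 1e-4

print(f"ratio from means: {ratio_from_means:.6f} (reported {reported_ratio:.6f})")
print(f"ratio from log  : {ratio_from_log:.6f} (reported {reported_ratio:.6f})")
print(f"difference      : {diff_from_means:.6f} (reported {reported_diff:.6f})")

ratio_ok = abs(ratio_from_means - reported_ratio) < TOL
log_ok = abs(ratio_from_log - reported_ratio) < TOL
diff_ok = abs(diff_from_means - reported_diff) < TOL
```

If all three flags are true, the reported ratio, log-ratio, and difference agree with the two mean perplexities to within rounding. Read this way, the quantized model's perplexity is roughly 22% higher than the base model's on this data.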