====== Perplexity statistics ======
Mean PPL(Q)                   : 8.471756 ± 0.060877
Mean PPL(base)                : 7.669212 ± 0.052592
Cor(ln(PPL(Q)), ln(PPL(base))): 97.94%
Mean ln(PPL(Q)/PPL(base))     : 0.099524 ± 0.001461
Mean PPL(Q)/PPL(base)         : 1.104645 ± 0.001613
Mean PPL(Q)-PPL(base)         : 0.802544 ± 0.014151

====== KL divergence statistics ======
Mean    KLD:  0.095982 ± 0.000509
Maximum KLD:  7.754703
99.9%   KLD:  2.322698
99.0%   KLD:  0.853306
Median  KLD:  0.043948
10.0%   KLD:  0.001228
 5.0%   KLD:  0.000275
 1.0%   KLD:  0.000029
Minimum KLD: -0.000009

====== Token probability statistics ======
Mean    Δp:  -0.524 ± 0.023 %
Maximum Δp:  86.466%
99.9%   Δp:  45.371%
99.0%   Δp:  23.437%
95.0%   Δp:  11.093%
90.0%   Δp:   6.533%
75.0%   Δp:   1.323%
Median  Δp:  -0.006%
25.0%   Δp:  -1.661%
10.0%   Δp:  -7.499%
 5.0%   Δp: -13.316%
 1.0%   Δp: -33.083%
 0.1%   Δp: -70.135%
Minimum Δp: -99.502%
RMS Δp    :   8.838 ± 0.050 %
Same top p:  86.882 ± 0.087 %
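
The perplexity figures above can be cross-checked against each other. The following is a minimal sketch, not the tool's own code: it assumes that the reported ratio is the exponential of the mean log-ratio and that the KLD values are the standard per-token KL divergence from the base distribution to the quantized one. The variable names and the `kl_divergence` helper are hypothetical and introduced only for illustration.

```python
import math

# Values copied from the statistics above.
ppl_q = 8.471756           # Mean PPL(Q)
ppl_base = 7.669212        # Mean PPL(base)
mean_log_ratio = 0.099524  # Mean ln(PPL(Q)/PPL(base))

# Two ways to reach the reported "Mean PPL(Q)/PPL(base)" figure.
ratio_from_ppl = ppl_q / ppl_base
ratio_from_log = math.exp(mean_log_ratio)

# The reported "Mean PPL(Q)-PPL(base)" figure.
ppl_difference = ppl_q - ppl_base


def kl_divergence(p_base, p_q):
    """Standard KL divergence D(p_base || p_q) for one token position.

    Assumption: both arguments are probability vectors over the same
    vocabulary, and p_q is nonzero wherever p_base is nonzero.
    """
    return sum(pb * math.log(pb / pq) for pb, pq in zip(p_base, p_q) if pb > 0.0)


print(f"ratio from PPL values : {ratio_from_ppl:.6f}")
print(f"ratio from log-ratio  : {ratio_from_log:.6f}")
print(f"PPL difference        : {ppl_difference:.6f}")
```

Both ratio computations agree with the reported 1.104645 to within rounding, and the difference agrees with the reported 0.802544. A true KL divergence is never negative, so the slightly negative minimum KLD (-0.000009) is most likely floating-point rounding rather than a meaningful value.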