====== Perplexity statistics ======
Mean PPL(Q)                   :  10.071679 ±   0.073178
Mean PPL(base)                :  10.036835 ±   0.072696
Cor(ln(PPL(Q)), ln(PPL(base))):  99.93%
Mean ln(PPL(Q)/PPL(base))     :   0.003466 ±   0.000272
Mean PPL(Q)/PPL(base)         :   1.003472 ±   0.000273
Mean PPL(Q)-PPL(base)         :   0.034844 ±   0.002769

====== KL divergence statistics ======
Mean    KLD:   0.003501 ±   0.000018
Maximum KLD:   1.326304
99.9%   KLD:   0.063485
99.0%   KLD:   0.024422
Median  KLD:   0.002144
10.0%   KLD:   0.000053
 5.0%   KLD:   0.000008
 1.0%   KLD:  -0.000001
Minimum KLD:  -0.000235

====== Token probability statistics ======
Mean    Δp:  0.013 ± 0.004 %
Maximum Δp: 34.867%
99.9%   Δp:  9.405%
99.0%   Δp:  4.684%
95.0%   Δp:  2.228%
90.0%   Δp:  1.306%
75.0%   Δp:  0.272%
Median  Δp:  0.000%
25.0%   Δp: -0.249%
10.0%   Δp: -1.227%
 5.0%   Δp: -2.154%
 1.0%   Δp: -4.685%
 0.1%   Δp: -10.648%
Minimum Δp: -30.875%
RMS Δp    :  1.532 ± 0.011 %
Same top p: 96.908 ± 0.045 %
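The report does not state how its figures are derived. The following is a minimal sketch of one common way to compute statistics of this kind from per-position logits of a base model and a quantized model (Q). It assumes KLD means KL(P_base || P_Q) summed over the whole vocabulary at each position, Δp means the quantized model's probability for the reference next token minus the base model's, and "Same top p" means the fraction of positions where both models pick the same most likely token. The function names and the `(base_logits, quant_logits, target_id)` input format are hypothetical, and the tool that produced this log may define or aggregate things differently (percentiles are omitted here). Two relationships are visible in the numbers above: the ratio is the exponential of the mean log ratio (exp(0.003466) ≈ 1.003472), and a true KL divergence is never negative, so the small negative minimum and 1.0% KLD values can only come from floating-point rounding.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax for one token position."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def summarize(tokens):
    """Summarize agreement between a base model and a quantized model.

    tokens: list of (base_logits, quant_logits, target_id) tuples, one per
    evaluated position. This input format is hypothetical, for illustration.
    """
    klds, dps = [], []
    same_top = 0
    nll_base = nll_q = 0.0
    for base_logits, quant_logits, target in tokens:
        lb = log_softmax(base_logits)
        lq = log_softmax(quant_logits)
        # KL(P_base || P_Q) over the full vocabulary (assumed direction).
        klds.append(sum(math.exp(a) * (a - b) for a, b in zip(lb, lq)))
        # Change in probability assigned to the reference (next) token.
        dps.append(math.exp(lq[target]) - math.exp(lb[target]))
        # Count positions where both models agree on the top-1 token.
        same_top += argmax(lb) == argmax(lq)
        # Accumulate negative log-likelihoods for perplexity.
        nll_base -= lb[target]
        nll_q -= lq[target]
    n = len(tokens)
    mean_ln_ratio = (nll_q - nll_base) / n  # ln(PPL(Q) / PPL(base))
    return {
        "ppl_base": math.exp(nll_base / n),
        "ppl_q": math.exp(nll_q / n),
        "mean_ln_ratio": mean_ln_ratio,
        "ppl_ratio": math.exp(mean_ln_ratio),
        "mean_kld": sum(klds) / n,
        "max_kld": max(klds),
        "mean_dp": sum(dps) / n,
        "rms_dp": math.sqrt(sum(d * d for d in dps) / n),
        "same_top": same_top / n,
    }
```

For example, with a uniform two-token base distribution and a quantized model that shifts it to (0.75, 0.25), `summarize([([0.0, 0.0], [math.log(3), 0.0], 0)])` gives a mean KLD of 0.5 · ln(4/3) ≈ 0.1438 and a mean Δp of 0.25. Working in log space avoids taking the logarithm of a probability that underflows to zero.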