====== Perplexity statistics ======
Mean PPL(Q)                   :   7.731911 ±   0.053337
Mean PPL(base)                :   7.669212 ±   0.052592
Cor(ln(PPL(Q)), ln(PPL(base))):  99.83%
Mean ln(PPL(Q)/PPL(base))     :   0.008142 ±   0.000406
Mean PPL(Q)/PPL(base)         :   1.008175 ±   0.000409
Mean PPL(Q)-PPL(base)         :   0.062699 ±   0.003197

====== KL divergence statistics ======
Mean    KLD:   0.006765 ±   0.000058
Maximum KLD:   6.814367
99.9%   KLD:   0.159096
99.0%   KLD:   0.053242
Median  KLD:   0.003535
10.0%   KLD:   0.000089
 5.0%   KLD:   0.000019
 1.0%   KLD:   0.000001
Minimum KLD:  -0.000145

====== Token probability statistics ======
Mean    Δp: -0.052 ± 0.006 %
Maximum Δp: 66.442%
99.9%   Δp: 15.479%
99.0%   Δp:  7.070%
95.0%   Δp:  3.352%
90.0%   Δp:  1.959%
75.0%   Δp:  0.414%
Median  Δp:  0.000%
25.0%   Δp: -0.457%
10.0%   Δp: -2.146%
 5.0%   Δp: -3.636%
 1.0%   Δp: -7.566%
 0.1%   Δp: -16.327%
Minimum Δp: -45.875%
RMS Δp    :  2.437 ± 0.018 %
Same top p: 95.936 ± 0.051 %
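The statistics above compare a quantized model Q against its unquantized base on the same text. As a rough guide to what the main quantities measure, here is a minimal standard-library Python sketch that computes them from per-token logits. It is an illustration, not llama.cpp's implementation: the input format (one list of logits per token position), the hypothetical helper names `log_softmax` and `compare`, and the assumption that Δp is the change in the probability assigned to the actual next token are all assumptions made for this sketch, and the ± uncertainties are not computed. The KL divergence here is KL(base ‖ Q), which is never negative in exact arithmetic, so a tiny negative minimum like the one reported is presumably floating-point rounding.

```python
import math
from statistics import mean, median


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary row."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def compare(base_logits, q_logits, targets):
    """Hypothetical per-token comparison of a base model and a quantized model.

    base_logits, q_logits: one list of logits per token position (assumed format).
    targets: id of the actual next token at each position.
    """
    kld, dp, nll_base, nll_q = [], [], [], []
    same_top = 0
    for b_row, q_row, t in zip(base_logits, q_logits, targets):
        lb, lq = log_softmax(b_row), log_softmax(q_row)
        # KL(base || Q) = sum_i p_base(i) * (ln p_base(i) - ln p_Q(i))
        kld.append(sum(math.exp(a) * (a - c) for a, c in zip(lb, lq)))
        # Assumed definition of Δp: change in probability of the actual next token
        dp.append(math.exp(lq[t]) - math.exp(lb[t]))
        # Negative log-likelihood of the actual next token under each model
        nll_base.append(-lb[t])
        nll_q.append(-lq[t])
        # Count positions where both models rank the same token highest
        top_b = max(range(len(lb)), key=lb.__getitem__)
        top_q = max(range(len(lq)), key=lq.__getitem__)
        same_top += top_b == top_q
    ppl_base = math.exp(mean(nll_base))  # PPL = exp(mean NLL)
    ppl_q = math.exp(mean(nll_q))
    return {
        "ppl_base": ppl_base,
        "ppl_q": ppl_q,
        "ln_ppl_ratio": math.log(ppl_q / ppl_base),
        "mean_kld": mean(kld),
        "median_kld": median(kld),
        "max_kld": max(kld),
        "mean_dp": mean(dp),
        "rms_dp": math.sqrt(mean(d * d for d in dp)),
        "same_top_p": same_top / len(targets),
    }
```

For example, `compare([[2.0, 1.0, 0.1]], [[1.9, 1.1, 0.1]], [0])` yields a small positive `mean_kld` and `same_top_p == 1.0`, because both rows still rank token 0 highest. A Δp distribution centred near zero with a few large tails, as in the report above, corresponds to most tokens being barely affected by quantization while a small number shift noticeably.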