====== Perplexity statistics ======
Mean PPL(Q)                   :   7.301537 ±   0.044767
Mean PPL(base)                :   6.554978 ±   0.040159
Cor(ln(PPL(Q)), ln(PPL(base))):  97.66%
Mean ln(PPL(Q)/PPL(base))     :   0.107860 ±   0.001326
Mean PPL(Q)/PPL(base)         :   1.113892 ±   0.001477
Mean PPL(Q)-PPL(base)         :   0.746559 ±   0.010263

====== KL divergence statistics ======
Mean    KLD:   0.108870 ±   0.000594
Maximum KLD:   7.075473
99.9%   KLD:   2.822138
99.0%   KLD:   0.979758
Median  KLD:   0.057728
10.0%   KLD:   0.004307
 5.0%   KLD:   0.001381
 1.0%   KLD:   0.000211
Minimum KLD:   0.000001

====== Token probability statistics ======
Mean    Δp:  -3.144 ± 0.025 %
Maximum Δp:  80.109%
99.9%   Δp:  34.323%
99.0%   Δp:  15.825%
95.0%   Δp:   6.439%
90.0%   Δp:   2.991%
75.0%   Δp:   0.136%
Median  Δp:  -0.643%
25.0%   Δp:  -4.764%
10.0%   Δp: -11.960%
 5.0%   Δp: -18.526%
 1.0%   Δp: -42.905%
 0.1%   Δp: -83.187%
Minimum Δp: -99.496%
RMS Δp    :  10.140 ± 0.061 %
Same top p:  85.355 ± 0.093 %
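
For readers unfamiliar with these columns, the following is a minimal sketch of how statistics of this kind can be computed from per-token logits of a base model and a quantized model (Q). It assumes the usual definitions: KLD is KL(base || Q) over the vocabulary, Δp is the change in probability assigned to the correct token, and "same top p" is the share of positions where both models rank the same token first. The function names `compare_token` and `summarize` are hypothetical, and the tool that produced the log above may differ in details.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def compare_token(base_logits, q_logits, target):
    """Per-position comparison (hypothetical helper, not a tool API).

    Returns (kld, delta_p, same_top, log_ratio) where
      kld       = KL(base || Q) over the vocabulary
      delta_p   = p_Q(target) - p_base(target)
      same_top  = True if both models have the same argmax token
      log_ratio = this position's contribution to ln(PPL(Q)/PPL(base))
    Assumes no probability underflows to exactly 0 in the Q distribution.
    """
    p = softmax(base_logits)
    q = softmax(q_logits)
    kld = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    delta_p = q[target] - p[target]
    same_top = p.index(max(p)) == q.index(max(q))
    log_ratio = math.log(p[target]) - math.log(q[target])
    return kld, delta_p, same_top, log_ratio

def summarize(rows):
    """Aggregate per-position tuples into summary statistics."""
    n = len(rows)
    mean_log_ratio = sum(r[3] for r in rows) / n
    return {
        "mean_kld": sum(r[0] for r in rows) / n,
        "mean_dp_pct": 100.0 * sum(r[1] for r in rows) / n,
        "rms_dp_pct": 100.0 * math.sqrt(sum(r[1] ** 2 for r in rows) / n),
        "same_top_pct": 100.0 * sum(1 for r in rows if r[2]) / n,
        "mean_ln_ppl_ratio": mean_log_ratio,
        # Exponentiating the mean log ratio gives the PPL(Q)/PPL(base) ratio.
        "ppl_ratio": math.exp(mean_log_ratio),
    }
```

Under these definitions the log above is internally consistent: exp(0.107860) ≈ 1.113892, matching the reported mean PPL(Q)/PPL(base), and 7.301537 - 6.554978 = 0.746559, matching the reported difference.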