Joschka Strueber committed
Commit 5623280 · 1 Parent(s): 69fd3ae

[Ref] switch from mathjax in markdown to html block

Files changed (1): app.py (+8 −2)
app.py CHANGED

@@ -78,11 +78,17 @@ with gr.Blocks(title="LLM Similarity Analyzer", css=app_util.custom_css) as demo
         )
 
         gr.Markdown("## Information")
-        gr.Markdown(r"""We propose Chance Adjusted Probabilistic Agreement (\(\operatorname{CAPA}\), or \(\kappa_p\)), a novel metric \
+        gr.HTML("""
+        <script type="text/javascript" async
+            src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-MML-AM_CHTML">
+        </script>
+
+        <p>We propose Chance Adjusted Probabilistic Agreement (<span>\(\operatorname{CAPA}\)</span>, or <span>\(\kappa_p\)</span>), a novel metric
         for model similarity which adjusts for chance agreement due to accuracy. Using CAPA, we find: (1) LLM-as-a-judge scores are \
         biased towards more similar models controlling for the model's capability. (2) Gain from training strong models on annotations \
         of weak supervisors (weak-to-strong generalization) is higher when the two models are more different. (3) Concerningly, model \
-        errors are getting more correlated as capabilities increase.""")
+        errors are getting more correlated as capabilities increase.</p>
+        """)
         with gr.Row():
             gr.Image(value="data/table_capa.png", label="Comparison of different similarity metrics for multiple-choice questions", elem_classes="image_container", interactive=False)
             gr.Markdown("""
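One subtlety in this change: the removed line used a raw string (`r"""..."""`), while the new `gr.HTML` block is a plain triple-quoted string, yet the MathJax delimiters `\(` and `\)` still come through intact. A minimal sketch of why (standalone Python, no Gradio needed; the variable names are illustrative, not from app.py):

```python
# In Python, an unrecognized escape sequence such as \( or \k is kept as a
# literal backslash plus the following character (newer interpreters emit a
# SyntaxWarning, but the resulting string is unchanged). So the raw and
# non-raw forms of the MathJax snippet are byte-for-byte identical.
raw = r"""\(\kappa_p\)"""    # raw string: backslashes are always literal
plain = """\(\kappa_p\)"""   # non-raw: \( \k \) are not valid escapes, kept as-is

print(raw == plain)   # -> True
print(len(r"\("))     # -> 2 (one backslash, one parenthesis)
```

This is why the commit can drop the `r` prefix without corrupting the `\(...\)` delimiters that the MathJax script looks for; using the `r` prefix anyway would avoid the warning.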