Spaces: bethgelab/lm-similarity

Joschka Strueber committed 15 days ago

Commit 5623280

Parent: 69fd3ae

[Ref] switch from mathjax in markdown to html block

Files changed (1)

app.py CHANGED

@@ -78,11 +78,17 @@ with gr.Blocks(title="LLM Similarity Analyzer", css=app_util.custom_css) as demo
     )
     gr.Markdown("## Information")
-    gr.Markdown(r"""We propose Chance Adjusted Probabilistic Agreement (\(\operatorname{CAPA}\), or \(\kappa_p\)), a novel metric \
+    gr.HTML("""
+<script type="text/javascript" async
+  src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-MML-AM_CHTML">
+</script>
+<p>We propose Chance Adjusted Probabilistic Agreement (<span>\(\operatorname{CAPA}\)</span>, or <span>\(\kappa_p\)</span>), a novel metric
 for model similarity which adjusts for chance agreement due to accuracy. Using CAPA, we find: (1) LLM-as-a-judge scores are \
 biased towards more similar models controlling for the model's capability. (2) Gain from training strong models on annotations \
 of weak supervisors (weak-to-strong generalization) is higher when the two models are more different. (3) Concerningly, model \
-errors are getting more correlated as capabilities increase.""")
+errors are getting more correlated as capabilities increase.</p>
+""")
     with gr.Row():
         gr.Image(value="data/table_capa.png", label="Comparison of different similarity metrics for multiple-choice questions", elem_classes="image_container", interactive=False)
     gr.Markdown("""
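Besides switching from `gr.Markdown` to `gr.HTML`, this change also drops the `r` prefix that the old `gr.Markdown(r"""...""")` call used, and that affects how the backslashes in the paragraph are read. The snippet below is an illustrative sketch, not code from app.py (the variable names are hypothetical), showing the two string-literal behaviours. In a non-raw triple-quoted string, a backslash at the end of a line is a line continuation, so the backslash and newline are both dropped and the lines join into one paragraph. In a raw string they are kept literally. Sequences such as `\(` and `\k` are not recognized escapes, so a non-raw string keeps them as written, but Python 3.12 and later emit a `SyntaxWarning` for them (older versions a `DeprecationWarning`). A raw string, or doubled backslashes, avoids that.

```python
# Illustrative sketch (not taken from app.py): how Python string literals
# treat the backslashes that appear in the CAPA paragraph.

# Non-raw triple-quoted string: a backslash at the end of a line is a line
# continuation, so the backslash and the newline are both dropped.
joined = """a novel metric \
for model similarity"""

# Raw triple-quoted string: the backslash and the newline are kept as-is.
kept = r"""a novel metric \
for model similarity"""

# Raw string: the LaTeX delimiters and commands are kept without any
# escape processing, so no invalid-escape warning is raised.
latex = r"\(\kappa_p\)"

# The non-raw spelling of the same text needs doubled backslashes.
latex_escaped = "\\(\\kappa_p\\)"

print(repr(joined))                # 'a novel metric for model similarity'
print(repr(kept))                  # 'a novel metric \\\nfor model similarity'
print(latex, latex == latex_escaped)  # \(\kappa_p\) True
```

If the `r` prefix were restored on the `gr.HTML` string, the trailing backslashes would then remain as literal characters in the page text, so they would need to be removed at the same time.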