Commit 5623280 (1 parent: 69fd3ae), committed by Joschka Strueber
[Ref] switch from mathjax in markdown to html block
app.py
CHANGED
@@ -78,11 +78,17 @@ with gr.Blocks(title="LLM Similarity Analyzer", css=app_util.custom_css) as demo
     )
 
     gr.Markdown("## Information")
-    gr.
+    gr.HTML("""
+        <script type="text/javascript" async
+        src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-MML-AM_CHTML">
+        </script>
+
+        <p>We propose Chance Adjusted Probabilistic Agreement (<span>\(\operatorname{CAPA}\)</span>, or <span>\(\kappa_p\)</span>), a novel metric
         for model similarity which adjusts for chance agreement due to accuracy. Using CAPA, we find: (1) LLM-as-a-judge scores are \
         biased towards more similar models controlling for the model's capability. (2) Gain from training strong models on annotations \
         of weak supervisors (weak-to-strong generalization) is higher when the two models are more different. (3) Concerningly, model \
-        errors are getting more correlated as capabilities increase
+        errors are getting more correlated as capabilities increase.</p>
+    """)
     with gr.Row():
         gr.Image(value="data/table_capa.png", label="Comparison of different similarity metrics for multiple-choice questions", elem_classes="image_container", interactive=False)
         gr.Markdown("""
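Note on the added string: the HTML is passed as an ordinary (non-raw) triple-quoted literal, so the `\(`, `\o` and `\k` sequences only survive because Python leaves unrecognized escapes untouched, and the trailing `\` characters act as line continuations inside the string. The sketch below is one possible way to build the same fragment with raw strings so the escaping can be checked without starting the app. It is not part of this commit: `build_info_html` is a hypothetical helper, and the paragraph is shortened to its first sentence for brevity.

```python
# Hypothetical helper, not part of the commit: builds the HTML fragment as a
# plain string so the escaping can be inspected without launching Gradio.

# URL copied from the diff above.
MATHJAX_SRC = (
    "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/"
    "MathJax.js?config=TeX-MML-AM_CHTML"
)


def build_info_html() -> str:
    """Return the MathJax script tag plus the CAPA paragraph as one HTML string."""
    script = f'<script type="text/javascript" async src="{MATHJAX_SRC}"></script>'
    # Raw literals keep the backslashes of \( ... \) exactly as written.
    # Adjacent literals are concatenated, replacing the trailing "\" continuations.
    paragraph = (
        r"<p>We propose Chance Adjusted Probabilistic Agreement "
        r"(<span>\(\operatorname{CAPA}\)</span>, or <span>\(\kappa_p\)</span>), "
        r"a novel metric for model similarity which adjusts for chance "
        r"agreement due to accuracy.</p>"
    )
    return script + "\n" + paragraph


if __name__ == "__main__":
    print(build_info_html())
```

In `app.py` the call site would presumably become `gr.HTML(build_info_html())`. Whether a `<script>` tag passed through `gr.HTML` is actually executed by the browser is not shown by this diff and should be checked against the Gradio version in use.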