Space: bethgelab/lm-similarity

Joschka Strueber committed 15 days ago

Commit 1b549fb (1 parent: b90e0d3)

[Ref] change table size

Files changed (1)

app.py CHANGED

@@ -169,15 +169,15 @@ for model similarity which adjusts for chance agreement due to accuracy. Using C
 biased towards more similar models controlling for the model's capability. (2) Gain from training strong models on annotations \
 of weak supervisors (weak-to-strong generalization) is higher when the two models are more different. (3) Concerningly, model \
 errors are getting more correlated as capabilities increase.""")
-    image_path = "data/table_capa.png"
-    gr.Image(value=image_path, label="Comparison of different similarity metrics for multiple-choice questions", interactive=False)
+    with gr.Row():
+        gr.Image(value="data/table_capa.png", label="Comparison of different similarity metrics for multiple-choice questions", interactive=False, scale=1)
     gr.Markdown("""
 - **Datasets**: [Open LLM Leaderboard v2](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/) benchmark datasets \n
     - Some datasets are not multiple-choice - for these, the metrics are not applicable. \n
 - **Models**: Open LLM Leaderboard models \n
     - Every model evaluation is gated on Hugging Face and access has to be requested. \n
     - We requested access for the most popular models, but some may be missing. \n
-    - Notably, loading data is not possible for many meta-llama and gemma models.
+    - Notably, loading data is not possible for some meta-llama and gemma models.
 - **Metrics**: CAPA (probabilistic), CAPA (deterministic), Error Consistency""")
 if __name__ == "__main__":
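The metrics named in this hunk measure how similar two models are after adjusting for the agreement their accuracies alone would produce by chance. As an illustration of that adjustment, here is a minimal sketch of error consistency in its standard kappa form, computed from per-question correctness flags. The function name `error_consistency` and its list-of-booleans inputs are hypothetical; this is not the Space's own implementation, and it does not cover either CAPA variant.

```python
from typing import Sequence


def error_consistency(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> float:
    """Kappa-style agreement between two models' per-question correctness.

    Hypothetical helper for illustration only (not the Space's code).
    Observed agreement counts questions where both models are right or
    both are wrong; expected agreement is what two independent models
    with the same accuracies would reach by chance.
    """
    if len(correct_a) != len(correct_b) or len(correct_a) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(correct_a)
    acc_a = sum(correct_a) / n
    acc_b = sum(correct_b) / n
    # Observed agreement: fraction of questions where both are right or both are wrong.
    c_obs = sum(a == b for a, b in zip(correct_a, correct_b)) / n
    # Chance agreement implied by the two accuracies alone.
    c_exp = acc_a * acc_b + (1 - acc_a) * (1 - acc_b)
    if c_exp == 1:
        # Both models always right, or both always wrong: kappa is undefined.
        raise ValueError("error consistency is undefined when c_exp == 1")
    return (c_obs - c_exp) / (1 - c_exp)


if __name__ == "__main__":
    # Two models with 75% and 50% accuracy that agree on 3 of 4 questions.
    print(error_consistency([True, True, True, False], [True, True, False, False]))
```

Identical error patterns give 1, agreement at exactly the chance level gives 0, and systematically opposite errors give -1. This is why a raw agreement rate alone would overstate the similarity of two highly accurate models.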