Joschka Strueber committed
Commit 1b549fb · 1 Parent(s): b90e0d3

[Ref] change table size

Files changed (1)
  app.py  +3 −3
app.py CHANGED
@@ -169,15 +169,15 @@ for model similarity which adjusts for chance agreement due to accuracy. Using C
 biased towards more similar models controlling for the model's capability. (2) Gain from training strong models on annotations \
 of weak supervisors (weak-to-strong generalization) is higher when the two models are more different. (3) Concerningly, model \
 errors are getting more correlated as capabilities increase.""")
-    image_path = "data/table_capa.png"
-    gr.Image(value=image_path, label="Comparison of different similarity metrics for multiple-choice questions", interactive=False)
+    with gr.Row():
+        gr.Image(value="data/table_capa.png", label="Comparison of different similarity metrics for multiple-choice questions", interactive=False, scale=1)
     gr.Markdown("""
 - **Datasets**: [Open LLM Leaderboard v2](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/) benchmark datasets \n
     - Some datasets are not multiple-choice - for these, the metrics are not applicable. \n
 - **Models**: Open LLM Leaderboard models \n
     - Every model evaluation is gated on Hugging Face and access has to be requested. \n
     - We requested access for the most popular models, but some may be missing. \n
-    - Notably, loading data is not possible for many meta-llama and gemma models.
+    - Notably, loading data is not possible for some meta-llama and gemma models.
 - **Metrics**: CAPA (probabilistic), CAPA (deterministic), Error Consistency""")

 if __name__ == "__main__":
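For context, here is a minimal, self-contained sketch of the layout change this commit makes. The surrounding gr.Blocks context and launch call are assumed (the hunk does not show them): wrapping the image in a gr.Row and passing scale lets Gradio size the component relative to any sibling components in that row, rather than rendering it at the default full width.

# Minimal sketch of the layout change above; the gr.Blocks wrapper and
# demo.launch() are assumptions, as the hunk only shows the inner calls.
import gradio as gr

with gr.Blocks() as demo:
    with gr.Row():
        # scale controls this component's relative share of the row's width
        # (meaningful relative to the scale values of sibling components).
        gr.Image(
            value="data/table_capa.png",
            label="Comparison of different similarity metrics for multiple-choice questions",
            interactive=False,
            scale=1,
        )

if __name__ == "__main__":
    demo.launch()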