Single Best Answer question classifier

This model classifies multiple-choice questions (MCQs) as Single Best Answer (SBA) questions or not. SBA questions require selecting the best or most correct answer among the set of choices, as opposed to Single Correct Answer (SCA) questions, where all distractors are strictly incorrect.

To train the classifier to detect SBA examples, we selected four representative benchmarks (MMLU, MMLU-Pro, TruthfulQA, and Commonsense-QA), sampled 4,000 examples, and split them into train (75%) and evaluation (25%) sets. We used GPT-4o-mini to automatically label the examples as SBA or SCA, and additionally annotated the evaluation split manually.
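The 75/25 split described above can be sketched with a simple shuffle-and-slice; the seed and shuffling strategy are assumptions for illustration, not the exact procedure used:

```python
import random

def split_examples(examples, eval_frac=0.25, seed=0):
    """Shuffle and split examples into train (75%) and evaluation (25%) sets."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_eval = int(len(shuffled) * eval_frac)
    return shuffled[n_eval:], shuffled[:n_eval]

examples = list(range(4000))  # stand-in for the 4,000 sampled MCQs
train, evaluation = split_examples(examples)
print(len(train), len(evaluation))  # 3000 1000
```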

The model aligns closely with the manual annotation (recall: 0.97, precision: 0.93).
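As a minimal usage sketch, an MCQ could be formatted into a single string and passed to the classifier through the standard Hugging Face text-classification pipeline. The concatenation scheme and label names below are assumptions; check the training code for the exact input format:

```python
# Sketch: formatting an MCQ for the SBA classifier (format is an assumption).
def format_mcq(question: str, choices: list) -> str:
    """Join a question and its lettered choices into one input string."""
    letters = "ABCDEFGH"
    parts = [question] + ["{}. {}".format(letters[i], c) for i, c in enumerate(choices)]
    return " ".join(parts)

text = format_mcq(
    "Which option best describes photosynthesis?",
    ["Energy storage", "Conversion of light energy to chemical energy",
     "Cellular respiration", "Protein synthesis"],
)
print(text)

# With transformers installed, the model could then be queried like this
# (requires downloading the checkpoint, so it is left commented out):
#   from transformers import pipeline
#   clf = pipeline("text-classification",
#                  model="ahmedselhady/bert-base-uncased-sba-clf")
#   print(clf(text))
```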

This model is part of our framework WiCkeD: A Simple Method to Make Multiple Choice Benchmarks More Challenging.

Citation

@misc{elhady2025wickedsimplemethodmake,
      title={WiCkeD: A Simple Method to Make Multiple Choice Benchmarks More Challenging}, 
      author={Ahmed Elhady and Eneko Agirre and Mikel Artetxe},
      year={2025},
      eprint={2502.18316},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.18316}, 
}
Model size: 109M parameters (F32, Safetensors).
Finetuned from bert-base-uncased (model: ahmedselhady/bert-base-uncased-sba-clf).