Single Best Answer question classifier

This model classifies multiple-choice questions (MCQs) as Single Best Answer (SBA) questions or not. SBA questions require selecting the best or most correct answer among the set of choices, as opposed to Single Correct Answer (SCA) questions, where all distractors are strictly incorrect.

To train the classifier to detect SBA examples, we selected four representative benchmarks (MMLU, MMLU-Pro, TruthfulQA, and Commonsense-QA), sampled 4,000 examples, and split them into train (75%) and evaluation (25%) sets. We used GPT-4o-mini to automatically label the examples as SBA or SCA, and additionally annotated the evaluation split manually.
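The 75/25 split described above can be sketched with a simple shuffle-and-slice; the seed and shuffling strategy are assumptions for illustration, not the exact procedure used:

```python
import random

def split_examples(examples, eval_frac=0.25, seed=0):
    """Shuffle and split examples into train (75%) and evaluation (25%) sets."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_eval = int(len(shuffled) * eval_frac)
    return shuffled[n_eval:], shuffled[:n_eval]

examples = list(range(4000))  # stand-in for the 4,000 sampled MCQs
train, evaluation = split_examples(examples)
print(len(train), len(evaluation))  # 3000 1000
```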

The model aligns closely with the manual annotation (recall: 0.97, precision: 0.93).
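As a minimal usage sketch, an MCQ could be formatted into a single string and passed to the classifier through the standard Hugging Face text-classification pipeline. The concatenation scheme and label names below are assumptions; check the training code for the exact input format:

```python
# Sketch: formatting an MCQ for the SBA classifier (format is an assumption).
def format_mcq(question: str, choices: list) -> str:
    """Join a question and its lettered choices into one input string."""
    letters = "ABCDEFGH"
    parts = [question] + ["{}. {}".format(letters[i], c) for i, c in enumerate(choices)]
    return " ".join(parts)

text = format_mcq(
    "Which option best describes photosynthesis?",
    ["Energy storage", "Conversion of light energy to chemical energy",
     "Cellular respiration", "Protein synthesis"],
)
print(text)

# With transformers installed, the model could then be queried like this
# (requires downloading the checkpoint, so it is left commented out):
#   from transformers import pipeline
#   clf = pipeline("text-classification",
#                  model="ahmedselhady/bert-base-uncased-sba-clf")
#   print(clf(text))
```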

This model is part of our framework WiCkeD: A Simple Method to Make Multiple Choice Benchmarks More Challenging.

Citation

@misc{elhady2025wickedsimplemethodmake,
      title={WiCkeD: A Simple Method to Make Multiple Choice Benchmarks More Challenging}, 
      author={Ahmed Elhady and Eneko Agirre and Mikel Artetxe},
      year={2025},
      eprint={2502.18316},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.18316}, 
}
Model size: 109M parameters (F32, Safetensors).
Finetuned from bert-base-uncased (model: ahmedselhady/bert-base-uncased-sba-clf).