---
language: en
datasets:
- squad_v2
model-index:
- name: kiddothe2b/ModernBERT-base-squad2
  results:
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squad_v2
      type: squad_v2
      config: squad_v2
      split: validation
    metrics:
    - type: exact_match
      value: 81.2936
      name: Exact Match
    - type: f1
      value: 84.4849
      name: F1
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: question-answering
library_name: transformers
---
# ModernBERT-base for Extractive QA
This is a single-model solution for SQuAD-like extractive QA based on ModernBERT (Warner et al., 2024). ModernBERT is an up-to-date drop-in replacement for BERT-like language models. It is an encoder-only, pre-norm Transformer with GeGLU activations, pre-trained with Masked Language Modeling (MLM) on sequences of up to 1,024 tokens over a corpus of 2 trillion tokens of English text and code. ModernBERT adopts many recent best practices, e.g., an increased masking rate, pre-normalization, and no bias terms, and it appears to offer the best performance on NLU tasks among base-sized encoder-only models such as BERT, RoBERTa, and DeBERTa. The available implementation of ModernBERT also uses Flash Attention, which makes it substantially faster than the older implementations of those models; for example, ModernBERT-base runs roughly 3-4x faster than DeBERTa-V3-base.
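## Usage

A minimal usage sketch with the Transformers `question-answering` pipeline is shown below. The model id comes from this card's metadata; the question and context are illustrative placeholders, not examples from SQuAD 2.0.

```python
from transformers import pipeline

# Load the fine-tuned checkpoint with the extractive QA pipeline.
# Model id taken from this card's metadata.
qa = pipeline(
    "question-answering",
    model="kiddothe2b/ModernBERT-base-squad2",
    tokenizer="kiddothe2b/ModernBERT-base-squad2",
)

# Illustrative question/context pair.
result = qa(
    question="Which activation function does ModernBERT use?",
    context=(
        "ModernBERT is an encoder-only, pre-norm Transformer with GeGLU "
        "activations, pre-trained with masked language modeling on English "
        "text and code."
    ),
)

# The pipeline returns a dict with the predicted span and its score,
# e.g. {'score': ..., 'start': ..., 'end': ..., 'answer': 'GeGLU'}.
print(result)
```

Because the model was fine-tuned on SQuAD 2.0, it can also predict that a question is unanswerable from the given context; the pipeline's `handle_impossible_answer=True` argument exposes this behavior.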