ModernBERT-base for Extractive QA

This is a single-model solution for SQuAD-like QA based on ModernBERT (Warner et al., 2024). ModernBERT is an up-to-date drop-in replacement for BERT-like language models. It is an encoder-only, pre-norm Transformer with GeGLU activations, pre-trained with Masked Language Modeling (MLM) on sequences of up to 1,024 tokens over a corpus of 2 trillion tokens of English text and code. ModernBERT adopts many recent best practices, e.g., an increased masking rate, pre-normalization, and no bias terms, and it also appears to achieve the best performance on NLU tasks among base-sized encoder-only models such as BERT, RoBERTa, and DeBERTa. The available implementation of ModernBERT also uses Flash Attention, which makes it substantially faster than the older implementations of those models; for example, ModernBERT-base appears to run 3-4x faster than DeBERTa-V3-base.
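
Below is a minimal usage sketch, assuming the checkpoint is loaded through the standard transformers question-answering pipeline; the example context and question are purely illustrative and not taken from any evaluation set.

```python
# Minimal sketch of extractive QA inference with this checkpoint via the
# standard transformers question-answering pipeline. The context and question
# below are illustrative only.
from transformers import pipeline

qa = pipeline("question-answering", model="kiddothe2b/ModernBERT-base-squad2")

context = (
    "ModernBERT is an encoder-only Transformer pre-trained with masked "
    "language modeling on 2 trillion tokens of English text and code."
)
question = "How many tokens was ModernBERT pre-trained on?"

# Since the checkpoint appears to be fine-tuned on SQuAD 2.0 (which contains
# unanswerable questions), handle_impossible_answer=True lets the pipeline
# return an empty answer when no span is supported by the context.
result = qa(question=question, context=context, handle_impossible_answer=True)
print(result)  # e.g. {'score': ..., 'start': ..., 'end': ..., 'answer': '2 trillion'}
```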

