---
language: en
datasets:
  - squad_v2
model-index:
  - name: kiddothe2b/ModernBERT-base-squad2
    results:
      - task:
          type: question-answering
          name: Question Answering
        dataset:
          name: squad_v2
          type: squad_v2
          config: squad_v2
          split: validation
        metrics:
          - type: exact_match
            value: 81.2936
            name: Exact Match
          - type: f1
            value: 84.4849
            name: F1
base_model:
  - answerdotai/ModernBERT-base
pipeline_tag: question-answering
library_name: transformers
---

# ModernBERT-base for Extractive QA

This is a single-model solution for SQuAD-like extractive QA based on ModernBERT (Warner et al., 2024). ModernBERT is a modern drop-in replacement for BERT-like language models: an encoder-only, pre-norm Transformer with GeGLU activations, pre-trained with Masked Language Modeling (MLM) on sequences of up to 1,024 tokens over a corpus of 2 trillion tokens of English text and code. ModernBERT adopts many recent best practices, e.g., an increased masking rate, pre-normalization, and no bias terms, and it appears to deliver the best NLU performance among base-sized encoder-only models such as BERT, RoBERTa, and DeBERTa. The available implementation also utilizes Flash Attention, which makes it substantially faster than the older implementations of its predecessors; ModernBERT-base seems to run 3-4x faster than DeBERTa-V3-base.
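
## How to use

A minimal usage sketch with the 🤗 Transformers question-answering pipeline. The question/context strings below are illustrative placeholders; since SQuAD v2 includes unanswerable questions, `handle_impossible_answer=True` allows the pipeline to return an empty answer when the context does not contain one.

```python
from transformers import pipeline

# Load this model through the question-answering pipeline.
qa = pipeline(
    "question-answering",
    model="kiddothe2b/ModernBERT-base-squad2",
)

# Illustrative example; any (question, context) pair works.
result = qa(
    question="What does ModernBERT use to speed up inference?",
    context=(
        "ModernBERT utilizes Flash Attention, which makes it "
        "substantially faster than older encoder implementations."
    ),
    handle_impossible_answer=True,  # SQuAD v2 contains unanswerable questions
)
print(result)  # {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```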