Update README.md
README.md (CHANGED)

@@ -37,32 +37,54 @@ datasets:
- ymoslem/wmt-da-human-evaluation-long-context
model-index:
- name: Quality Estimation for Machine Translation
  results:
  - task:
      type: regression
    dataset:
      name: ymoslem/wmt-da-human-evaluation-long-context
      type: QE
    metrics:
    - name: Pearson Correlation
      type: Pearson
      value: 0.5013
    - name: Mean Absolute Error
      type: MAE
      value: 0.1024
    - name: Root Mean Squared Error
      type: RMSE
      value: 0.1464
    - name: R-Squared
      type: R2
      value: 0.251
metrics:
- pearsonr
- mae
- r_squared
---

# Quality Estimation for Machine Translation

This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on the ymoslem/wmt-da-human-evaluation-long-context dataset.
It achieves the following results on the evaluation set:
- Last checkpoint: Loss: 0.0214
- Best checkpoint (this one): Loss: 0.0214

## Model description

This model is for reference-free quality estimation (QE) of machine translation (MT) systems: given a source text and its machine translation, it predicts a quality score without requiring a reference translation.

## Training and evaluation data

The model is trained on the long-context dataset [ymoslem/wmt-da-human-evaluation-long-context](https://huggingface.co/datasets/ymoslem/wmt-da-human-evaluation-long-context).

* Training: 7.65 million long-context texts
* Test: 59,235 long-context texts
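
As a minimal sketch, the dataset can be loaded directly with the `datasets` library; the `train` and `test` split names are an assumption here, so check the dataset card if they differ:

```python
from datasets import load_dataset

# Load the long-context QE dataset; split names are assumed to be "train" and "test"
dataset = load_dataset("ymoslem/wmt-da-human-evaluation-long-context")
print(dataset)  # expected: ~7.65M training rows and ~59K test rows, per the figures above
```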

## Training procedure

- tokenizer.model_max_length: 8192 (full context length)
- attn_implementation: flash_attention_2
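
As an illustration only (not the exact training script), these two settings might be applied when preparing the base model and tokenizer for fine-tuning; the single-output regression head (`num_labels=1`) and the bfloat16 dtype are assumptions:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

base_model_name = "answerdotai/ModernBERT-base"

# Use the full 8192-token context length, as noted above
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
tokenizer.model_max_length = 8192

# Regression head with a single output for the QE score (num_labels=1 is an assumption);
# FlashAttention-2 as noted above, with bfloat16 assumed for compatibility with it
model = AutoModelForSequenceClassification.from_pretrained(
    base_model_name,
    num_labels=1,
    problem_type="regression",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
```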

### Training hyperparameters

The following hyperparameters were used during training:

@@ -72,7 +94,7 @@ The following hyperparameters were used during training:

- seed: 42
- optimizer: adamw_torch_fused (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08, no additional optimizer arguments
- lr_scheduler_type: linear
- training_steps: 60000 (approx. 1 epoch)
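
For orientation, here is a hedged sketch of how the listed values might map onto `transformers.TrainingArguments`; the output directory is a placeholder, and the learning rate and batch sizes (part of the full hyperparameter list, not shown in this hunk) still need to be supplied:

```python
from transformers import TrainingArguments

# Hypothetical mapping of the listed hyperparameters onto TrainingArguments.
# Learning rate and batch sizes are intentionally omitted; take them from the
# full hyperparameter list in the model card.
training_args = TrainingArguments(
    output_dir="modernbert-long-context-qe",  # placeholder name
    seed=42,
    optim="adamw_torch_fused",  # OptimizerNames.ADAMW_TORCH_FUSED
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    max_steps=60_000,  # approx. 1 epoch
)
```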

### Training results

@@ -146,3 +168,126 @@

- Pytorch 2.4.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0

## Inference

1. Install the required libraries.

```bash
pip3 install --upgrade datasets accelerate transformers
pip3 install --upgrade flash_attn triton
```

2. Load the test dataset.

```python
from datasets import load_dataset

test_dataset = load_dataset("ymoslem/wmt-da-human-evaluation",
                            split="test",
                            trust_remote_code=True
                            )
print(test_dataset)
```

3. Load the model and tokenizer:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned model and tokenizer
model_name = "ymoslem/ModernBERT-base-long-context-qe-v1"
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Move the model to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()
```

4. Prepare the dataset. Each source segment `src` and target segment `tgt` are separated by the `sep_token`, which is `'</s>'` for ModernBERT.

```python
sep_token = tokenizer.sep_token
input_test_texts = [f"{src} {sep_token} {tgt}" for src, tgt in zip(test_dataset["src"], test_dataset["mt"])]
```

5. Generate predictions.

If you print `model.config.problem_type`, the output is `regression`.
Still, you can use the "text-classification" pipeline as follows (cf. [pipeline documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.TextClassificationPipeline)):

```python
from transformers import pipeline

classifier = pipeline("text-classification",
                      model=model_name,
                      tokenizer=tokenizer,
                      device=0,
                      )

predictions = classifier(input_test_texts,
                         batch_size=128,
                         truncation=True,
                         padding="max_length",
                         max_length=tokenizer.model_max_length,
                         )
predictions = [prediction["score"] for prediction in predictions]
```

Alternatively, you can use a more elaborate version of the code, which is slightly faster and provides more control.

```python
from torch.utils.data import DataLoader
import torch
from tqdm.auto import tqdm

# Tokenization function
def process_batch(batch, tokenizer, device):
    sep_token = tokenizer.sep_token
    input_texts = [f"{src} {sep_token} {tgt}" for src, tgt in zip(batch["src"], batch["mt"])]
    tokens = tokenizer(input_texts,
                       truncation=True,
                       padding="max_length",
                       max_length=tokenizer.model_max_length,
                       return_tensors="pt",
                       ).to(device)
    return tokens


# Create a DataLoader for batching
test_dataloader = DataLoader(test_dataset,
                             batch_size=128,  # Adjust batch size as needed
                             shuffle=False)

# List to store all predictions
predictions = []

with torch.no_grad():
    for batch in tqdm(test_dataloader, desc="Inference Progress", unit="batch"):
        # Tokenize the batch of source/translation pairs
        tokens = process_batch(batch, tokenizer, device)

        # Forward pass: compute the model's logits
        outputs = model(**tokens)

        # Get logits (predictions)
        logits = outputs.logits

        # Extract the regression predictions; squeeze(-1) keeps a 1-D shape
        # even when the last batch contains a single example
        batch_predictions = logits.squeeze(-1)

        # Extend the list with the predictions
        predictions.extend(batch_predictions.tolist())
```
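
To relate these predictions back to the metrics reported above (Pearson correlation, MAE, RMSE, R²), a hedged evaluation sketch follows; it assumes the gold human scores are stored in a `score` column of the test set (check the actual column name) and that `scipy` and `scikit-learn` are installed:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Gold human quality scores; "score" is an assumed column name in the test set
gold = np.array(test_dataset["score"], dtype=float)
preds = np.array(predictions, dtype=float)

pearson = pearsonr(gold, preds)[0]
mae = mean_absolute_error(gold, preds)
rmse = np.sqrt(mean_squared_error(gold, preds))
r2 = r2_score(gold, preds)

print(f"Pearson: {pearson:.4f} | MAE: {mae:.4f} | RMSE: {rmse:.4f} | R2: {r2:.4f}")
```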