MPNet base trained on AllNLI-turkish triplets

This is a sentence-transformers model finetuned from microsoft/mpnet-base on the all-nli-triplets-turkish dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: microsoft/mpnet-base
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
  • Language: en
  • License: apache-2.0

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("mertcobanov/mpnet-base-all-nli-triplet-turkish-v2")
# Run inference
sentences = [
    'Böyle şeyler görmek ve eğer yapabileceğiniz en küçük bir şey varsa, bu yardımcı olur.',
    'Böyle bir şeyi gözlemlemek ve yapıp yapamayacağınızı bilmek için.',
    'Böyle bir şeyi görmek kötü, eğer yapabiliyorsanız buna hiç katkıda bulunmayın.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Triplet

  • Datasets: all-nli-dev-turkish and all-nli-test-turkish
  • Evaluated with TripletEvaluator
Metric all-nli-dev-turkish all-nli-test-turkish
cosine_accuracy 0.7454 0.7525

Training Details

Training Dataset

all-nli-triplets-turkish

  • Dataset: all-nli-triplets-turkish at bff203b
  • Size: 13,842 training samples
  • Columns: anchor_translated, positive_translated, and negative_translated
  • Approximate statistics based on the first 1000 samples:
    anchor_translated positive_translated negative_translated
    type string string string
    details
    • min: 8 tokens
    • mean: 13.42 tokens
    • max: 95 tokens
    • min: 8 tokens
    • mean: 31.64 tokens
    • max: 93 tokens
    • min: 6 tokens
    • mean: 32.03 tokens
    • max: 89 tokens
  • Samples:
    anchor_translated positive_translated negative_translated
    Asyalı okul çocukları birbirlerinin omuzlarında oturuyor. Okul çocukları bir arada Asyalı fabrika işçileri oturuyor.
    İnsanlar dışarıda. Arka planda resmi kıyafetler giymiş bir grup insan var ve beyaz gömlekli, haki pantolonlu bir adam toprak yoldan yeşil çimenlere atlıyor. Bir odada üç kişiyle birlikte büyük bir kamera tutan bir adam.
    Bir adam dışarıda. Adam yarış sırasında yan sepetten bir su birikintisine düşer. Beyaz bir sarık sarmış gömleksiz bir adam bir ağaç gövdesine tırmanıyor.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Evaluation Dataset

all-nli-triplets-turkish

  • Dataset: all-nli-triplets-turkish at bff203b
  • Size: 6,584 evaluation samples
  • Columns: anchor_translated, positive_translated, and negative_translated
  • Approximate statistics based on the first 1000 samples:
    anchor_translated positive_translated negative_translated
    type string string string
    details
    • min: 5 tokens
    • mean: 42.62 tokens
    • max: 192 tokens
    • min: 5 tokens
    • mean: 22.58 tokens
    • max: 77 tokens
    • min: 5 tokens
    • mean: 22.07 tokens
    • max: 65 tokens
  • Samples:
    anchor_translated positive_translated negative_translated
    Ayrıca, bu özel tüketim vergileri, diğer vergiler gibi, hükümetin ödeme zorunluluğunu sağlama yetkisini kullanarak belirlenir. Hükümetin ödeme zorlaması, özel tüketim vergilerinin nasıl hesaplandığını belirler. Özel tüketim vergileri genel kuralın bir istisnasıdır ve aslında GSYİH payına dayalı olarak belirlenir.
    Gri bir sweatshirt giymiş bir sanatçı, canlı renklerde bir kasaba tablosu üzerinde çalışıyor. Bir ressam gri giysiler içinde bir kasabanın resmini yapıyor. Bir kişi bir beyzbol sopası tutuyor ve gelen bir atış için planda bekliyor.
    İmkansız. Yapılamaz. Tamamen mümkün.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss all-nli-dev-turkish_cosine_accuracy all-nli-test-turkish_cosine_accuracy
0 0 - - 0.6092 -
0.1155 100 2.7414 1.6615 0.7429 -
0.2309 200 1.64 1.4650 0.7483 -
0.3464 300 1.2391 1.4068 0.7561 -
0.4619 400 1.1146 1.4367 0.7549 -
0.5774 500 1.0341 1.4887 0.7486 -
0.6928 600 0.7568 1.4568 0.7535 -
0.8083 700 0.7216 1.5680 0.7451 -
0.9238 800 0.5919 1.5492 0.7454 -
1.0 866 - - - 0.7525

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.3.1
  • Transformers: 4.46.3
  • PyTorch: 2.3.0
  • Accelerate: 1.1.1
  • Datasets: 3.1.0
  • Tokenizers: 0.20.3

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Downloads last month
4
Safetensors
Model size
109M params
Tensor type
F32
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for mertcobanov/mpnet-base-all-nli-triplet-turkish-v2

Finetuned
(49)
this model

Dataset used to train mertcobanov/mpnet-base-all-nli-triplet-turkish-v2

Evaluation results