SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

This is a sentence-transformers model finetuned from sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
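
The Pooling module above enables only pooling_mode_mean_tokens, so each sentence embedding is the average of its token embeddings. The following is a minimal pure-Python sketch of that idea, assuming padding tokens are excluded through the attention mask. The function name mean_pool and the toy 3-dimensional vectors are illustrative only; the real model produces 384-dimensional vectors.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [total / count for total in summed]


# Toy input: three tokens, the last one is padding.
tokens = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)  # [2.0, 3.0, 4.0]
```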

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("philipp-zettl/MiniLM-similarity-small")
# Run inference
sentences = [
    'Envoyez-moi la politique de garantie de ce produit',
    'faq query',
    'account query',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
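
The example sentences pair a user request with short labels such as 'faq query'. One plausible use of the similarity scores, assumed here and not stated by the model card, is to route a request to the label with the highest cosine similarity. The sketch below shows that step in pure Python; best_label and the toy 3-dimensional vectors are hypothetical stand-ins for real 384-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_label(query_vec, label_vecs):
    """Return the label whose vector is most similar to the query, plus all scores."""
    scores = {
        label: cosine_similarity(query_vec, vec)
        for label, vec in label_vecs.items()
    }
    return max(scores, key=scores.get), scores


# Toy vectors, not real model outputs.
query = [0.9, 0.1, 0.0]
labels = {
    "faq query": [1.0, 0.0, 0.0],
    "account query": [0.0, 1.0, 0.0],
}
label, scores = best_label(query, labels)
print(label)  # faq query
```

With the real model, the vectors would come from model.encode, and the scores from model.similarity, which uses cosine similarity unless configured otherwise.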

Evaluation

Metrics

Semantic Similarity (MiniLM-dev)

Metric | Value
pearson_cosine | 0.6538
spearman_cosine | 0.6337
pearson_manhattan | 0.58
spearman_manhattan | 0.5526
pearson_euclidean | 0.5732
spearman_euclidean | 0.5395
pearson_dot | 0.636
spearman_dot | 0.6238
pearson_max | 0.6538
spearman_max | 0.6337

Semantic Similarity (MiniLM-test)

Metric | Value
pearson_cosine | 0.6682
spearman_cosine | 0.6222
pearson_manhattan | 0.5715
spearman_manhattan | 0.5481
pearson_euclidean | 0.5727
spearman_euclidean | 0.5493
pearson_dot | 0.6396
spearman_dot | 0.6107
pearson_max | 0.6682
spearman_max | 0.6222
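
Each metric above correlates the model's similarity scores with the gold scores: Pearson measures linear agreement, Spearman measures agreement of rank order. In both tables the _max rows equal the cosine rows, which is consistent with them being the best value across the four similarity functions. The following pure-Python sketch shows how the two correlations can be computed; the gold and predicted values are made up for illustration and are not taken from the evaluation data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank-transformed values."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical gold scores and predicted cosine similarities for four pairs.
gold = [0.0, 1.0, 1.0, 0.0]
predicted = [0.10, 0.80, 0.95, 0.30]
print(round(pearson(gold, predicted), 4), round(spearman(gold, predicted), 4))
```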

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,267 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    column | type | min | mean | max
    sentence1 | string | 6 tokens | 10.77 tokens | 18 tokens
    sentence2 | string | 4 tokens | 5.31 tokens | 6 tokens
    score | float | 0.0 | 0.67 | 1.0
  • Samples:
    sentence1 | sentence2 | score
    Get information on the next art exhibition | product query | 0.0
    Show me how to update my profile | product query | 0.0
    Покажите мне доступные варианты полетов в Турцию | faq query | 0.0
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
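
CoSENTLoss is a ranking loss: for every two training pairs where one has a lower gold score than the other, it penalizes the model if the lower-scored pair receives the higher cosine similarity. The sketch below is a pure-Python reading of that formula with the scale of 20.0 listed above. It illustrates the idea and is not the library's implementation; the cosine values are hypothetical.

```python
import math


def cosent_loss(cosines, labels, scale=20.0):
    """CoSENT: log(1 + sum of exp(scale * (cos_low - cos_high))) over ordered pairs.

    A term is added for every (low, high) where labels[low] < labels[high], so the
    loss shrinks when higher-labelled pairs also get the higher cosine similarity.
    """
    terms = [0.0]  # exp(0) = 1 supplies the "1 +" inside the logarithm
    for cos_low, label_low in zip(cosines, labels):
        for cos_high, label_high in zip(cosines, labels):
            if label_low < label_high:
                terms.append(scale * (cos_low - cos_high))
    # Numerically stable log-sum-exp.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Hypothetical cosine similarities for three (sentence1, sentence2) pairs.
labels = [0.0, 1.0, 1.0]
well_ordered = cosent_loss([0.1, 0.8, 0.9], labels)
badly_ordered = cosent_loss([0.9, 0.2, 0.1], labels)
print(well_ordered < badly_ordered)  # True
```

Because only the ordering of gold scores matters, the loss works with any numeric score column, including the 0.0 to 1.0 range used here.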
    

Evaluation Dataset

Unnamed Dataset

  • Size: 159 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples (here, all 159):
    column | type | min | mean | max
    sentence1 | string | 6 tokens | 10.65 tokens | 17 tokens
    sentence2 | string | 4 tokens | 5.35 tokens | 6 tokens
    score | float | 0.0 | 0.67 | 1.0
  • Samples:
    sentence1 | sentence2 | score
    Sende mir die Bestellbestätigung per E-Mail | order query | 0.0
    How do I add a new payment method? | faq query | 1.0
    No puedo conectar mi impresora, ¿puedes ayudarme? | support query | 1.0
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • learning_rate: 2e-05
  • num_train_epochs: 2
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
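
Several of these values combine into the step counts and learning-rate curve of the run. The worked arithmetic below assumes a single device and that the no_duplicates sampler does not drop any samples; under those assumptions the total of 318 steps matches the last step in the training logs. The linear_lr helper is an illustrative reading of lr_scheduler_type: linear with warmup, not the trainer's own code.

```python
import math

train_samples = 1267
batch_size = 8        # per_device_train_batch_size
epochs = 2            # num_train_epochs
base_lr = 2e-05       # learning_rate
warmup_ratio = 0.1

# dataloader_drop_last is False, so the final partial batch still counts as a step.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)


def linear_lr(step):
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


print(steps_per_epoch, total_steps, warmup_steps)  # 159 318 32
```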

Training Logs

Epoch | Step | Training Loss | loss | MiniLM-dev_spearman_cosine | MiniLM-test_spearman_cosine
0.0629 | 10 | 6.2479 | 2.5890 | 0.1448 | -
0.1258 | 20 | 4.3549 | 2.2787 | 0.1965 | -
0.1887 | 30 | 3.5969 | 2.0104 | 0.2599 | -
0.2516 | 40 | 2.4979 | 1.7269 | 0.3357 | -
0.3145 | 50 | 2.5551 | 1.5747 | 0.4439 | -
0.3774 | 60 | 3.1446 | 1.4892 | 0.4750 | -
0.4403 | 70 | 2.1353 | 1.5305 | 0.4662 | -
0.5031 | 80 | 2.9341 | 1.3718 | 0.4848 | -
0.5660 | 90 | 2.8709 | 1.2469 | 0.5316 | -
0.6289 | 100 | 2.1367 | 1.2558 | 0.5436 | -
0.6918 | 110 | 2.2735 | 1.2939 | 0.5392 | -
0.7547 | 120 | 2.8646 | 1.1206 | 0.5616 | -
0.8176 | 130 | 3.3204 | 1.0213 | 0.5662 | -
0.8805 | 140 | 0.8989 | 0.9866 | 0.5738 | -
0.9434 | 150 | 0.0057 | 0.9961 | 0.5674 | -
1.0063 | 160 | 0.0019 | 1.0111 | 0.5674 | -
1.0692 | 170 | 0.4617 | 1.0275 | 0.5747 | -
1.1321 | 180 | 0.0083 | 1.0746 | 0.5732 | -
1.1950 | 190 | 0.5048 | 1.0968 | 0.5753 | -
1.2579 | 200 | 0.0002 | 1.0840 | 0.5738 | -
1.3208 | 210 | 0.07 | 1.0364 | 0.5753 | -
1.3836 | 220 | 0.0 | 0.9952 | 0.5750 | -
1.4465 | 230 | 0.0 | 0.9922 | 0.5744 | -
1.5094 | 240 | 0.0 | 0.9923 | 0.5726 | -
1.0126 | 250 | 0.229 | 0.9930 | 0.5729 | -
1.0755 | 260 | 2.2061 | 0.9435 | 0.5880 | -
1.1384 | 270 | 2.7711 | 0.8892 | 0.6078 | -
1.2013 | 280 | 0.7528 | 0.8886 | 0.6148 | -
1.2642 | 290 | 0.386 | 0.8927 | 0.6162 | -
1.3270 | 300 | 0.8902 | 0.8710 | 0.6267 | -
1.3899 | 310 | 0.9534 | 0.8429 | 0.6337 | -
1.4403 | 318 | - | - | - | 0.6222

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.3.1+cu121
  • Accelerate: 0.33.0
  • Datasets: 2.21.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}

Model size: 118M params (Safetensors, tensor type F32)