sgpt-bloom-1b7-nli

Usage

For usage instructions, refer to: https://github.com/Muennighoff/sgpt#symmetric-semantic-search

The model was trained with the command

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch examples/training/nli/training_nli_v2.py --model_name bigscience/bloom-1b3 --freezenonbias --train_batch_size 128 --lr 32e-5 --pooling weightedmean --wandb --wandbwatchlog gradients --gradcache --chunksize 4

Evaluation Results

{'askubuntu': 57.44, 'cqadupstack': 14.18, 'twitterpara': 73.99, 'scidocs': 74.74, 'avg': 55.087500000000006}

Training

The model was trained with the parameters:

DataLoader:

sentence_transformers.datasets.NoDuplicatesDataLoader.NoDuplicatesDataLoader of length 4403 with parameters:

{'batch_size': 128}

The model uses BitFit, weighted-mean pooling & GradCache, for details see: https://arxiv.org/abs/2202.08904

Loss:

sentence_transformers.losses.MultipleNegativesRankingLoss.MNRLGradCache

Parameters of the fit()-Method:

{
    "epochs": 1,
    "evaluation_steps": 440,
    "evaluator": "sentence_transformers.evaluation.EmbeddingSimilarityEvaluator.EmbeddingSimilarityEvaluator",
    "max_grad_norm": 1,
    "optimizer_class": "<class 'transformers.optimization.AdamW'>",
    "optimizer_params": {
        "lr": 0.00032
    },
    "scheduler": "WarmupLinear",
    "steps_per_epoch": null,
    "warmup_steps": 441,
    "weight_decay": 0.01
}

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 75, 'do_lower_case': False}) with Transformer model: BloomModel 
  (1): Pooling({'word_embedding_dimension': 2048, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': True, 'pooling_mode_lasttoken': False})
)

Citing & Authors

@article{muennighoff2022sgpt,
  title={SGPT: GPT Sentence Embeddings for Semantic Search},
  author={Muennighoff, Niklas},
  journal={arXiv preprint arXiv:2202.08904},
  year={2022}
}
Downloads last month
473
Safetensors
Model size
1.72B params
Tensor type
F32
ยท
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Spaces using bigscience-data/sgpt-bloom-1b7-nli 3

Evaluation results