SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
This is a Sentence Transformers model fine-tuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-MiniLM-L6-v2
- Maximum Sequence Length: 256 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: [Sentence Transformers Documentation](https://www.sbert.net)
- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
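Because the final Normalize() module L2-normalizes every embedding, cosine similarity between two outputs reduces to a plain dot product. A minimal sketch with stand-in unit vectors (illustrative, not actual model outputs):

```python
import numpy as np

# Stand-in 384-dimensional vectors; the model's Normalize() module already
# returns unit-length embeddings, so no extra normalization is needed there.
rng = np.random.default_rng(0)
a = rng.standard_normal(384)
b = rng.standard_normal(384)
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

# For unit vectors, the dot product equals the cosine similarity.
print(float(a @ b))
```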
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("danicafisher/dfisher-sentence-transformer-fine-tuned")
# Run inference
sentences = [
'What methods are suggested for recording and integrating structured feedback about content provenance from various stakeholders in the context of GAI systems?',
"39 \nMS-3.3-004 \nProvide input for training materials about the capabilities and limitations of GAI \nsystems related to digital content transparency for AI Actors, other \nprofessionals, and the public about the societal impacts of AI and the role of \ndiverse and inclusive content generation. \nHuman-AI Configuration; \nInformation Integrity; Harmful Bias \nand Homogenization \nMS-3.3-005 \nRecord and integrate structured feedback about content provenance from \noperators, users, and potentially impacted communities through the use of \nmethods such as user research studies, focus groups, or community forums. \nActively seek feedback on generated content quality and potential biases. \nAssess the general awareness among end users and impacted communities \nabout the availability of these feedback channels. \nHuman-AI Configuration; \nInformation Integrity; Harmful Bias \nand Homogenization \nAI Actor Tasks: AI Deployment, Affected Individuals and Communities, End-Users, Operation and Monitoring, TEVV \n \nMEASURE 4.2: Measurement results regarding AI system trustworthiness in deployment context(s) and across the AI lifecycle are \ninformed by input from domain experts and relevant AI Actors to validate whether the system is performing consistently as \nintended. Results are documented. \nAction ID \nSuggested Action \nGAI Risks \nMS-4.2-001 \nConduct adversarial testing at a regular cadence to map and measure GAI risks, \nincluding tests to address attempts to deceive or manipulate the application of \nprovenance techniques or other misuses. Identify vulnerabilities and \nunderstand potential misuse scenarios and unintended outputs. \nInformation Integrity; Information \nSecurity \nMS-4.2-002 \nEvaluate GAI system performance in real-world scenarios to observe its \nbehavior in practical environments and reveal issues that might not surface in \ncontrolled and optimized testing environments. \nHuman-AI Configuration; \nConfabulation; Information \nSecurity \nMS-4.2-003 \nImplement interpretability and explainability methods to evaluate GAI system \ndecisions and verify alignment with intended purpose. \nInformation Integrity; Harmful Bias \nand Homogenization \nMS-4.2-004 \nMonitor and document instances where human operators or other systems \noverride the GAI's decisions. Evaluate these cases to understand if the overrides \nare linked to issues related to content provenance. \nInformation Integrity \nMS-4.2-005 \nVerify and document the incorporation of results of structured public feedback \nexercises into design, implementation, deployment approval (“go”/“no-go” \ndecisions), monitoring, and decommission decisions. \nHuman-AI Configuration; \nInformation Security \nAI Actor Tasks: AI Deployment, Domain Experts, End-Users, Operation and Monitoring, TEVV",
'46 \nMG-4.3-003 \nReport GAI incidents in compliance with legal and regulatory requirements (e.g., \nHIPAA breach reporting, e.g., OCR (2023) or NHTSA (2022) autonomous vehicle \ncrash reporting requirements. \nInformation Security; Data Privacy \nAI Actor Tasks: AI Deployment, Affected Individuals and Communities, Domain Experts, End-Users, Human Factors, Operation and \nMonitoring',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
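The same similarity call supports simple semantic search: encode a query and a corpus, then rank by cosine score. A sketch below; the query and corpus strings are illustrative placeholders, not taken from the training data:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("danicafisher/dfisher-sentence-transformer-fine-tuned")

query = "How should feedback on GAI content provenance be collected?"
corpus = [
    "Record structured feedback from operators, users, and impacted communities.",
    "Report GAI incidents in compliance with legal and regulatory requirements.",
]

# Encode, then score the query against every corpus entry.
query_embedding = model.encode([query])
corpus_embeddings = model.encode(corpus)
scores = model.similarity(query_embedding, corpus_embeddings)  # shape: (1, 2)

best = int(scores.argmax())
print(corpus[best], float(scores[0, best]))
```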
Training Details
Training Dataset
Unnamed Dataset
- Size: 274 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 274 samples:

  |         | sentence_0                                         | sentence_1                                           |
  |---------|----------------------------------------------------|------------------------------------------------------|
  | type    | string                                             | string                                               |
  | details | min: 12 tokens, mean: 22.67 tokens, max: 38 tokens | min: 21 tokens, mean: 245.27 tokens, max: 256 tokens |
- Samples:

Sample 1
sentence_0: How does the Executive Order on Advancing Racial Equity define 'equity' and 'underserved communities'?
sentence_1:
ENDNOTES
47. Darshali A. Vyas et al., Hidden in Plain Sight – Reconsidering the Use of Race Correction in Clinical
Algorithms, 383 N. Engl. J. Med.874, 876-78 (Aug. 27, 2020), https://www.nejm.org/doi/full/10.1056/
NEJMms2004740.
48. The definitions of 'equity' and 'underserved communities' can be found in the Definitions section of
this framework as well as in Section 2 of The Executive Order On Advancing Racial Equity and Support
for Underserved Communities Through the Federal Government. https://www.whitehouse.gov/
briefing-room/presidential-actions/2021/01/20/executive-order-advancing-racial-equity-and-support
for-underserved-communities-through-the-federal-government/
49. Id.
50. Various organizations have offered proposals for how such assessments might be designed. See, e.g.,
Emanuel Moss, Elizabeth Anne Watkins, Ranjit Singh, Madeleine Clare Elish, and Jacob Metcalf.
Assembling Accountability: Algorithmic Impact Assessment for the Public Interest. Data & Society
Research Institute Report. June 29, 2021. https://datasociety.net/library/assembling-accountability
algorithmic-impact-assessment-for-the-public-interest/; Nicol Turner Lee, Paul Resnick, and Genie
Barton. Algorithmic bias detection and mitigation: Best practices and policies to reduce consumer harms.
Brookings Report. May 22, 2019.
https://www.brookings.edu/research/algorithmic-bias-detection-and-mitigation-best-practices-and
policies-to-reduce-consumer-harms/; Andrew D. Selbst. An Institutional View Of Algorithmic Impact
Assessments. Harvard Journal of Law & Technology. June 15, 2021. https://ssrn.com/abstract=3867634;
Dillon Reisman, Jason Schultz, Kate Crawford, and Meredith Whittaker. Algorithmic Impact
Assessments: A Practical Framework for Public Agency Accountability. AI Now Institute Report. April
2018. https://ainowinstitute.org/aiareport2018.pdf
51. Department of Justice. Justice Department Announces New Initiative to Combat Redlining. Oct. 22,
2021. https://www.justice.gov/opa/pr/justice-department-announces-new-initiative-combat-redlining
52. PAVE Interagency Task Force on Property Appraisal and Valuation Equity. Action Plan to Advance
Property Appraisal and Valuation Equity: Closing the Racial Wealth Gap by Addressing Mis-valuations for
Families and Communities of Color. March 2022. https://pave.hud.gov/sites/pave.hud.gov/files/
documents/PAVEActionPlan.pdf
53. U.S. Equal Employment Opportunity Commission. The Americans with Disabilities Act and the Use of
Software, Algorithms, and Artificial Intelligence to Assess Job Applicants and Employees. EEOC
NVTA-2022-2. May 12, 2022. https://www.eeoc.gov/laws/guidance/americans-disabilities-act-and-use
software-algorithms-and-artificial-intelligence; U.S. Department of Justice. Algorithms, Artificial
Intelligence, and Disability Discrimination in Hiring. May 12, 2022. https://beta.ada.gov/resources/ai
guidance/
54. Ziad Obermeyer, Brian Powers, Christine Vogeli, and Sendhil Mullainathan. Dissecting racial bias in
an algorithm used to manage the health of populations. Science. Vol. 366, No. 6464. Oct. 25, 2019. https://
www.science.org/doi/10.1126/science.aax2342
55. Data & Trust Alliance. Algorithmic Bias Safeguards for Workforce: Overview. Jan. 2022. https://
dataandtrustalliance.org/Algorithmic_Bias_Safeguards_for_Workforce_Overview.pdf
56. Section 508.gov. IT Accessibility Laws and Policies. Access Board. https://www.section508.gov/
manage/laws-and-policies/
67

Sample 2
sentence_0: What are the key expectations for automated systems as outlined in the context?
sentence_1:
HUMAN ALTERNATIVES,
CONSIDERATION, AND
FALLBACK
WHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS
The expectations for automated systems are meant to serve as a blueprint for the development of additional
technical standards and practices that are tailored for particular sectors and contexts.
Equitable. Consideration should be given to ensuring outcomes of the fallback and escalation system are
equitable when compared to those of the automated system and such that the fallback and escalation
system provides equitable access to underserved communities.105
Timely. Human consideration and fallback are only useful if they are conducted and concluded in a
timely manner. The determination of what is timely should be made relative to the specific automated
system, and the review system should be staffed and regularly assessed to ensure it is providing timely
consideration and fallback. In time-critical systems, this mechanism should be immediately available or,
where possible, available before the harm occurs. Time-critical systems include, but are not limited to,
voting-related systems, automated building access and other access systems, systems that form a critical
component of healthcare, and systems that have the ability to withhold wages or otherwise cause
immediate financial penalties.
Effective. The organizational structure surrounding processes for consideration and fallback should
be designed so that if the human decision-maker charged with reassessing a decision determines that it
should be overruled, the new decision will be effectively enacted. This includes ensuring that the new
decision is entered into the automated system throughout its components, any previous repercussions from
the old decision are also overturned, and safeguards are put in place to help ensure that future decisions do
not result in the same errors.
Maintained. The human consideration and fallback process and any associated automated processes
should be maintained and supported as long as the relevant automated system continues to be in use.
Institute training, assessment, and oversight to combat automation bias and ensure any
human-based components of a system are effective.
Training and assessment. Anyone administering, interacting with, or interpreting the outputs of an auto
mated system should receive training in that system, including how to properly interpret outputs of a system
in light of its intended purpose and in how to mitigate the effects of automation bias. The training should reoc
cur regularly to ensure it is up to date with the system and to ensure the system is used appropriately. Assess
ment should be ongoing to ensure that the use of the system with human involvement provides for appropri
ate results, i.e., that the involvement of people does not invalidate the system's assessment as safe and effective
or lead to algorithmic discrimination.
Oversight. Human-based systems have the potential for bias, including automation bias, as well as other
concerns that may limit their effectiveness. The results of assessments of the efficacy and potential bias of
such human-based systems should be overseen by governance structures that have the potential to update the
operation of the human-based system in order to mitigate these effects.
50

Sample 3
sentence_0: What is the focus of the report titled "Assembling Accountability: Algorithmic Impact Assessment for the Public Interest" by Emanuel Moss and others?
sentence_1: the same ENDNOTES passage (endnotes 47–56) as in Sample 1.
- Loss: MultipleNegativesRankingLoss with these parameters: `{ "scale": 20.0, "similarity_fct": "cos_sim" }`
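MultipleNegativesRankingLoss treats each (sentence_0, sentence_1) pair in a batch as a positive and every other sentence_1 in the same batch as a negative, then applies cross-entropy over the scaled cosine similarities. A minimal sketch of that objective (an illustration, not the library's implementation):

```python
import torch
import torch.nn.functional as F

def mnr_loss(query_emb: torch.Tensor, passage_emb: torch.Tensor, scale: float = 20.0) -> torch.Tensor:
    # scores[i, j] is the scaled cosine similarity between query i and passage j.
    scores = F.cosine_similarity(query_emb.unsqueeze(1), passage_emb.unsqueeze(0), dim=-1) * scale
    # The matching passage for query i sits at index i; all other passages
    # in the batch act as in-batch negatives.
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)
```

With a batch size of 16, each question is therefore contrasted against its own passage and 15 in-batch negatives.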
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- multi_dataset_batch_sampler: round_robin
All Hyperparameters
Click to expand
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
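Putting the pieces together, a minimal sketch of how a fine-tune like this one could be reproduced with the Sentence Transformers 3.x trainer API; the training pair shown is a placeholder, not the actual dataset:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Placeholder (question, passage) pair standing in for the 274 real samples.
train_dataset = Dataset.from_dict({
    "sentence_0": ["What methods are suggested for recording structured feedback?"],
    "sentence_1": ["Record and integrate structured feedback about content provenance ..."],
})

loss = MultipleNegativesRankingLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="dfisher-sentence-transformer-fine-tuned",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=5e-5,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```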
Framework Versions
- Python: 3.11.9
- Sentence Transformers: 3.1.1
- Transformers: 4.44.2
- PyTorch: 2.4.1
- Accelerate: 0.34.2
- Datasets: 3.0.0
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}