
Model Card for ContinuousAT/Zephyr-CAT

This repo contains LoRA adapter weights for the zephyr-7b-beta model (https://huggingface.co/HuggingFaceH4/zephyr-7b-beta), fine-tuned with the Continuous Adversarial Training (CAT) algorithm. For more information, see our paper "Efficient Adversarial Training in LLMs with Continuous Attacks" (https://arxiv.org/abs/2405.15589).
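
Below is a minimal usage sketch (not part of the original card) showing one way to load these LoRA weights on top of the base model with transformers and PEFT. It assumes the adapter is published under the repo ID ContinuousAT/Zephyr-CAT and that the base model fits on your available hardware; adjust dtype and device placement as needed.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "HuggingFaceH4/zephyr-7b-beta"        # base model
adapter_id = "ContinuousAT/Zephyr-CAT"          # this repo (LoRA adapter)

# Load tokenizer and base model, then attach the CAT LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Generate with the zephyr chat template.
messages = [{"role": "user", "content": "Give me a short tip for staying safe online."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))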

GitHub

https://github.com/sophie-xhonneux/Continuous-AdvTrain

Citation

If you use this model, please cite our paper:

@misc{xhonneux2024efficient,
      title={Efficient Adversarial Training in LLMs with Continuous Attacks}, 
      author={Sophie Xhonneux and Alessandro Sordoni and Stephan Günnemann and Gauthier Gidel and Leo Schwinn},
      year={2024},
      eprint={2405.15589},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}