---
license: mit
---

# Model Card for FIRST

FIRST is a language model trained specifically for reranking tasks, leveraging the output logits of the first generated identifier to directly produce a ranked ordering of the candidates. Built on Zephyr-7B-β, FIRST undergoes single-stage fine-tuning on a converted alphabetic version of the RankZephyr dataset, which contains RankGPT-4 reorderings of OpenAI Ada2 retrievals for 5k queries.
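To illustrate the single-token decoding idea, below is a minimal sketch (not the authors' official inference code) of how a FIRST-style reranker can be queried with Hugging Face `transformers`. The prompt template and the repository id are assumptions for illustration; only the core mechanism, reading the logits of the first generated identifier token to score candidates, follows the description above.

```python
# Minimal sketch of single-token listwise reranking: the model sees a query plus
# candidates labeled [A], [B], ..., and we read the logits of the FIRST generated
# token to rank the candidates. Prompt template and repo id are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rryisthebest/First_Model"  # assumed repo id for this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

query = "how do transformers handle long sequences?"
candidates = [
    "Transformers use attention, whose cost grows quadratically with length.",
    "The capital of France is Paris.",
    "Sparse and linear attention variants reduce the cost of long inputs.",
]

# Label each candidate with an alphabetic identifier, mirroring the converted
# alphabetic RankZephyr data described above.
labels = [chr(ord("A") + i) for i in range(len(candidates))]
passage_block = "\n".join(f"[{l}] {c}" for l, c in zip(labels, candidates))
prompt = (
    "Rank the following passages by relevance to the query.\n"
    f"Query: {query}\n{passage_block}\n"
    "The most relevant passage is ["
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # logits of the first generated token

# Score each candidate by the logit of its identifier token and sort descending.
label_ids = [tokenizer.encode(l, add_special_tokens=False)[-1] for l in labels]
scores = {l: logits[tid].item() for l, tid in zip(labels, label_ids)}
ranking = sorted(labels, key=scores.get, reverse=True)
print("Ranking:", ranking)
```

Because the full ordering is read off a single forward pass rather than generated token by token, this style of inference is substantially faster than autoregressive listwise rerankers such as RankZephyr.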

## Model Description

- **Model type:** A 7B-parameter GPT-like model, based on Zephyr-7B-β and further fine-tuned on task-specific listwise reranking data
- **Language(s) (NLP):** Primarily English
- **License:** MIT
- **Finetuned from model:** HuggingFaceH4/zephyr-7b-beta


## Evaluations

At the time of release, FIRST demonstrates superior reranking performance across a variety of datasets. The table below compares FIRST against other LLM rerankers on the BEIR benchmark.

| Reranker | Training Data | Avg. | Climate-FEVER | DBPedia | FEVER | FiQA | HotpotQA | MS MARCO | NFCorpus | NQ | SciDocs | SciFact | TREC-COVID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RankVicuna | GPT-3.5 | 50.7 | 28.2 | 50.0 | 81.0 | 35.9 | 73.5 | 36.7 | 33.1 | 58.6 | 18.4 | 70.5 | 71.3 |
| RankZephyr | GPT-3.5 + 3.5 | 53.7 | 25.6 | 50.0 | 80.1 | 42.2 | 71.6 | 42.7 | 37.7 | 65.6 | 20.5 | 76.7 | 78.4 |
| FIRST | GPT-4 | 54.3 | 26.7 | 50.9 | 81.7 | 42.2 | 74.2 | 44.4 | 37.4 | 66.4 | 20.4 | 74.6 | 78.8 |
More details can be found in the paper.

## Bias, Risks, and Limitations

We reproduce here an excerpt from the Zephyr-7B-β model card:

"Zephyr-7B-β has not been aligned to human preferences for safety within the RLHF phase or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). It is also unknown what the size and composition of the corpus was used to train the base model ([mistralai/Mistral-7B-v0.1]), however it is likely to have included a mix of Web data and technical sources like books and code. See the Falcon 180B model card for an example of this."

FIRST is trained exclusively on monolingual English data; its effectiveness on multilingual datasets is not guaranteed.

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.

## Citation

If you find FIRST useful for your work, please consider citing our paper:

```bibtex
@article{reddy2024first,
  title={FIRST: Faster Improved Listwise Reranking with Single Token Decoding},
  author={Reddy, Revanth Gangi and Doo, JaeHyeok and Xu, Yifei and Sultan, Md Arafat and Swain, Deevya and Sil, Avirup and Ji, Heng},
  journal={arXiv preprint arXiv:2406.15657},
  year={2024}
}
```