onnx support
#9
by
michaelfeil
- opened
This adds ONNX support for https://github.com/michaelfeil/infinity. The conversions are identical to https://huggingface.co/Xenova/bge-small-en-v1.5.
This is ready, Xiao @Shitao.
michaelfeil
changed pull request title from
Upload 2 files
to onnx support
Please consider the following testing script that I wrote for this PR. My advice for reproducibility is to use file_name="onnx/model.onnx". The main benefit of ONNX will be fast ONNX execution on CPU with the quantized model.
from optimum.onnxruntime import ORTModelForFeatureExtraction # type: ignore
import torch
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-small-en-v1.5')
model = AutoModel.from_pretrained('BAAI/bge-small-en-v1.5', revision="refs/pr/9")
model_ort = ORTModelForFeatureExtraction.from_pretrained('BAAI/bge-small-en-v1.5', revision="refs/pr/9", file_name="onnx/model.onnx")
model.eval()
# Sentences we want sentence embeddings for
sentences = ["样例数据-1", "样例数据-2"]
# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
# For s2p (short query to long passage) retrieval, add an instruction to each query (do not add one to passages)
# encoded_input = tokenizer([instruction + q for q in queries], padding=True, truncation=True, return_tensors='pt')
# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)
    model_output_ort = model_ort(**encoded_input)
# testing
import numpy as np
np.testing.assert_allclose(
    model_output.last_hidden_state.cpu().numpy(),
    model_output_ort.last_hidden_state.cpu().numpy(),
    rtol=1e-3, atol=1e-5)
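Beyond the pass/fail check above, it can help to see how far the two outputs actually drift. The following is a minimal sketch, not part of the PR: `cls_embeddings` and `compare_outputs` are hypothetical helper names. It assumes sentence embeddings are obtained by taking the first ([CLS]) token and L2-normalizing it, which is my understanding of how the BGE model card derives them. It uses synthetic arrays in place of the real `last_hidden_state` tensors so it runs without downloading the model.

```python
import numpy as np


def cls_embeddings(last_hidden_state: np.ndarray) -> np.ndarray:
    """Take the first ([CLS]) token of each sequence and L2-normalize it.

    Assumption: [CLS] pooling plus normalization is the pooling scheme in use.
    """
    cls = last_hidden_state[:, 0, :]
    norms = np.linalg.norm(cls, axis=1, keepdims=True)
    return cls / np.clip(norms, 1e-12, None)


def compare_outputs(reference: np.ndarray, candidate: np.ndarray) -> dict:
    """Summarize how far `candidate` drifts from `reference` (hypothetical helper)."""
    max_abs_diff = float(np.max(np.abs(reference - candidate)))
    # Row-wise cosine similarity of the pooled, normalized embeddings.
    cosine = np.sum(cls_embeddings(reference) * cls_embeddings(candidate), axis=1)
    return {"max_abs_diff": max_abs_diff, "min_cosine": float(cosine.min())}


# Synthetic stand-ins for model_output.last_hidden_state and
# model_output_ort.last_hidden_state, shaped (batch, tokens, hidden).
# The shape and noise level are illustrative only.
rng = np.random.default_rng(0)
reference = rng.normal(size=(2, 5, 384)).astype(np.float32)
noise = rng.normal(scale=1e-6, size=reference.shape).astype(np.float32)
candidate = reference + noise

report = compare_outputs(reference, candidate)
print(report)
```

With the real script, one would pass `model_output.last_hidden_state.cpu().numpy()` and `model_output_ort.last_hidden_state.cpu().numpy()` instead of the synthetic arrays. The same comparison could be repeated for a quantized export, where a larger drift would be expected.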
This is ready, @Shitao. Sorry for tagging you on so many PRs; it might be more helpful to review them all at once.
Shitao
changed pull request status to
merged