Interested in use for a RAG application (semantic search)

#3
by Cablecutter - opened

Hi, I am interested in using your model in a RAG pipeline. I have a Swedish novel and would like to query it to retrieve relevant passages. I am currently using https://huggingface.co/intfloat/multilingual-e5-large-instruct and getting OK results, but I am looking for improvements.

Do you feel that this model is appropriate for the task? Can you also suggest other models that might work well?

National Library of Sweden / KBLab org

There is a Scandinavian Embedding Benchmark that compares different embedding models. You can look at it to find other models, though I don't think there are many open-source ones to choose from that are better. Filter for Swedish in the table:

https://kennethenevoldsen.github.io/scandinavian-embedding-benchmark/

The most relevant benchmark datasets to look at for your use case would probably be SweFAQ, SwednRetrieval and perhaps also Massive Intent.

KBLab/sentence-bert-swedish-cased is pretty similar in quality to the model you are using on these datasets according to the benchmark. I would expect this model to work similarly to multilingual-e5-large-instruct. Another aspect to consider is that this model is also smaller and 4x faster in terms of inference (if that matters).
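A minimal sketch of how one might compare the two models on the same passages, assuming the `sentence-transformers` package is installed. The helper names (`cosine`, `top_k`, `embed`, `search`) are made up for illustration, and reloading the model on every call is only acceptable for a quick test. Some models expect a prefix or instruction on the query text, so check each model card before comparing scores.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, passage_vecs, k=3):
    """Return (index, score) pairs for the k passages most similar to the query."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(passage_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def embed(texts, model_name="KBLab/sentence-bert-swedish-cased"):
    """Embed texts with sentence-transformers (imported lazily; downloads the model)."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(texts).tolist()


def search(query, passages, model_name="KBLab/sentence-bert-swedish-cased", k=3):
    """Hypothetical helper: embed the query and passages, return the best matches."""
    vectors = embed([query] + passages, model_name)
    hits = top_k(vectors[0], vectors[1:], k)
    return [(passages[i], score) for i, score in hits]
```

Calling `search(...)` once per `model_name` on the same query and passages gives a quick side-by-side view of what each model retrieves.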

I sincerely appreciate your response; I wasn't aware of the Scandinavian leaderboard. This is extremely helpful. I'll give it a shot, and thank you for contributing this model to the community :)

I have tried this model for similarity search over a Swedish data repository, and it achieves better top_n results (higher relevance) when retrieving for the user query. I currently run it locally on my Mac, which works like a charm, but when I try to deploy it via Streamlit Cloud, it keeps getting turned down because it consumes too much memory. Are you aware of any (preferably Sweden-domiciled) API endpoints one could use instead? Setting up a dedicated server (even the smallest offered by AWS/Google/Azure) would be too costly for my RAG app.

Any ideas on how to solve this?

I was thinking of trying to find a .gguf version of the embedder and using Python FastAPI to set up inference via Ollama. That seems a bit tedious, but it might be my last option.

Many thanks!!

/Paul

I made a GGUF version of the model, let me know if it works for you! 😊
PierreMesure/sentence-bert-swedish-cased-gguf

I also wrote a small tutorial on how to do it since I didn't find llama.cpp's documentation super clear.
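For anyone wiring this into an app, here is a rough sketch of a client for a locally served GGUF embedder. It assumes the server exposes an OpenAI-style `/v1/embeddings` endpoint. The URL, port, and model name below are placeholders rather than values from this thread, and the function names are hypothetical, so check your server's documentation for the actual route and response shape.

```python
import json
import urllib.request


def build_request(texts, model="sentence-bert-swedish-cased",
                  url="http://localhost:8080/v1/embeddings"):
    """Build a POST request carrying the texts to embed (URL and model are assumptions)."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(body):
    """Extract vectors from an OpenAI-style response, ordered by their 'index' field."""
    data = json.loads(body)["data"]
    return [item["embedding"] for item in sorted(data, key=lambda d: d["index"])]


def embed_remote(texts, **kwargs):
    """Send the texts to the embedding server and return one vector per text."""
    with urllib.request.urlopen(build_request(texts, **kwargs), timeout=30) as resp:
        return parse_embeddings(resp.read())
```

With this split, the app itself only needs the standard library, and the memory-heavy model lives in whichever process serves the GGUF file.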
