Use `huggingface_hub.InferenceClient` instead of `openai` to call Sambanova
This is a suggestion to use the huggingface_hub client instead of the openai one to call the Sambanova API. There is no need to provide the Sambanova API endpoint anymore. Also, one can use the HF model id and easily switch between providers (currently 7 providers are listed on https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct; users can select based on the speed/cost trade-off).
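As a minimal sketch (the alternative provider name below is only an illustrative example, not a specific recommendation), switching providers is a one-argument change while the HF model id and the rest of the code stay the same:

```python
from huggingface_hub import InferenceClient

# The same HF model id works across providers; only the `provider` argument changes.
client = InferenceClient(provider="sambanova")
# client = InferenceClient(provider="together")  # e.g. another provider, same code below

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=50,
)
print(response.choices[0].message.content)
```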
A more advanced suggestion would be to set an HF_TOKEN Space secret instead and instantiate the client like this:
```python
import huggingface_hub

client = huggingface_hub.InferenceClient(
    provider="sambanova",
)
```
This routes requests through HF, which makes it easier to switch between providers while keeping billing in a single place.
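A rough sketch of how that works, assuming HF_TOKEN is set as a Space secret (Space secrets are exposed to the Space as environment variables): with no token or endpoint passed explicitly, the client authenticates with the HF token from the environment and the call is routed through HF to the selected provider.

```python
import os

import huggingface_hub

# Sketch: HF_TOKEN is expected to be set as a Space secret (exposed as an env var).
assert "HF_TOKEN" in os.environ, "Add HF_TOKEN as a Space secret"

# No api_key / base_url needed: the client picks up the HF token from the environment
# and routes the request through HF to the chosen provider, so billing stays on the
# HF account.
client = huggingface_hub.InferenceClient(provider="sambanova")
```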
(Same comment applies to https://huggingface.co/spaces/fastrtc/talk-to-sambanova; do you want me to open a PR, @freddyaboulton?)
I tested this locally and I'm getting this error:
```
fastrtc.utils.WebRTCError: Model meta-llama/Llama-3.2-3B-Instruct is not supported with Sambanova for task conversational.
```
Full code:
```python
from fastrtc import ReplyOnPause, Stream, get_stt_model, get_tts_model
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="sambanova",
)
stt_model = get_stt_model()
tts_model = get_tts_model()


def echo(audio):
    prompt = stt_model.stt(audio)
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.2-3B-Instruct",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=400,
    )
    prompt = response.choices[0].message.content
    for audio_chunk in tts_model.stream_tts_sync(prompt):
        yield audio_chunk


stream = Stream(
    ReplyOnPause(echo),
    modality="audio",
    mode="send-receive",
    ui_args={
        "icon": "https://upload.wikimedia.org/wikipedia/commons/thumb/0/01/Portrait-of-a-woman.jpg/960px-Portrait-of-a-woman.jpg?20200608215745",
        "pulse_color": "rgb(35, 157, 225)",
    },
)
stream.ui.launch()
```
I originally saw a bit more latency when routing through HF first, which is why I did not use InferenceClient. I will test again, and if the overhead is negligible I will switch over!
Also, please move the PR over to the repo (https://github.com/freddyaboulton/fastrtc/tree/main/demo/talk_to_sambanova); everything here gets synced from there.
Opened PR on GH: https://github.com/freddyaboulton/fastrtc/pull/79
Fixed on GH, code updated here. Thanks @Wauplin!