Use `huggingface_hub.InferenceClient` instead of `openai` to call Sambanova

#1
by Wauplin HF staff - opened

This is a suggestion to use the huggingface_hub client instead of the openai one to call the Sambanova API. There is no need to provide the Sambanova API endpoint anymore. Also, one can use the HF model id and easily switch between providers (currently 7 providers on https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct; users can select based on the speed/cost trade-off).

A more advanced suggestion would be to set an HF_TOKEN Space secret instead and instantiate the client like this:

import huggingface_hub

client = huggingface_hub.InferenceClient(
    provider="sambanova",  # the token is read from the HF_TOKEN environment variable
)

This will route requests through HF, making it easier to switch between providers while keeping billing in a single place.
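A minimal sketch of what provider switching looks like with HF routing (assumes huggingface_hub is installed and an HF_TOKEN is set; the `ask` helper and the alternative provider name are illustrative, not part of the original suggestion):

import os
from huggingface_hub import InferenceClient

def ask(provider: str, prompt: str) -> str:
    # Same model id and same HF_TOKEN billing regardless of provider;
    # only the provider name changes.
    client = InferenceClient(provider=provider)
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.2-3B-Instruct",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=400,
    )
    return response.choices[0].message.content

# Switching provider is a one-word change, e.g.:
#   ask("sambanova", "Hello!")
#   ask("together", "Hello!")
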

I tested this locally and I'm getting this error:

fastrtc.utils.WebRTCError: Model meta-llama/Llama-3.2-3B-Instruct is not supported with Sambanova for task conversational.

Full code:

from fastrtc import (ReplyOnPause, Stream, get_stt_model, get_tts_model)
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="sambanova",
)

stt_model = get_stt_model()
tts_model = get_tts_model()

def echo(audio):
    prompt = stt_model.stt(audio)
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.2-3B-Instruct",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=400,
    )
    prompt = response.choices[0].message.content
    for audio_chunk in tts_model.stream_tts_sync(prompt):
        yield audio_chunk

stream = Stream(
    ReplyOnPause(echo), 
    modality="audio", 
    mode="send-receive", 
    ui_args={
        "icon": "https://upload.wikimedia.org/wikipedia/commons/thumb/0/01/Portrait-of-a-woman.jpg/960px-Portrait-of-a-woman.jpg?20200608215745",
        "pulse_color": "rgb(35, 157, 225)",
    },
)

stream.ui.launch()

I originally saw a bit more latency when going through HF first, which is why I did not use InferenceClient at the time. I will test again, and if the overhead is negligible I will switch over!
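A simple way to check whether the routing overhead is actually negligible is to time both paths over a few calls (a hedged sketch using only the standard library; `call_via_hf` and `call_direct` are hypothetical stand-ins for the two client calls):

import time

def mean_latency(fn, n=5):
    """Return the mean wall-clock latency of fn() over n runs, in seconds."""
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n

# Replace the stand-in workload with the real calls, e.g.:
#   mean_latency(lambda: call_via_hf(prompt)) vs mean_latency(lambda: call_direct(prompt))
mean = mean_latency(lambda: sum(range(1000)))
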

Also, please move the PR over to the repo (https://github.com/freddyaboulton/fastrtc/tree/main/demo/talk_to_sambanova); everything gets synced here from there.

@abidlabs did you update huggingface_hub to the latest version? :confused:

Fixed on GH, code updated here. Thanks @Wauplin!

freddyaboulton changed pull request status to closed
