Use `huggingface_hub.InferenceClient` instead of `openai` to call Sambanova

#1
by Wauplin HF staff - opened

This is a suggestion to use the huggingface_hub client instead of the openai one to call the Sambanova API. There is no need to provide the Sambanova API endpoint anymore. Also, one can use the HF model id and easily switch between providers (currently 7 providers on https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct; users can select based on the speed/cost trade-off).

A more advanced suggestion would be to set an HF_TOKEN Space secret instead and instantiate the client like this:

import huggingface_hub

client = huggingface_hub.InferenceClient(
    provider="sambanova",  # the token is read from the HF_TOKEN environment variable
)

This will route requests through HF, making it easier to switch between providers while keeping billing in a single place.
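A minimal sketch of what provider switching looks like with HF routing (assumes huggingface_hub is installed and an HF_TOKEN is set; the `ask` helper and the alternative provider name are illustrative, not part of the original suggestion):

import os
from huggingface_hub import InferenceClient

def ask(provider: str, prompt: str) -> str:
    # Same model id and same HF_TOKEN billing regardless of provider;
    # only the provider name changes.
    client = InferenceClient(provider=provider)
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.2-3B-Instruct",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=400,
    )
    return response.choices[0].message.content

# Switching provider is a one-word change, e.g.:
#   ask("sambanova", "Hello!")
#   ask("together", "Hello!")
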

I tested this locally and I'm getting this error:

fastrtc.utils.WebRTCError: Model meta-llama/Llama-3.2-3B-Instruct is not supported with Sambanova for task conversational.

Full code:

from fastrtc import (ReplyOnPause, Stream, get_stt_model, get_tts_model)
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="sambanova",
)

stt_model = get_stt_model()
tts_model = get_tts_model()

def echo(audio):
    prompt = stt_model.stt(audio)
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.2-3B-Instruct",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=400,
    )
    prompt = response.choices[0].message.content
    for audio_chunk in tts_model.stream_tts_sync(prompt):
        yield audio_chunk

stream = Stream(
    ReplyOnPause(echo), 
    modality="audio", 
    mode="send-receive", 
    ui_args={
        "icon": "https://upload.wikimedia.org/wikipedia/commons/thumb/0/01/Portrait-of-a-woman.jpg/960px-Portrait-of-a-woman.jpg?20200608215745",
        "pulse_color": "rgb(35, 157, 225)",
    },
)

stream.ui.launch()

I originally saw a bit more latency when going through HF first, which is why I did not use InferenceClient at the time. I will test again, and if the overhead is negligible I will switch over!
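A simple way to check whether the routing overhead is actually negligible is to time both paths over a few calls (a hedged sketch using only the standard library; `call_via_hf` and `call_direct` are hypothetical stand-ins for the two client calls):

import time

def mean_latency(fn, n=5):
    """Return the mean wall-clock latency of fn() over n runs, in seconds."""
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n

# Replace the stand-in workload with the real calls, e.g.:
#   mean_latency(lambda: call_via_hf(prompt)) vs mean_latency(lambda: call_direct(prompt))
mean = mean_latency(lambda: sum(range(1000)))
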

Also, please move the PR over to the repo (https://github.com/freddyaboulton/fastrtc/tree/main/demo/talk_to_sambanova); everything gets synced here from there.

@abidlabs did you update huggingface_hub to the latest version? :confused:

Fixed on GH, code updated here. Thanks @Wauplin!

freddyaboulton changed pull request status to closed
