huggingface/HuggingDiscussions · [FEEDBACK] Inference Providers

julien-c

Hugging Face org Jan 17

Any inference provider you love, and that you'd like to be able to access directly from the Hub?

reach-vb

Hugging Face org 25 days ago • edited 25 days ago

Love that I can call DeepSeek R1 directly from the Hub 🔥

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="together",
    api_key="xxxxxxxxxxxxxxxxxxxxxxxx"
)

messages = [
    {
        "role": "user",
        "content": "What is the capital of France?"
    }
]

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1", 
    messages=messages, 
    max_tokens=500
)

print(completion.choices[0].message)

benhaotang

24 days ago • edited 24 days ago

Is it possible to set a monthly payment budget or rate limits for all the external providers? I don't see such options in the billing tab. If a key or session token is stolen, it could be quite dangerous to my thin wallet :(

julien-c

Hugging Face org 24 days ago

@benhaotang you already get spending notifications when crossing important thresholds ($10, $100, $1,000) but we'll add spending limits in the future
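
Until spending limits exist, a client-side check is one possible stopgap. The following is only a minimal sketch of the threshold idea described above: the dollar thresholds come from the reply, while the function names and the `monthly_limit` hard cap are hypothetical and say nothing about how the Hub implements its notifications.

```python
from typing import List, Sequence

# Notification thresholds quoted in the reply above, in US dollars.
THRESHOLDS: Sequence[float] = (10.0, 100.0, 1000.0)


def crossed_thresholds(
    previous_total: float,
    new_total: float,
    thresholds: Sequence[float] = THRESHOLDS,
) -> List[float]:
    """Return every threshold passed when spend moves from previous_total to new_total."""
    return [t for t in thresholds if previous_total < t <= new_total]


def over_budget(total: float, monthly_limit: float) -> bool:
    """Hypothetical hard cap: True once spend reaches a limit you choose yourself."""
    return total >= monthly_limit
```

You would call `crossed_thresholds` after each billed request with the running total before and after it, and stop sending requests once `over_budget` returns True.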

benhaotang

24 days ago • edited 24 days ago

Thanks for your quick reply, good to know!

sylanaustin

24 days ago

Would be great if you could add Nebius AI Studio to the list :) New inference provider on the market, with the absolute cheapest prices and the highest rate limits...

Hazzzardous

24 days ago

Could be good to add featherless.ai

teentitan

24 days ago

TitanML !!

71 hidden messages

Steve2822

10 days ago

"ai/ml api - dudes have a lot of models from HF!"

therealtausif

9 days ago

We just launched our inference product here at io.net, called IO Intelligence! We would love to have it plugged into HF as a new inference provider. It is currently free to use up to a limit of 500k tokens/day, but we'll have the lowest price per token for the models we host once we roll out full pricing.

Feel free to try it out here: https://ai.io.net/ai/models

unparadox

5 days ago

I'm really confused by this. It seems like things got a lot worse? Am I understanding that correctly? I'm not really sure how much an individual call would cost, or how that can be calculated for a particular endpoint. Maybe I'm just confused, but it seems like this announcement is essentially just saying that Pro subs had a pretty major feature removed. I've never really used the inference features, but this reads like they are essentially gone now, right?
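
On the question of how a per-call cost might be calculated: the thread gives no pricing, so the following is only a sketch of the arithmetic, assuming a provider that bills per token with separate input and output rates. The function name and all parameters are hypothetical, and some providers may bill by compute time instead.

```python
def estimate_call_cost(
    input_tokens: int,
    output_tokens: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Estimate the cost in USD of one call under assumed per-token pricing."""
    total = (
        input_tokens * usd_per_million_input
        + output_tokens * usd_per_million_output
    )
    # Rates are quoted per million tokens, so scale back down.
    return total / 1_000_000
```

The rates have to come from the provider's own pricing page; none are implied here.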

MarwanMashra

5 days ago

How can an Enterprise pay for Inference Credits?
I'm a member of an Org with an Enterprise Hub subscription, and I set up a payment method in the billing section of my org account. It seems that an access token can't be created by an Org, so I created an access token using my own account, and it does show up in the access tokens of my Org. However, when using Inference Providers, it seems to consume only my own Inference Credits, without trying to use the payment method of the Org. Is there a way to use Inference Providers while billing the Org? Thanks.
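
One possible route, offered with caution: as far as I know, later releases of `huggingface_hub` added a `bill_to` argument to `InferenceClient` for billing requests to an organization the user belongs to. I am not certain which version introduced it, so treat the sketch below as an assumption to check against the current docs. The helper name `make_org_billed_client` is hypothetical.

```python
def make_org_billed_client(org_name: str, token: str, provider: str = "together"):
    """Build an InferenceClient whose requests are billed to `org_name`.

    Assumes a huggingface_hub release that supports the `bill_to` argument.
    """
    # Imported lazily so this sketch can be loaded without the package installed.
    from huggingface_hub import InferenceClient

    return InferenceClient(provider=provider, api_key=token, bill_to=org_name)
```

The token is still a personal access token; only the billing target changes.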

hvaara

2 days ago

The Python docs for Inference Providers on the Hub seem to have a small error. I was going to propose a PR, but I couldn't find the source. Is it available somewhere?

Steps to reproduce:

1. Go to a model page, e.g. https://huggingface.co/bigcode/starcoder2-15b
2. Click Deploy
3. Click Inference Providers
4. Click Python (deep link: https://huggingface.co/bigcode/starcoder2-15b?inference_provider=hf-inference&inference_api=true&language=python)

You'll now see

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx"
)

result = client.text_generation(
    model="bigcode/starcoder2-15b",
    inputs="Can you please let us know more details about your ",
    provider="hf-inference",
)

print(result)

The problem is that the provider parameter should not be passed in via client.text_generation. It's correctly passed above when the client is instantiated via InferenceClient, but should be removed from client.text_generation.
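
A sketch of what the corrected snippet could look like, wrapped in a hypothetical helper named `complete_code` so that nothing runs on import. The provider is set once on the client, and the prompt is passed positionally because, as far as I know, the first parameter of `text_generation` is named `prompt` rather than `inputs`.

```python
def complete_code(prompt: str, token: str) -> str:
    """Call text_generation with the provider set once, on the client."""
    # Imported lazily so the function can be defined without the package installed.
    from huggingface_hub import InferenceClient

    client = InferenceClient(
        provider="hf-inference",  # the provider is chosen here, at construction
        api_key=token,
    )
    # No provider argument on the method call; the prompt goes in positionally.
    return client.text_generation(prompt, model="bigcode/starcoder2-15b")
```

Calling it requires a valid token and network access, for example `complete_code("def fibonacci(n):", token)`.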

flaviuR

2 days ago

@julien-c you should really include https://runware.ai as an inference provider.

It is by far the fastest and cheapest provider, with the most comprehensive API for making visual content. Just have a look at this awesome documentation: https://runware.ai/docs/en/image-inference/api-reference

A FLUX DEV image is generated in 1.9 seconds at a price of 0.0038 (yes, 3 zeros, almost 10x cheaper than anybody else), and there are 300k+ models on the platform.

They are backed by A16Z, Lakestar and Midjourney.

charlie959

1 day ago

RunPod! The most flexible option out there.

PSM24

1 day ago

Can you add DeepInfra?