[FEEDBACK] Inference Providers
Any inference provider you love, and that you'd like to be able to access directly from the Hub?
Love that I can call DeepSeek R1 directly from the Hub 🔥
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="together",
    api_key="xxxxxxxxxxxxxxxxxxxxxxxx"
)

messages = [
    {
        "role": "user",
        "content": "What is the capital of France?"
    }
]

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=messages,
    max_tokens=500
)

print(completion.choices[0].message)
Is it possible to set a monthly spending budget or rate limits for all the external providers? I don't see such options in the billing tab. If an API key or session token is stolen, it could be quite dangerous to my thin wallet :(
@benhaotang you already get spending notifications when crossing important thresholds ($10, $100, $1,000) but we'll add spending limits in the future
Thanks for your quick reply, good to know!
Would be great if you could add Nebius AI Studio to the list :) New inference provider on the market, with the absolute cheapest prices and the highest rate limits...
Could be good to add featherless.ai
TitanML !!
"ai/ml api - dudes have a lot of models from HF!"
We just launched our inference product here at io.net called IO Intelligence! Would love to have this plugged into HF as a new inference provider. It is currently free to use up to a limit of 500k tokens/day, but we'll have the lowest price per token for the models we host once we roll out full pricing.
Feel free to try it out here: https://ai.io.net/ai/models
I'm really confused on this, it seems like things got a lot worse? Am I understanding that correctly? I'm not really sure how much an individual call would cost or how it can be calculated for a particular endpoint. Maybe I'm just confused but it seems like this announcement is essentially just saying pro subs had a pretty major feature removed. I've never really even used the inference features, but this reads like they are essentially gone now, right?
How can an Enterprise pay for Inference Credits?
I'm a member of an Org with an Enterprise Hub subscription, and I set up a payment method in the billing section of my Org account. It seems that an access token can't be created by an Org, so I created an access token using my own account, and it does show up in the access tokens of my Org. However, when I use Inference Providers, it seems to consume only my own Inference Credits and does not try to use the Org's payment method. Is there a way to use Inference Providers while billing the Org? Thanks.
The Python docs for Inference Providers on the Hub seem to have a small error. I was going to propose a PR, but I couldn't find the source. Is it available somewhere?
Steps to reproduce:
- Go to a model page, e.g. https://huggingface.co/bigcode/starcoder2-15b
- Click Deploy
- Click Inference Providers
- Click Python
(deep link: https://huggingface.co/bigcode/starcoder2-15b?inference_provider=hf-inference&inference_api=true&language=python)
You'll now see
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx"
)

result = client.text_generation(
    model="bigcode/starcoder2-15b",
    inputs="Can you please let us know more details about your ",
    provider="hf-inference",
)

print(result)
The problem is that the provider parameter should not be passed in via client.text_generation. It's correctly passed above when the client is instantiated via InferenceClient, but should be removed from client.text_generation.
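For illustration, here is a minimal sketch of what the corrected snippet might look like, assuming a recent huggingface_hub release in which InferenceClient accepts provider and api_key, and text_generation takes the prompt as its first argument. The helper names build_client and generate are hypothetical and only there to keep the two steps separate; the token is a placeholder.

```python
def build_client(api_key):
    """Create the client; the provider is chosen once, here."""
    # Imported lazily so the helper below can be exercised without the package.
    from huggingface_hub import InferenceClient

    return InferenceClient(provider="hf-inference", api_key=api_key)


def generate(client, prompt, model="bigcode/starcoder2-15b"):
    """Run text generation without repeating the provider argument."""
    # Only the prompt and the model are passed; no provider=... at call time.
    return client.text_generation(prompt, model=model)
```

With a real token this would be called as generate(build_client("hf_xxxxxxxxxxxxxxxxxxxxxxxx"), "Can you please let us know more details about your "). The only change relative to the snippet shown on the model page is that provider is no longer passed to text_generation.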
@julien-c really you should include https://runware.ai as an inference provider.
It is by far the fastest and the cheapest provider, with the most comprehensive API for making visual content. Just have a look at this awesome documentation: https://runware.ai/docs/en/image-inference/api-reference
A FLUX DEV image is generated in 1.9 seconds at a price of 0.0038 (yes, 3 zeros, almost 10x cheaper than anybody else), and there are 300k+ models on the platform.
They are backed by A16Z, Lakestar and Midjourney.
RunPod! The most flexible option out there.
Can you add DeepInfra?