singhsidhukuldeep posted an update, Aug 31, 2024
Just tried LitServe from the good folks at @LightningAI !

Between llama.cpp and vLLM there is a small gap: some large models can't easily be deployed with either!

That's where LitServe comes in!

LitServe is a high-throughput serving engine for AI models built on FastAPI.

Yes, built on FastAPI. That's where the advantage and the issue lie.

It's extremely flexible and supports multi-modality and a variety of models out of the box.

But in my testing, it lags far behind in speed compared to vLLM.

Also, no OpenAI API-compatible endpoint is available as of now.

But as we move to multi-modal models and agents, this serves as a good starting point. However, it’s got to become faster...

GitHub: https://github.com/Lightning-AI/LitServe

Woohoo, thanks for checking out LitServe @singhsidhukuldeep! LitServe now has an OpenAI API-compatible endpoint, and you can also serve an LLM using the vLLM engine with LitServe, so you get both speed and flexibility.