singhsidhukuldeep posted an update, Aug 31, 2024
Just tried LitServe from the good folks at @LightningAI !

Between llama.cpp and vLLM there is a small gap: some large models can't easily be deployed with either!

That's where LitServe comes in!

LitServe is a high-throughput serving engine for AI models built on FastAPI.

Yes, built on FastAPI. That's where the advantage and the issue lie.

It's extremely flexible and supports multi-modality and a variety of models out of the box.

But in my testing, it lags far behind in speed compared to vLLM.

Also, no OpenAI API-compatible endpoint is available as of now.

But as we move to multi-modal models and agents, this serves as a good starting point. However, it’s got to become faster...

GitHub: https://github.com/Lightning-AI/LitServe

Woohoo, thanks for checking out LitServe @singhsidhukuldeep! LitServe now has an OpenAI API-compatible endpoint, and you can also serve an LLM using the vLLM engine with LitServe, so you get both speed and flexibility.