Easily deploy optimized models to a high-performance server on your hardware or infrastructure of choice
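For a concrete picture, a deployment typically boils down to pointing the server at a model and a target device. The sketch below assumes a Docker-based launch driven from Python; the image name, environment variable names, and port are illustrative placeholders, not the documented interface, so check the Takeoff docs for the exact invocation.

```python
import subprocess

# Minimal launch sketch. The image name, environment variables, and port
# are placeholder assumptions -- consult the Takeoff documentation for
# the exact invocation on your version.
subprocess.run(
    [
        "docker", "run", "--gpus", "all",
        "-e", "TAKEOFF_MODEL_NAME=meta-llama/Llama-2-7b-chat-hf",  # any Hugging Face model id
        "-e", "TAKEOFF_DEVICE=cuda",  # target device for this sketch
        "-p", "3000:3000",            # assumed API port
        "takeoff-server-image",       # placeholder image name
    ],
    check=True,
)
```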
Yes. TitanML integrates with many major model hubs and ML tools, including Hugging Face, LangChain, and Determined AI, as well as logging and monitoring tools. If there are any particular integrations you would like to see, do let us know!
The Takeoff Server supports all major language models, and support is continually updated as new models are released.
Serving - Takeoff is the easiest and best-performing serving framework:
1. Triton Server - Working with Triton Server is incredibly difficult (those who know, know!). The Takeoff Server is considerably simpler to set up and use.
2. TorchServe - Unlike TorchServe, you don't need to know any ML to deploy models optimally with Takeoff; all of the ML is abstracted away. You don't even need to know Python to work with the Takeoff Server (see the sketch after this list)! The Takeoff Server also includes inference optimization methods that make your models much faster and cheaper to deploy!
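Because the server is queried over HTTP, any language's HTTP client works, with no Python or ML tooling required. The Python sketch below is illustrative only: the port, the `/generate` path, and the request/response shape are assumptions to verify against your deployment's documentation.

```python
import requests

# Query a running Takeoff server over plain HTTP. The port, endpoint path,
# and payload shape are assumptions -- verify against your deployment.
response = requests.post(
    "http://localhost:3000/generate",
    json={"text": "Summarize the Takeoff Server in one sentence."},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```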
TitanML models can be deployed on your hardware of choice, and the optimizations applied are tailored to that hardware. This includes Intel CPUs, NVIDIA GPUs, and AWS Inferentia chips.
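Concretely, under the same assumed launch interface as the sketch above, retargeting hardware amounts to a one-setting change, with the hardware-specific optimization handled server-side. The device strings below are assumptions, and the AWS Inferentia value in particular should be confirmed against the Takeoff docs.

```python
# Assumed device values per hardware target (illustrative, not authoritative):
HARDWARE_TO_DEVICE = {
    "intel-cpu": "cpu",    # assumed value for Intel CPUs
    "nvidia-gpu": "cuda",  # assumed value for NVIDIA GPUs
    # AWS Inferentia: consult the Takeoff docs for the exact device string.
}
```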
The Community version is free. The Pro version of the Takeoff Server is priced according to model parameter size, but even with the license fee this results in an 80% cost saving! Reach out to us for a quote for your use case.