The fastest and easiest way to run inference on LLMs - Titan Takeoff Server 🛫

Run inference on your LLMs locally at lightning-fast speed

## Lightning-fast LLM inference, locally

```bash
pip install titan-iris                                                    # install the Iris CLI
iris takeoff --model tiiuae/falcon-7b-instruct --device cpu --port 8000  # launch a Takeoff server on CPU, port 8000
iris takeoff --infer --port 8000                                          # run inference against the running server
```
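
Once the server is running you can also call it from your own code over HTTP. The snippet below is a minimal Python sketch, assuming the server exposes a JSON generation endpoint at `/generate` on the chosen port; the exact route and payload schema are defined in the Takeoff docs for your version.

```python
# Minimal sketch: querying a locally running Takeoff server over HTTP.
# The /generate route and the "text" field are assumptions -- check the
# Takeoff documentation for the exact endpoint and payload schema.
import requests

TAKEOFF_URL = "http://localhost:8000/generate"  # port chosen with `iris takeoff --port 8000`

response = requests.post(
    TAKEOFF_URL,
    json={"text": "Write a one-line summary of what an LLM is."},
    timeout=60,
)
response.raise_for_status()
print(response.json())  # inspect the generated text returned by the server
```
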
Faster inference
State-of-the-art inference optimization

Takeoff utilizes state-of-the-art hardware-aware runtime acceleration techniques as standard, making your models significantly faster, even on CPU deployments! Achieve 10x speed-ups and 90% cost reductions within minutes.

  • Low-cost GPUs
  • On-prem hardware
  • Low-cost cloud CPUs
Super easy to use
Deployment infrastructure and Ops sorted

  • Maintain complete privacy by easily deploying onto your cloud or hardware of choice
  • Abstract away the difficulties of deploying and serving large language models
  • Use fewer resources and cheaper hardware thanks to optimized deployments

Privacy
Maintain total privacy by deploying your models on infrastructure you control

Takeoff is the best and easiest way to deploy self-hosted LLMs - achieve low latency on the hardware that is available to you without any data or IP leaving your control.

Pricing

Takeoff Server 🛫

Easily deploy optimized models to a high-performance server on your hardware or infrastructure of choice

Community

For the community and individual users experimenting with self-hosted LLMs.

Get started
Join Beta
  • Inference optimization for single GPU deployment
  • Rapid experimentation with chat and playground UI
  • Generation parameter control
  • Int8 quantization
  • Langchain integration
  • Token streaming (see the sketch after this list)
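
As a rough illustration of the generation parameter control and token streaming features above, the sketch below streams tokens from a running server. It assumes a streaming endpoint at `/generate_stream` and parameter names such as `temperature` and `max_new_tokens`; these are illustrative assumptions, and the Takeoff docs are the authority for the real API.

```python
# Illustrative sketch: token streaming with generation parameters.
# The /generate_stream route and the parameter names below are assumptions,
# not the documented Takeoff API -- consult the docs for your version.
import requests

payload = {
    "text": "Explain int8 quantization in one paragraph.",
    "temperature": 0.7,      # assumed sampling parameter
    "max_new_tokens": 128,   # assumed generation length limit
}

with requests.post(
    "http://localhost:8000/generate_stream",
    json=payload,
    stream=True,
    timeout=300,
) as response:
    response.raise_for_status()
    # Print tokens as they arrive instead of waiting for the full completion.
    for chunk in response.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)
```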

Pro

For ML teams looking to deploy LLMs for testing or production.

Get in touch
Start Enterprise
  • Everything in the Community version
  • Batching support
  • Controlled Regex outputs
  • Logging and monitoring support
  • Multi-GPU deployment
  • Multi-model deployment
  • Int4 quantization
  • GPU & CPU deployment
  • Optimized Rust server for enhanced throughput
  • Enhanced Integrations
  • Dedicated support

FAQs

01
Will TitanML work with my current tools and CI/CD?

Yes - TitanML integrates with many major model hubs and frameworks, including Hugging Face, Langchain, and Determined AI, as well as logging and monitoring tools. If there are any particular integrations you would like to see, do let us know!
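
For example, the Langchain integration lets you point an existing chain at a running Takeoff server. The sketch below uses the `TitanTakeoff` LLM wrapper from LangChain's community integrations; the import path and constructor arguments vary across LangChain versions, so treat the values shown as assumptions and check the integration docs.

```python
# Sketch: calling a local Takeoff server from a LangChain application.
# The import path and the base_url argument are assumptions that depend on
# your LangChain version -- see the Titan Takeoff integration docs.
from langchain_community.llms import TitanTakeoff

llm = TitanTakeoff(base_url="http://localhost:8000")  # point at the local Takeoff server
print(llm.invoke("Give one sentence on the benefits of self-hosted LLMs."))
```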

02
Which tasks and models does the Takeoff Server support?

The Takeoff Server supports all major language models, and support is continuously updated as new models are released.

03
Why is the Takeoff Server better than alternatives?

Serving - Takeoff is the easiest and best-performing serving framework:

1. Triton Server - Working with the Triton server is incredibly difficult (those who know, know!). The Takeoff Server is considerably simpler and easier to use.
2. TorchServe - Unlike TorchServe, you don't need to know any ML to deploy these models optimally - all of the ML is abstracted away. You don't even need to know Python to work with the Takeoff Server! The Takeoff Server also applies inference optimization methods that make your models much faster and cheaper to deploy.

04
Where can I deploy my Takeoff Server?

TitanML models can be deployed on your hardware of choice, and the optimizations applied will be tailored to that hardware. This includes Intel CPUs, NVIDIA GPUs, and AWS Inferentia chips.

05
How much is the Takeoff Server?

The Community version is free. The Pro version of the Takeoff Server is priced based on model parameter size - but even with the license fee, this results in an 80% cost saving! Reach out to us for a quote for your use case.