Inference Server

Inference servers are the “workhorse” of AI applications: they are the bridge between a trained AI model and real-world, useful applications. An inference server is specialised software that efficiently manages and executes these inference tasks.

The inference server accepts requests to process data, runs the model, and returns the results. An inference server is typically deployed on a single ‘node’ (a GPU, or group of GPUs) and is scaled across nodes for elastic capacity through integrations with orchestration tools such as Kubernetes. Without an inference server, the model weights and architecture are of little use on their own; it is the inference server that lets us interact with the model and build it into an application.
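As a rough illustration of the request–model–response loop described above, here is a minimal sketch of an inference endpoint using FastAPI. This is an assumption-laden example, not a specific product’s implementation: the `run_model` function is a hypothetical stand-in for a real model forward pass, and concerns such as batching, GPU placement, and model loading are omitted.

```python
# Minimal inference server sketch (illustrative only).
# Assumes FastAPI/uvicorn are installed; run_model is a placeholder for a real model.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class InferenceRequest(BaseModel):
    prompt: str

class InferenceResponse(BaseModel):
    output: str

def run_model(prompt: str) -> str:
    # Placeholder for the actual model forward pass (e.g. a loaded LLM on a GPU).
    return f"echo: {prompt}"

@app.post("/predict", response_model=InferenceResponse)
def predict(request: InferenceRequest) -> InferenceResponse:
    # 1. Receive the request, 2. run the model, 3. return the result.
    return InferenceResponse(output=run_model(request.prompt))
```

Such a server could be started with `uvicorn server:app` and queried over HTTP; scaling it across nodes would then be a matter of running multiple replicas behind a load balancer, for example via a Kubernetes Deployment.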
