Model serving

Model serving is the process of deploying a machine learning model behind a server. A server is a continuously running process that listens for requests from end users, processes them, and sends back responses. This differs from batch processing, for example, where a process works through a fixed list of data on a regular schedule. The request-response paradigm is the foundation of the modern web, and it is the main way in which machine learning models are put into production today.
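
To make the distinction concrete, here is a minimal sketch using only the Python standard library. It is not the Takeoff API: the predict function is a hypothetical stand-in for a real model, and the port and JSON field names are assumptions chosen for illustration. The serve function shows the serving pattern (a listener that answers each request as it arrives), while run_batch shows the batch pattern (one pass over a fixed list).

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(text: str) -> dict:
    """Hypothetical stand-in for a real model: labels text by its length."""
    return {"label": "long" if len(text) > 20 else "short"}


class PredictHandler(BaseHTTPRequestHandler):
    """Handles one HTTP POST request per call, as a served model would."""

    def do_POST(self):
        # Read the JSON request body sent by the client.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # Run inference and send the result back as JSON.
        body = json.dumps(predict(payload.get("text", ""))).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Silence the default per-request logging.


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Serving: block indefinitely, answering requests as they arrive."""
    HTTPServer((host, port), PredictHandler).serve_forever()


def run_batch(texts: list) -> list:
    """Batch processing: work through a fixed list of inputs, then stop."""
    return [predict(t) for t in texts]
```

Calling serve() starts the listener, after which a client can POST a JSON body such as {"text": "hello"} to http://127.0.0.1:8000/ and receive a prediction in the response. A production inference server adds much more on top of this pattern, but the request-response loop is the same.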

The Titan Takeoff Inference Server is a fast way to run machine learning model inference in a web server.