Dynamic batching

Dynamic batching adjusts the batch size used during inference to match incoming traffic. When traffic is high, the model runs on large batches to maximize GPU utilization. When traffic is low, it runs on smaller batches so that requests do not sit idle waiting for others to arrive.
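The following is a minimal sketch of one common way to get this behavior, not a description of any particular serving framework. It assumes requests arrive on a thread-safe `queue.Queue` and that the batcher is bounded by two hypothetical parameters, `max_batch_size` and `max_wait_s`. Under heavy traffic the queue already holds many requests, so a batch fills to `max_batch_size` almost immediately. Under light traffic the deadline expires first and a smaller batch is sent, which caps the extra latency any single request pays.

```python
import queue
import time
from typing import Any, List


def collect_batch(requests: "queue.Queue[Any]", max_batch_size: int, max_wait_s: float) -> List[Any]:
    """Gather up to max_batch_size requests, waiting at most max_wait_s after the first arrives.

    max_batch_size and max_wait_s are hypothetical tuning knobs for this sketch.
    """
    # Block until at least one request exists, so an empty batch is never returned.
    batch = [requests.get()]
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # Waited long enough; run with whatever has arrived.
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break  # No more requests arrived before the deadline.
    return batch


if __name__ == "__main__":
    q = queue.Queue()
    # High traffic: ten requests are already waiting, so batches fill to the cap.
    for i in range(10):
        q.put(i)
    print(collect_batch(q, max_batch_size=4, max_wait_s=0.01))  # [0, 1, 2, 3]
    print(collect_batch(q, max_batch_size=4, max_wait_s=0.01))  # [4, 5, 6, 7]
    # Low traffic: only two requests remain, so the deadline closes a smaller batch.
    print(collect_batch(q, max_batch_size=4, max_wait_s=0.01))  # [8, 9]
```

The wait deadline is the main design trade-off. A longer `max_wait_s` produces fuller batches and better GPU utilization at the cost of higher latency during quiet periods. A shorter one keeps latency low but leaves the GPU underused.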

