Continuous batching

Continuous batching is an algorithm which increases the throughput of large language model (LLM) serving. It allows the size of the batch on which a machine learning model is working on to grow and shrink dynamically over time. This means responses are served to users more quickly at high load (significantly higher throughput).